Transforming Data in Real Time
Learn how to perform data transformations on real-time, event-driven data in Python by integrating distributed data pipelines with scalable, high-throughput, fault-tolerant streaming platforms.
This course provides a hands-on exploration of Apache Kafka, the industry-standard distributed streaming platform. It shows how to integrate Kafka with distributed data pipelines built on Apache Spark and its Structured Streaming engine to create high-throughput, low-latency, real-time data processing systems. The course follows on from our Distributed Data Engineering course. It enables senior data engineers to build systems that transform data and derive actionable insights from it in real time. Topics include real-time SQL operations, joins, deduplication, and the handling of late-arriving data: events with earlier timestamps that arrive after events with later timestamps.