
Simplifying Data Engineering and Analytics with Delta

Streaming use cases fall into three main categories of real-time applications: decision engines and alerting apps; BI analytics and tools, such as SQL and search engines; and data science and ML use cases, as highlighted in the following diagram:
Figure 4.7 – Streaming use cases
In the next section, we will look at the three stages of ETL (Extract, Transform, Load) as they relate to streaming.
There are two types of stream processing: file-based and event-based. The former applies to data that has already landed on disk. The latter applies to data in flight and typically requires a streaming service such as Kafka, Kinesis, or Event Hubs, from which spark.readStream consumes the data. For example, a Kafka cluster consists of several brokers monitored by ZooKeeper. Data is stored in topics that are broken down into one or more partitions that allow for scalability, fault...
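The following minimal in-memory sketch in Python makes the topic and partition idea concrete. It is a toy model under stated assumptions and is not Kafka's implementation. `ToyTopic`, `append`, `read` and `partition_for` are hypothetical names. `zlib.crc32` stands in for whatever partitioner a real producer uses. The sketch shows two properties the paragraph relies on. First, records with the same key always land in the same partition. Second, each partition is an ordered, append-only log that a consumer reads from an offset it tracks.

```python
"""Toy, in-memory model of a partitioned topic (illustration only, not Kafka)."""
import zlib
from dataclasses import dataclass, field


@dataclass
class ToyTopic:
    """Hypothetical topic split into a fixed number of append-only partitions."""
    name: str
    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self) -> None:
        if self.num_partitions < 1:
            raise ValueError("a topic needs at least one partition")
        # Each partition is an independent, ordered log of (key, value) records.
        self.partitions = [[] for _ in range(self.num_partitions)]

    def partition_for(self, key: str) -> int:
        # Deterministic key -> partition mapping. crc32 is an assumption made
        # for illustration; it is NOT the hash a real Kafka producer uses.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions

    def append(self, key: str, value: str) -> tuple:
        """Append a record to its key's partition; return (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1

    def read(self, partition: int, from_offset: int = 0) -> list:
        """Return the records in one partition, starting at a consumer-tracked offset."""
        return self.partitions[partition][from_offset:]


# Usage: six events from two users spread over three partitions.
topic = ToyTopic("clicks", num_partitions=3)
positions = [topic.append(f"user-{i % 2}", f"event-{i}") for i in range(6)]
```

Partitions are independent logs, so they can be spread across brokers and consumed in parallel. This is where the scalability mentioned above comes from. The trade-off is that ordering is guaranteed only within a partition, not across the whole topic.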