Data Engineering with Databricks Cookbook
In this recipe, we will show you how to use Spark SQL to define transformations and filters on a streaming DataFrame that reads data from a Kafka topic. Applying transformations and filters to streaming DataFrames enables you to process and manipulate streaming data as it traverses the data pipeline. This supports operations such as data cleaning, enrichment, and reformatting, which are crucial for adapting the data to specific processing requirements. Real-world streaming data often arrives with errors, inconsistencies, or missing values. These operations help clean and sanitize the data by either eliminating problematic records or rectifying issues, ensuring the reliability and accuracy of subsequent analyses.
Before we start, we need to make sure that we have a Kafka cluster running and a topic that produces some streaming data. For simplicity, we will use a single-node...