Data Engineering with Databricks Cookbook

In this recipe, we will learn how to configure checkpoints for stateful streaming queries in Apache Spark. Checkpoints make streaming applications fault tolerant by saving the query's progress and intermediate state to a durable storage system, so that after a failure the query can recover and resume from where it left off.
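As a minimal sketch of the idea, assuming a PySpark Structured Streaming job that reads from Kafka, the checkpoint directory is passed to the writer through the checkpointLocation option. The helper names (checkpoint_options, start_user_counts), the /tmp/checkpoints base path, the localhost:9092 broker address, and the console sink are illustrative assumptions, not the recipe's own code.

```python
def checkpoint_options(base_dir: str, query_name: str) -> dict:
    """Build writer options that give each query its own checkpoint directory.

    Hypothetical helper: two queries must not share a checkpoint location,
    so the query name is appended to a common base directory.
    """
    return {"checkpointLocation": f"{base_dir.rstrip('/')}/{query_name}"}


def start_user_counts(spark, bootstrap_servers="localhost:9092",
                      topic="users", checkpoint_base="/tmp/checkpoints"):
    """Start a stateful query over the Kafka topic (assumed defaults).

    `spark` is an existing SparkSession with the Kafka connector available.
    """
    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .option("startingOffsets", "earliest")
        .load()
        # Kafka keys arrive as binary; cast them so they are readable.
        .selectExpr("CAST(key AS STRING) AS key")
    )

    # A running count per key is a stateful aggregation: Spark keeps the
    # counts as state between micro-batches and saves them in the checkpoint.
    counts = events.groupBy("key").count()

    return (
        counts.writeStream
        .outputMode("complete")
        .format("console")
        .options(**checkpoint_options(checkpoint_base, "user_counts"))
        .start()
    )
```

Restarting the query with the same checkpoint location lets Spark pick up from the saved offsets and state; pointing it at a new, empty directory starts the query from scratch.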
Before we start, we need a running Kafka cluster and a topic that receives streaming data. For simplicity, we will use a single-node Kafka cluster and a topic named users. Open the 4.0 user-gen-kafka.ipynb notebook and execute the cell. This notebook produces a user record every few seconds and writes it to the users topic.
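The notebook's code is not reproduced here. As a rough sketch of what a generator of this kind might do, assuming the kafka-python client and a broker at localhost:9092, the following builds a small JSON user record and sends one every few seconds. The record fields (id, name, age), the sample names, and the function names are hypothetical and may differ from the notebook.

```python
import json
import random
import time

SAMPLE_NAMES = ["alice", "bob", "carol", "dave"]  # hypothetical sample data


def make_user_record(rng: random.Random) -> dict:
    """Create one user record with an assumed, illustrative schema."""
    return {
        "id": rng.randint(1, 1_000_000),
        "name": rng.choice(SAMPLE_NAMES),
        "age": rng.randint(18, 80),
    }


def encode(record: dict) -> bytes:
    """Serialize a record to UTF-8 JSON bytes for use as the Kafka value."""
    return json.dumps(record).encode("utf-8")


def produce_users(bootstrap_servers="localhost:9092", topic="users",
                  count=10, interval_s=2.0):
    """Send `count` records to the topic, pausing between sends."""
    # Imported lazily so the helpers above work without a Kafka client installed.
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=bootstrap_servers,
        value_serializer=encode,
    )
    rng = random.Random()
    for _ in range(count):
        producer.send(topic, make_user_record(rng))
        time.sleep(interval_s)
    producer.flush()  # make sure buffered records reach the broker
```

Calling produce_users() against a running broker would publish ten records, two seconds apart, to the users topic.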
Make sure you have run this notebook and that it is producing records as shown here:
Figure 4.9 – Output from user generation script
...