Optimizing Databricks Workloads

Azure Databricks can process both batch and real-time big data workloads using Apache Spark™. Data engineers need to master both kinds of workload to build real-world use cases. A batch load generally refers to an ETL (Extract, Transform, Load) or ELT (Extract, Load, Transform) process in which large chunks of data are copied from a source to a sink. This type of workload can take anywhere from minutes to hours to process, whereas real-time processing works with much lower latency (that is, seconds or even milliseconds).
Databricks offers several ways to process batch and real-time workloads. In this chapter, we will discuss approaches to building and running them. The topics covered in this chapter are as follows: