Data Engineering with Databricks Cookbook
By :

Broadcast variables are a feature of Apache Spark that lets you send read-only data efficiently to all the executors in a cluster. The data is shipped to each executor once and cached there, instead of being serialized and sent over the network with every task. This is useful when many tasks need the same dataset, which can be fairly large as long as it fits in memory on each executor. For example, if you have a lookup table that maps country codes to country names and you want to use it in a transformation on a large DataFrame, you can broadcast the lookup table instead of sending it with every task.
In this recipe, you will learn how to create and use broadcast variables in Apache Spark using Python, how broadcast variables work under the hood, and what their main benefits and limitations are.
delta module and the SparkSession class from the pyspark.sql...