Data Ingestion with Python Cookbook

Like Parquet, Apache Avro is a widely used format for storing analytical data. Avro is a row-based serialization format that records data according to a schema. It also supports Remote Procedure Calls (RPCs), which makes transmitting data easier, and its schema rules resolve problems such as missing fields, extra fields, and mismatched field names.
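To make the role of the schema more concrete, here is a minimal sketch of an Avro record schema written as a Python dictionary and serialized to JSON, which is how Avro schemas are defined. The record name `User`, the namespace, and every field name are hypothetical and are not taken from the recipe's dataset. The `default` entry illustrates how a missing field can be filled in, and `aliases` illustrates how a field written under a different name can still be matched.

```python
import json

# Hypothetical schema for illustration only; not the recipe's dataset.
user_schema = {
    "type": "record",
    "name": "User",
    "namespace": "example.avro",
    "fields": [
        # A required field with a primitive type.
        {"name": "id", "type": "long"},
        # "aliases" lets a reader match data written with the older name "name".
        {"name": "full_name", "type": "string", "aliases": ["name"]},
        # A nullable field: a union with "null" first and a null default,
        # so records written without this field can still be read.
        {"name": "email", "type": ["null", "string"], "default": None},
    ],
}

# Avro schemas are plain JSON documents.
schema_json = json.dumps(user_schema, indent=2)
print(schema_json)
```

Fields that appear in the data but not in the reader's schema are ignored when the file is read, which is how extra fields are handled.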
In this recipe, we will learn how to read an Avro file and then look at how the process works.
This recipe requires a SparkSession with a different configuration from the one used in the previous Ingesting Parquet files recipe. If you already have a SparkSession running, stop it using the following command:
spark.stop()
We will create another session in the How to do it… section.
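As a rough preview of what such a session could look like, the sketch below builds a SparkSession that pulls in the external `spark-avro` module and reads an Avro file with it. This is an assumption about the setup, not the book's exact code: the package version in `AVRO_PACKAGE` must match your installed Spark and Scala versions, and the helper names, application name, and file path are hypothetical.

```python
# Assumption: Maven coordinates for the spark-avro module. The Scala suffix
# (2.12) and the version (3.5.0) must match your own Spark installation.
AVRO_PACKAGE = "org.apache.spark:spark-avro_2.12:3.5.0"


def create_avro_session(app_name="ingest-avro"):
    """Create (or reuse) a SparkSession with the Avro data source available."""
    # Imported here so the module can be loaded even where PySpark is absent.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        # Spark downloads the package and adds it to the classpath at startup.
        .config("spark.jars.packages", AVRO_PACKAGE)
        .getOrCreate()
    )


def read_avro(spark, path):
    """Load an Avro file into a DataFrame using the 'avro' data source."""
    return spark.read.format("avro").load(path)


# Example usage (hypothetical path; replace it with the downloaded dataset):
# spark = create_avro_session()
# df = read_avro(spark, "path/to/file.avro")
# df.printSchema()
```

Because `getOrCreate()` returns an existing session if one is running, the package setting only takes effect when the previous session has been stopped first.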
The dataset used here can be found at this link: https://github.com/PacktPublishing/Data-Ingestion-with-Python-Cookbook/tree/main/Chapter_7/ingesting_avro_files.
Feel free to execute the code in a Jupyter notebook or your PySpark...