
Data Ingestion with Python Cookbook

In the previous chapter, we saw how to apply schemas to structured and unstructured data, but the application of a schema is not limited to raw files.
Even when working with already processed data, there are cases where we need to cast a column's values or rename columns so that another department can use the data. In this recipe, we will learn how to apply a schema to Parquet files and how it works.
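As a minimal sketch of that kind of adjustment, the snippet below casts one column and renames another on an already ingested DataFrame. The file path and the column names used here are illustrative assumptions rather than the recipe's exact code:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Reuse an existing session or create one; the application name is arbitrary.
spark = SparkSession.builder.appName("apply-schema-parquet").getOrCreate()

# Illustrative path; point this at the file used in the Ingesting Parquet files recipe.
df = spark.read.parquet("yellow_tripdata_sample.parquet")

# Cast a column to a different type and rename another column so the
# output matches what a downstream team expects.
adjusted = (
    df.withColumn("passenger_count", col("passenger_count").cast("integer"))
      .withColumnRenamed("tpep_pickup_datetime", "pickup_datetime")
)
adjusted.printSchema()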
We will need a SparkSession for this recipe. Ensure you have a session that is up and running. We will use the same dataset as in the Ingesting Parquet files recipe.
Feel free to execute the code using a Jupyter notebook or your PySpark shell session.
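If you do not have a session available yet, a minimal local setup along these lines is enough; the master URL and application name are arbitrary choices:

from pyspark.sql import SparkSession

# Start, or reuse, a local SparkSession for this recipe.
spark = (
    SparkSession.builder
    .master("local[*]")
    .appName("parquet-schema-recipe")
    .getOrCreate()
)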
Here are the steps to perform this recipe:
VendorID: long
tpep_pickup_datetime...
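Based on the partial field listing above, a sketch of applying an explicit schema while reading the Parquet file could look as follows. The StructType covers only the two fields shown, and the file path is an assumption:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, LongType, TimestampType

# Reuse the session created in the Getting ready step.
spark = SparkSession.builder.getOrCreate()

# A partial, assumed schema declaring only the two fields listed above;
# columns not declared here are simply not included in the DataFrame.
schema = StructType([
    StructField("VendorID", LongType(), True),
    StructField("tpep_pickup_datetime", TimestampType(), True),
])

# Apply the schema at read time; the path is illustrative.
df = spark.read.schema(schema).parquet("yellow_tripdata_sample.parquet")
df.printSchema()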