Data Ingestion with Python Cookbook

Filtering data is the process of selecting only the information that needs to be used or stored and excluding the rest. Even analytical data often has to be filtered again to meet a specific need. An excellent example is data marts (we will cover them later in this recipe).
This recipe shows how to create and apply filters to our data using a real-world example.
This recipe requires SparkSession, so ensure yours is up and running. You can use the code provided at the beginning of the chapter or create your own.
The dataset used here will be the same as in the Ingesting Parquet files recipe.
To make this exercise more practical, let’s imagine we want to analyze two scenarios: how many trips each vendor made and which hour of the day has the most pickups. We will create some aggregations and filter our dataset to carry out those analyses.
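Before moving to Spark, the two analyses can be sketched in plain Python on a handful of rows to make the filter-then-aggregate logic concrete. This is only an illustration under stated assumptions: the sample rows, the field names (vendor_id, pickup, distance), and the positive-distance filter are hypothetical and are not taken from the recipe's dataset or schema.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows; field names are assumptions, not the real schema.
trips = [
    {"vendor_id": 1, "pickup": datetime(2022, 1, 1, 8, 15), "distance": 2.5},
    {"vendor_id": 2, "pickup": datetime(2022, 1, 1, 8, 40), "distance": 0.0},
    {"vendor_id": 1, "pickup": datetime(2022, 1, 1, 17, 5), "distance": 4.1},
    {"vendor_id": 2, "pickup": datetime(2022, 1, 1, 17, 30), "distance": 1.2},
    {"vendor_id": 1, "pickup": datetime(2022, 1, 1, 17, 55), "distance": 3.3},
]

# Filter: keep only the rows we need (here, an assumed rule: distance > 0).
valid_trips = [t for t in trips if t["distance"] > 0]

# Aggregation 1: number of trips per vendor.
trips_per_vendor = Counter(t["vendor_id"] for t in valid_trips)

# Aggregation 2: number of pickups per hour of the day.
pickups_per_hour = Counter(t["pickup"].hour for t in valid_trips)

# The hour with the most pickups and its count.
busiest_hour, busiest_count = pickups_per_hour.most_common(1)[0]

print(dict(trips_per_vendor))
print(busiest_hour, busiest_count)
```

In the recipe itself, the same two questions are answered on the Spark DataFrame rather than on Python lists, but the shape of the work is the same: filter first, then group and count.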
Here are the steps to perform this recipe:
...