
Data Ingestion with Python Cookbook

As seen before, unstructured data, often stored in NoSQL systems, is information that does not follow a fixed format the way relational or tabular data does. It can take the form of images, video, metadata, transcripts, and so on. Ingesting it usually involves a JSON file or a document collection, as we saw previously when ingesting data from MongoDB.
In this recipe, we will read a JSON file and transform it into a DataFrame without a schema. Although unstructured data is supposed to have a more flexible design, we will see some implications of not having any schema or structure in our DataFrame.
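To make the idea of "no schema" concrete, here is a minimal sketch in plain Python (not Spark) of what happens when field types are taken only from the values found in the file. The records and the field names date, name, and year are made up for illustration and are not the layout of the real file.

```python
import json

# Made-up records; the field names are assumptions, not the real file layout.
raw = """
[
  {"date": "2021-01-01", "name": "New Year's Day", "year": 2021},
  {"date": "2021-04-21", "name": "Tiradentes", "year": "2021"}
]
"""


def infer_types(records):
    """Map each field to the set of Python type names seen across records."""
    seen = {}
    for record in records:
        for key, value in record.items():
            seen.setdefault(key, set()).add(type(value).__name__)
    return seen


records = json.loads(raw)
inferred = infer_types(records)

# Print one line per field with the types observed for it.
for field in sorted(inferred):
    print(field, sorted(inferred[field]))
```

In this sketch the date field stays a plain string, and year shows up as both an integer and a string, because nothing declares what each field is supposed to be. Any reader that works without a declared schema has to resolve this kind of ambiguity on its own.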
Here, we will use the holiday_brazil.json file to create the DataFrame. You can find it in the GitHub repository here: https://github.com/PacktPublishing/Data-Ingestion-with-Python-Cookbook.
We will use SparkSession to read the JSON file and create a DataFrame, so make sure the session is up and running.
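As a rough sketch of that setup, the helpers below start a local session and read a JSON file without passing a schema, so that Spark infers the columns by itself. The function names, the application name, the local[1] master, and the sample records are all assumptions made for illustration. Whether holiday_brazil.json needs the multiLine option depends on how the file is laid out, which is also assumed here.

```python
import json
import tempfile
from pathlib import Path


def write_sample_holidays(path):
    """Write a tiny, made-up stand-in for holiday_brazil.json (a JSON array)."""
    records = [
        {"date": "2021-01-01", "name": "New Year's Day", "type": "National"},
        {"date": "2021-04-21", "name": "Tiradentes", "type": "National"},
    ]
    Path(path).write_text(json.dumps(records, indent=2), encoding="utf-8")
    return records


def build_session():
    """Create or reuse a local SparkSession (hypothetical app name)."""
    # Imported here so the other helpers can be used without Spark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master("local[1]")
        .appName("json-without-schema")
        .getOrCreate()
    )


def read_json_without_schema(spark, path):
    """Read a JSON file and let Spark infer the column names and types."""
    # multiLine lets Spark parse records that span several lines, such as a
    # pretty-printed JSON array; needing it for the real file is an assumption.
    return spark.read.option("multiLine", True).json(str(path))


def demo():
    """Run the whole flow against the made-up sample file."""
    spark = build_session()
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "holiday_sample.json"
        write_sample_holidays(sample)
        df = read_json_without_schema(spark, sample)
        df.printSchema()  # shows the schema Spark inferred on its own
        df.show(truncate=False)
    spark.stop()
```

With a session already running as spark, calling read_json_without_schema(spark, "holiday_brazil.json") should return the DataFrame for the real file, and printSchema() on the result shows which types Spark chose in the absence of a declared schema.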
All code can be...