
Essential PySpark for Scalable Data Analytics

In Chapter 3, Data Cleansing and Integration, we established data lakes as the scalable and relatively inexpensive choice for the long-term storage of historical data. We also presented some reliability challenges with cloud-based data lakes, and you learned how Delta Lake is designed to overcome them. The benefits of Delta Lake as an abstraction layer on top of cloud-based data lakes extend beyond data engineering workloads to data science workloads as well, and we will explore those benefits in this section.
Delta Lake is an ideal candidate for an offline feature store on cloud-based data lakes because of its data reliability features and its novel time travel features. We will discuss these in the following sections.
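As a rough illustration of what time travel can look like for an offline feature store, the following minimal sketch reads a Delta table either at its latest state or as of an earlier version or timestamp. The helper names `time_travel_options` and `read_feature_table` are hypothetical and only for illustration. The sketch assumes an existing `SparkSession` that has been configured with the Delta Lake package, and it relies on the `versionAsOf` and `timestampAsOf` reader options of the Delta data source.

```python
def time_travel_options(version=None, timestamp=None):
    """Build Delta reader options for a time travel read (hypothetical helper)."""
    if version is not None and timestamp is not None:
        raise ValueError("Specify either version or timestamp, not both")
    if version is not None:
        # Read the table as it was at a specific commit version
        return {"versionAsOf": str(version)}
    if timestamp is not None:
        # Read the table as it was at a specific point in time
        return {"timestampAsOf": timestamp}
    # No options: read the latest snapshot of the table
    return {}


def read_feature_table(spark, path, version=None, timestamp=None):
    """Read a Delta feature table, optionally as of an earlier version or timestamp.

    Assumes `spark` is a SparkSession configured with the Delta Lake package.
    """
    reader = spark.read.format("delta")
    for key, value in time_travel_options(version, timestamp).items():
        reader = reader.option(key, value)
    return reader.load(path)
```

With a Delta-enabled session, a call such as `read_feature_table(spark, "/tmp/feature_store/customers", version=0)` would return the features as they were first written (the path here is a placeholder). This can help when a model needs to be retrained or audited against the exact feature values it originally saw.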
Delta Lake supports structured data with well-defined data types for columns. This makes Delta tables strongly typed...