-
Book Overview & Buying
-
Table Of Contents
-
Feedback & Rating

Data Engineering with Databricks Cookbook
By :

Delta Lake is an open source storage layer that enables building a lakehouse architecture with various compute engines and APIs. It provides features such as atomicity, consistency, isolation, and durability (ACID) transactions, scalable metadata, time travel, schema evolution, and data manipulation language (DML) operations. It is compatible with Apache Spark and other query engines.
This chapter provides a comprehensive overview of how to manage and optimize Delta tables using Apache Spark. It covers topics such as creating Delta tables, querying and analyzing them, optimizing them for better performance and cost-effectiveness, managing table metadata, migrating data to Delta Lake, and versioning Delta tables using time travel and table versioning. Additionally, this chapter explains how to perform incremental loads of data into Delta tables, including deduplication, writing change data, and reading change data feeds.
In this chapter, we’...