Chapter 11: Operationalizing Data and ML Pipelines | Simplifying Data Engineering and Analytics with Delta

Simplifying Data Engineering and Analytics with Delta

By: Anindita Mahapatra

Overview of this book

Delta helps you generate reliable insights at scale and simplifies the architecture of data pipelines, allowing you to focus on refining the use cases you are working on. This is especially important because existing architecture is frequently reused for new use cases. In this book, you’ll learn the principles of distributed computing, data modeling techniques, and big data design patterns and templates that solve end-to-end data flow problems for common scenarios and are reusable across use cases and industry verticals. You’ll also learn how to recover from errors and the best practices for handling structured, semi-structured, and unstructured data using Delta. After that, you’ll get to grips with features such as ACID transactions on big data, disciplined schema evolution, time travel to rewind a dataset to an earlier time or version, and unified batch and streaming capabilities that will help you build agile and robust data products. By the end of this book, you’ll be able to use Delta as the foundational block for creating analytics-ready data that fuels all AI/BI use cases.

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Share Your Thoughts

Section 1 – Introduction to Delta Lake and Data Engineering Principles

Chapter 1: Introduction to Data Engineering

The motivation behind data engineering

Understanding the role of data personas

Big data ecosystem

Evolution of data systems

Distributed computing

Business justification for tech spending

Summary

Chapter 2: Data Modeling and ETL

Technical requirements

What is data modeling and why should you care?

Understanding metadata – data about data

Moving and transforming data using ETL

How to choose the right data format

Common big data design patterns

Summary

Further reading

Chapter 3: Delta – The Foundation Block for Big Data

Technical requirements

Motivation for Delta

Demystifying Delta

The main features of Delta

Life with and without Delta

Summary

Section 2 – End-to-End Process of Building Delta Pipelines

Chapter 4: Unifying Batch and Streaming with Delta

Technical requirements

Moving toward real-time systems

Streaming ETL

Handling streaming scenarios

Trade-offs in designing streaming architectures

Streaming best practices

Summary

Chapter 5: Data Consolidation in Delta Lake

Technical requirements

Why consolidate disparate data types?

Delta unifies all types of data

Avoiding patches of data darkness

Curating data in stages for analytics

Ease of extending to existing and new use cases

Data governance

Summary

Chapter 6: Solving Common Data Pattern Scenarios with Delta

Technical requirements

Understanding use case requirements

Minimizing data movement with Delta time travel

Delta cloning

Handling CDC

Handling Slowly Changing Dimensions (SCD)

Summary

Chapter 7: Delta for Data Warehouse Use Cases

Technical requirements

Choosing the right architecture

Understanding what a data warehouse really solves

Discovering when a data lake does not suffice

Addressing concurrency and latency requirements with Delta

Visualizing data using BI reporting

Analyzing trade-offs in a push versus pull data flow

Considerations around data governance

The rise of the lakehouse category

Summary

Chapter 8: Handling Atypical Data Scenarios with Delta

Technical requirements

Emphasizing the importance of exploratory data analysis (EDA)

Applying sampling techniques to address class imbalance

Addressing data skew

Providing data anonymity

Handling bias and variance in data

Compensating for missing and out-of-range data

Monitoring data drift

Summary

Chapter 9: Delta for Reproducible Machine Learning Pipelines

Technical requirements

Data science versus machine learning

Challenges of ML development

Formalizing the ML development process

The role of Delta in an ML pipeline

From business problem to insight generation

Summary

Chapter 10: Delta for Data Products and Services

Technical requirements

DaaS

The need for data democratization

Delta for unstructured data

Data mashups using Delta

Facilitating data sharing with Delta

Summary

Section 3 – Operationalizing and Productionalizing Delta Pipelines

Chapter 11: Operationalizing Data and ML Pipelines

Technical requirements

Why operationalize?

Understanding and monitoring SLAs

Scaling and high availability

Planning for DR

Guaranteeing data quality

Automation of CI/CD pipelines

Data as code – An intelligent pipeline

Summary

Chapter 12: Optimizing Cost and Performance with Delta

Technical requirements

Improving performance with common strategies

Optimizing with Delta

Is cost always inversely proportional to performance?

Best practices for managing performance

Summary

Chapter 13: Managing Your Data Journey

Provisioning a multi-tenant infrastructure

Data democratization via policies and processes

Capacity planning

Managing and monitoring

Data sharing

Data migration

COE best practices

Summary

Why subscribe?

Other Books You May Enjoy

Packt is searching for authors like you

Share Your Thoughts
