Simplifying Data Engineering and Analytics with Delta

By: Anindita Mahapatra
Overview of this book

Delta helps you generate reliable insights at scale and simplifies architecture around data pipelines, allowing you to focus primarily on refining the use cases being worked on. This is especially important when you consider that existing architecture is frequently reused for new use cases. In this book, you’ll learn about the principles of distributed computing, data modeling techniques, and big data design patterns and templates that help solve end-to-end data flow problems for common scenarios and are reusable across use cases and industry verticals. You’ll also learn how to recover from errors and the best practices around handling structured, semi-structured, and unstructured data using Delta. After that, you’ll get to grips with features such as ACID transactions on big data, disciplined schema evolution, time travel to help rewind a dataset to a different time or version, and unified batch and streaming capabilities that will help you build agile and robust data products. By the end of this Delta book, you’ll be able to use Delta as the foundational block for creating analytics-ready data that fuels all AI/BI use cases.
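To make these capabilities concrete, here is a minimal PySpark sketch (not taken from the book) showing Delta's time travel and schema evolution. The table path and column names are hypothetical, and it assumes a Spark session configured with the Delta Lake libraries.

# Minimal sketch (assumption: a Spark session with Delta Lake configured).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-sketch").getOrCreate()

path = "/tmp/delta/events"  # hypothetical table location

# Write an initial version of the table (version 0).
spark.range(100).withColumnRenamed("id", "event_id") \
    .write.format("delta").mode("overwrite").save(path)

# Schema evolution: append a batch with an extra column and let Delta merge the schema.
spark.range(100, 200).withColumnRenamed("id", "event_id") \
    .selectExpr("event_id", "current_timestamp() AS ingested_at") \
    .write.format("delta").mode("append") \
    .option("mergeSchema", "true").save(path)

# Time travel: rewind the dataset to its first version.
v0 = spark.read.format("delta").option("versionAsOf", 0).load(path)
print(v0.count())  # 100 rows with the original schema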
Table of Contents (18 chapters)
  • Section 1 – Introduction to Delta Lake and Data Engineering Principles
  • Section 2 – End-to-End Process of Building Delta Pipelines
  • Section 3 – Operationalizing and Productionalizing Delta Pipelines

Understanding the role of data personas

Since data engineering is such a crucial field, you may be wondering who the main players are and what skill sets they possess. Building a data product involves several people, all of whom need to come together with seamless handoffs to create a successful end product or service. It would be a mistake to create silos and increase both the number and complexity of integration points, because each additional integration is a potential failure point. Data engineering overlaps considerably with both software engineering and data science tasks:

Figure 1.3 – Data engineering requires multidisciplinary skill sets

All these roles require an understanding of data engineering:

  • Data engineers focus on building and maintaining the data pipelines that ingest and transform data. The role has a lot in common with software engineering, applied to large volumes of data.
  • BI analysts focus on SQL-based reporting and can be operational or domain-specific subject-matter experts (SMEs), such as financial or supply chain analysts.
  • Data scientists and ML practitioners are statisticians who explore and analyze the data (via exploratory data analysis (EDA)) and apply modeling techniques at various levels of sophistication.
  • DevOps and MLOps engineers focus on the infrastructure aspects of monitoring and automation. MLOps is DevOps coupled with the additional task of managing the life cycle of analytic models.
  • ML engineers span both the data engineer and data scientist roles.
  • Data leaders, such as chief data officers, are data stewards at the top of the food chain – the ultimate governors of data.

The following diagram shows the typical placement of the four main data personas working collaboratively on a data platform to produce business insights that give the company a competitive advantage in its industry:

Figure 1.4 – Data personas working in collaboration

Let's take a look at a few of these points in more detail:

  1. DevOps is responsible for all operational aspects of the data platform and traditionally does a lot of scripting and automation.
  2. Data/ML engineers are responsible for building the data pipeline and taking care of the extract, transform, load (ETL) aspects of the pipeline.
  3. Data scientists of varying skill levels build models.
  4. Business analysts create reporting dashboards from aggregated curated data.