Simplifying Data Engineering and Analytics with Delta

By: Anindita Mahapatra

Overview of this book

Delta helps you generate reliable insights at scale and simplifies architecture around data pipelines, allowing you to focus primarily on refining the use cases being worked on. This is especially important when you consider that existing architecture is frequently reused for new use cases. In this book, you’ll learn about the principles of distributed computing, data modeling techniques, and big data design patterns and templates that help solve end-to-end data flow problems for common scenarios and are reusable across use cases and industry verticals. You’ll also learn how to recover from errors and the best practices around handling structured, semi-structured, and unstructured data using Delta. After that, you’ll get to grips with features such as ACID transactions on big data, disciplined schema evolution, time travel to help rewind a dataset to a different time or version, and unified batch and streaming capabilities that will help you build agile and robust data products. By the end of this Delta book, you’ll be able to use Delta as the foundational block for creating analytics-ready data that fuels all AI/BI use cases.
Table of Contents (18 chapters)
  • Section 1 – Introduction to Delta Lake and Data Engineering Principles
  • Section 2 – End-to-End Process of Building Delta Pipelines
  • Section 3 – Operationalizing and Productionalizing Delta Pipelines

Business justification for tech spending

Tech enthusiasts, with their love for bleeding-edge tools, sometimes forget why they are building a data product. Research and exploration are important for innovation, but they need to be disciplined and controlled. Not keeping business counterparts in the loop results in miscommunication and misunderstandings about where the effort is going. Ego battles hinder project progress and waste money, time, and people, which hurts the business. Tech should add value and drive business growth rather than be viewed merely as a cost, so it is important to demonstrate the value of tech investment.

A joint business-technology strategy clarifies the role of technology in driving business value and provides a transformation agenda. Key performance indicators (KPIs) and metrics such as growth, return on investment (ROI), profitability, market share, earnings per share, margins, and revenue help quantify this investment.

The execution time of these projects is usually significant, so it is important to reach the end goal in an agile manner, in well-articulated baby steps. Some benefits may not be realized immediately, so it is important to balance infrastructure gains against productivity and capability gains, and to weigh the capital expenditure of the initial infrastructure investment (CAPEX) against ongoing operating expenses (OPEX) over a given period. In addition, it is good practice to carry out frequent risk assessments and have backup plans. Despite the best projections, costs can escalate to uncomfortable and unpredictable heights, so it is important to invest in a platform with tunable costs that can easily be monitored and adjusted when needed. Data is an asset and must be governed and protected from inappropriate access or breaches. Not only are such breaches expensive, but they also damage the reputation of the organization:

Figure 1.20 – Mapping the impact of technology on business outcomes
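
The CAPEX versus OPEX comparison can be made concrete with a small cumulative-cost calculation. The sketch below assumes two hypothetical options, one with a large up-front investment and low running costs and one with no up-front cost and higher monthly costs, and finds the first month in which the first option becomes no more expensive in total. All figures are invented for illustration, and the model deliberately ignores discounting, depreciation, and the productivity and capability gains the text also asks you to weigh.

```python
from typing import List, Optional


def cumulative_costs(capex: float, monthly_opex: float, months: int) -> List[float]:
    # Total spend at the end of each month: the up-front CAPEX plus OPEX accrued so far.
    return [capex + monthly_opex * m for m in range(1, months + 1)]


def break_even_month(a: List[float], b: List[float]) -> Optional[int]:
    # First month (1-based) in which option a has cost no more than option b in total.
    for month, (cost_a, cost_b) in enumerate(zip(a, b), start=1):
        if cost_a <= cost_b:
            return month
    return None  # a never catches up with b within the horizon


# Hypothetical figures, for illustration only.
HORIZON_MONTHS = 36
on_prem = cumulative_costs(capex=120_000, monthly_opex=2_000, months=HORIZON_MONTHS)
cloud = cumulative_costs(capex=0, monthly_opex=7_000, months=HORIZON_MONTHS)

print(break_even_month(on_prem, cloud))  # 24
```

With these numbers, the up-front investment only pays off in month 24; over a 12-month horizon (`on_prem[:12]`, `cloud[:12]`) the function returns `None`. This is one reason to pick the evaluation period deliberately rather than judging an investment on its first few months.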

Strategy for business transformation to use data as an asset

Data-driven organizations exhibit a culture of analytics. This culture cannot be confined to a few premier groups; it must extend across the entire organization. There are both cultural and technical challenges to overcome, and this is where people, processes, and tools need to come together to bring about sustainable change. Every business needs a strategy for business transformation. Here are some best practices for managing a big data initiative:

  • Understand the objectives and goals to come up with an overall enterprise strategy.
  • Assess the current state and document all the use cases and data sources.
  • Develop a shareable roadmap for collaborating on and deciding which main tools and frameworks to leverage organization-wide.
  • Design for data democratization so that people can easily find and use the data they are authorized to access.
  • Establish rules around data governance so that workflows can be automated correctly without fear of data exfiltration.
  • Manage change as a continuous cycle of improvement. A center of excellence team should act as the hub in a hub-and-spoke model, interfacing with the individual lines of business. Adequate emphasis should be placed on training to engage and educate the team.

Big data trends and best practices

"The old order changeth, yielding place to new…"

(The Passing of Arthur, Alfred Lord Tennyson, 1809–1892)

We are living in an age of fast innovation, with technology changing in the blink of an eye. We can learn from history and from the mistakes of those before us. We don't have the luxury of analyzing everything around us, but understanding the top trends gives us a better appreciation of the landscape and helps us gravitate toward the right technology for our needs.

Here are some of the top trends:

  • Increased adoption of cloud infrastructure, because:
      • It provides affordable and scalable storage.
      • It offers elastic, distributed compute infrastructure with pay-as-you-go flexibility.
      • It supports a multi-cloud strategy, with some on-premises presence retained to hedge risks.
  • Increased data consolidation in data lakes to break down individual data silos.
  • Other data stores such as data warehouses continue to live on, while newer architectures such as lakehouses and data meshes are being introduced.
  • Unstructured data usage is on the rise.
  • Improved speed to insights:
      • Convergence of big data and ML.
      • Detecting and responding to pattern signals in real time as opposed to in batch.
      • Analytics has moved from simple BI reporting to ML and AI as industries progress from descriptive to predictive and, finally, prescriptive analytics.
  • Improved governance and security:
      • Data discovery using business and operational enterprise-level metastores.
      • Data governance to control who has access to what data.
      • Data lineage and data quality to determine how reliable the data is.

Let's summarize some of the best practices for building robust and reliable data platforms:

  • Build systems that decouple storage and compute. Storage is far cheaper than compute, so being able to turn off compute when it is not in use yields big cost savings. A microservices architecture helps manage such changes.
  • Leverage cloud storage, preferably in an open format.
  • Use the right tool for the right job.
  • Break down the data silos and create a single view of the data so that multiple use cases can leverage the same data with different tools.
  • Design data solutions with due consideration of use-case-specific trade-offs such as latency, throughput, and access patterns.
  • Use log-based design patterns that maintain immutable logs for audit, compliance, and traceability requirements.
  • Expose multiple views of the data for consumers with different access privileges instead of copying datasets multiple times to accommodate slight differences in access requirements.
  • There will always be a point where a team has to decide whether to build or buy. Speed to insights should guide this decision, irrespective of how smart the team is. If there is a window of opportunity, do not lose it in the pursuit of tech pleasures. The cost of building a solution to meet an immediate need should be weighed against the cost of a missed opportunity.
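
The log-based design and multiple-views practices in the list above can be illustrated with a minimal, pure-Python sketch: an append-only event log whose entries are never modified in place, with consumer-specific views computed from the log rather than stored as separate copies. The `AppendOnlyLog` class, the event fields, and the `analyst`/`auditor` roles are hypothetical names chosen for this example. This is not Delta Lake's transaction log or API, only the idea in miniature.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from types import MappingProxyType
from typing import Any, Callable, Dict, List, Mapping, Tuple


@dataclass(frozen=True)
class Event:
    # A single immutable log entry; frozen=True prevents fields from being reassigned.
    seq: int
    recorded_at: datetime
    payload: Mapping[str, Any]


class AppendOnlyLog:
    """Hypothetical append-only log: entries can be added and read, never updated or deleted."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, payload: Dict[str, Any]) -> Event:
        # Copy the payload and wrap it read-only so callers cannot rewrite history later.
        event = Event(
            seq=len(self._events),
            recorded_at=datetime.now(timezone.utc),
            payload=MappingProxyType(dict(payload)),
        )
        self._events.append(event)
        return event

    def events(self) -> Tuple[Event, ...]:
        # Return a tuple so the internal list cannot be modified through the result.
        return tuple(self._events)


# A "view" is a function over the log, not a second copy of the data.
View = Callable[[Tuple[Event, ...]], List[Dict[str, Any]]]


def analyst_view(events: Tuple[Event, ...]) -> List[Dict[str, Any]]:
    # Analysts see amounts by region but not customer identifiers.
    return [{"region": e.payload["region"], "amount": e.payload["amount"]} for e in events]


def auditor_view(events: Tuple[Event, ...]) -> List[Dict[str, Any]]:
    # Auditors see every field plus the sequence number and timestamp for traceability.
    return [{"seq": e.seq, "recorded_at": e.recorded_at.isoformat(), **e.payload} for e in events]


VIEWS: Dict[str, View] = {"analyst": analyst_view, "auditor": auditor_view}


def read_as(role: str, log: AppendOnlyLog) -> List[Dict[str, Any]]:
    # Resolve the caller's role to a view; unknown roles get nothing.
    view = VIEWS.get(role)
    return view(log.events()) if view else []


log = AppendOnlyLog()
log.append({"customer_id": "c-001", "region": "EU", "amount": 120.0})
log.append({"customer_id": "c-002", "region": "US", "amount": 75.5})

print(read_as("analyst", log))
# [{'region': 'EU', 'amount': 120.0}, {'region': 'US', 'amount': 75.5}]
```

Both roles read the same two log entries; only the projection differs. Adding a new consumer means registering another view function, not duplicating the dataset, and the log itself keeps the full, unaltered history for audit purposes.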
