The Definitive Guide to Data Integration

By: BONNEFOY, CHAIZE, Raphaël MANSUY, Mehdi TAZI

Overview of this book

The Definitive Guide to Data Integration is an indispensable resource for navigating the complexities of modern data integration. Focusing on the latest tools, techniques, and best practices, this guide helps you master data integration and unleash the full potential of your data.

This comprehensive guide begins by examining the challenges and key concepts of data integration, such as managing huge volumes of data and dealing with different data types. You’ll gain a deep understanding of the modern data stack and its architecture, as well as the pivotal role of open-source technologies in shaping the data landscape. Delving into the layers of the modern data stack, you’ll cover data sources, types, storage, integration techniques, transformation, and processing. The book also offers insights into data exposition and APIs, ingestion and storage strategies, data preparation and analysis, workflow management, monitoring, data quality, and governance.

Packed with practical use cases, real-world examples, and a glimpse into the future of data integration, The Definitive Guide to Data Integration is an essential resource for data eclectics. By the end of this book, you’ll have gained the knowledge and skills needed to optimize your data usage and excel in the ever-evolving world of data.

Exploring columnar data formats

This section explores columnar data formats and why it is important to understand the benefits of each one. We will look at four widely used formats for storing data by column: the file formats Apache Parquet and Apache ORC, and the table formats Apache Iceberg and Delta Lake, which add a metadata layer on top of columnar data files (usually Parquet).

Understanding the nuances of these formats is crucial, because their performance characteristics and typical use cases differ. For instance, Apache Parquet is widely supported across big data processing frameworks such as Apache Spark, while Apache ORC, which originated in the Apache Hive project, is well suited to high-performance analytics, particularly in Hive. Similarly, Apache Iceberg is tailored for large-scale data lakes with frequent schema changes and high concurrency, whereas Delta Lake is optimized for Apache Spark-based applications.
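
All four formats share the same basic idea: the values of each column are stored together rather than the fields of each record. The following toy sketch, which is not based on any of these libraries (the function `to_columnar` and the sample `orders` data are hypothetical), transposes row-oriented records into a column-oriented layout and shows that an aggregate over one field only needs to read that field's values.

```python
from typing import Any


def to_columnar(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Transpose row-oriented records into one list of values per column.

    Assumes every row has the same keys (a fixed schema).
    """
    if not rows:
        return {}
    columns: dict[str, list[Any]] = {name: [] for name in rows[0]}
    for row in rows:
        for name in columns:
            columns[name].append(row[name])
    return columns


# Row-oriented layout: all fields of one record are stored together.
orders = [
    {"order_id": 1, "country": "FR", "amount": 120.0},
    {"order_id": 2, "country": "DE", "amount": 80.5},
    {"order_id": 3, "country": "FR", "amount": 42.0},
]

# Column-oriented layout: all values of one field are stored together.
columns = to_columnar(orders)

# An analytical query such as SUM(amount) reads a single column;
# the order_id and country values are never touched.
total_amount = sum(columns["amount"])
print(total_amount)  # 242.5
```

Real columnar formats apply the same principle on disk, organizing values into column chunks with metadata, so a query engine can skip the bytes of columns it does not need.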

Important note

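Columnar data formats are not a new concept. Column-oriented storage was explored as early as the 1970s and was formalized in the Decomposition Storage Model described by Copeland and Khoshafian in 1985; in the mid-2000s, Michael Stonebraker and his colleagues revived the idea with the C-Store research system. However, columnar formats have gained popularity in recent years due to the emergence of big data and analytical workloads...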
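
One reason columnar layouts suit analytical workloads is that storing a column contiguously places values of the same type, and often the same value, next to each other, which makes them easy to compress. Parquet and ORC both support run-length encoding among other encodings; the toy sketch below (the helper names and the sample `country` column are hypothetical, and the real formats use more elaborate variants combined with dictionary encoding and bit-packing) shows the basic idea with `itertools.groupby`.

```python
from itertools import groupby
from typing import Any


def run_length_encode(values: list[Any]) -> list[tuple[Any, int]]:
    """Collapse consecutive repeated values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def run_length_decode(runs: list[tuple[Any, int]]) -> list[Any]:
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]


# A "country" column, sorted as it might be after clustering data by country.
country = ["DE"] * 4 + ["FR"] * 5 + ["US"] * 3

runs = run_length_encode(country)
print(runs)  # [('DE', 4), ('FR', 5), ('US', 3)]

# Decoding restores the column exactly: the encoding is lossless.
assert run_length_decode(runs) == country
```

Twelve stored values shrink to three pairs here. In a row-oriented layout, the same country values would be interleaved with order IDs and amounts, so long runs like these would not form.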
