The Definitive Guide to Data Integration

By: BONNEFOY, CHAIZE, Raphaël MANSUY, Mehdi TAZI

Overview of this book

The Definitive Guide to Data Integration is an indispensable resource for navigating the complexities of modern data integration. Focusing on the latest tools, techniques, and best practices, this guide helps you master data integration and unleash the full potential of your data. This comprehensive guide begins by examining the challenges and key concepts of data integration, such as managing huge volumes of data and dealing with different data types. You’ll gain a deep understanding of the modern data stack and its architecture, as well as the pivotal role of open-source technologies in shaping the data landscape. Delving into the layers of the modern data stack, you’ll cover data sources, types, storage, integration techniques, transformation, and processing. The book also offers insights into data exposition and APIs, ingestion and storage strategies, data preparation and analysis, workflow management, monitoring, data quality, and governance. Packed with practical use cases, real-world examples, and a glimpse into the future of data integration, The Definitive Guide to Data Integration is an essential resource for data eclectics. By the end of this book, you’ll have gained the knowledge and skills needed to optimize your data usage and excel in the ever-evolving world of data.
Table of Contents (19 chapters)

Influential open source data technologies

Several open source technologies merit study in the context of data integration, including Hadoop, Spark, Kafka, and Presto. These technologies are built to handle large data volumes and so help meet the challenges facing today’s companies.

Hadoop and the Hadoop ecosystem

Hadoop leverages the principles of distributed file systems, allowing storage capacity and computing power to be scaled by distributing data and processing across multiple nodes in a cluster.
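
The idea of distributing both data and processing across nodes can be illustrated with a small, single-process sketch. This is not Hadoop's API: the function names (partition, count_words, distributed_word_count), the round-robin placement, and the sample input are all assumptions made for illustration, and the "nodes" here are just in-memory lists.

```python
from collections import Counter
from functools import reduce


def partition(records, num_nodes):
    """Spread records over num_nodes shards round-robin.

    Hypothetical stand-in for distributing data across a cluster.
    """
    shards = [[] for _ in range(num_nodes)]
    for i, record in enumerate(records):
        shards[i % num_nodes].append(record)
    return shards


def count_words(shard):
    """Work a single node does locally: count words in its own shard."""
    counts = Counter()
    for line in shard:
        counts.update(line.lower().split())
    return counts


def distributed_word_count(records, num_nodes):
    """Process each shard independently, then merge the partial results."""
    shards = partition(records, num_nodes)
    partials = [count_words(shard) for shard in shards]
    # Adding Counters sums the per-word counts from every shard.
    return reduce(lambda left, right: left + right, partials, Counter())


lines = ["spark kafka hadoop", "hadoop presto", "kafka hadoop"]
totals = distributed_word_count(lines, num_nodes=2)
print(totals["hadoop"])  # prints 3
```

Because each shard is processed independently, the result does not depend on how many shards are used. That independence is what allows adding nodes to raise both storage capacity and computing power.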

The Hadoop ecosystem comprises numerous tools and components that enhance and extend its capabilities. These components assist in data ingestion, storage, processing, and analysis. Some of the key components in the Hadoop ecosystem are as follows:

  • HDFS: This is Hadoop’s primary storage layer and offers a fault-tolerant, distributed storage system designed for large datasets. It splits files...
