Data-Centric Machine Learning with Python

By : Jonas Christensen, Nakul Bajaj, Manmohan Gosada

Overview of this book

In the rapidly advancing data-driven world where data quality is pivotal to the success of machine learning and artificial intelligence projects, this critically timed guide provides a rare, end-to-end overview of data-centric machine learning (DCML), along with hands-on applications of technical and non-technical approaches to generating deeper and more accurate datasets. This book will help you understand what data-centric ML/AI is and how it can help you to realize the potential of ‘small data’. Delving into the building blocks of data-centric ML/AI, you’ll explore the human aspects of data labeling, tackle ambiguity in labeling, and understand the role of synthetic data. From strategies to improve data collection to techniques for refining and augmenting datasets, you’ll learn everything you need to elevate your data-centric practices. Through applied examples and insights for overcoming challenges, you’ll get a roadmap for implementing data-centric ML/AI in diverse applications in Python. By the end of this book, you’ll have developed a profound understanding of data-centric ML/AI and the proficiency to seamlessly integrate common data-centric approaches in the model development lifecycle to unlock the full potential of your machine learning projects by prioritizing data quality and reliability.
Table of Contents (17 chapters)
1
Part 1: What Data-Centric Machine Learning Is and Why We Need It
4
Part 2: The Building Blocks of Data-Centric ML
7
Part 3: Technical Approaches to Better Data
10
Chapter 7: Using Synthetic Data in Data-Centric Machine Learning
13
Part 4: Getting Started with Data-Centric ML

Ensuring that the data is fresh

Data freshness is another important dimension of data quality, and it affects the quality and robustness of machine learning applications. Imagine a machine learning application that was trained on customer behavior from 2019 and 2020 and then used to predict hotel room bookings up to April 2021. The predictions for January and February may have been quite accurate, but accuracy dropped when March and April arrived. This might have been due to COVID-19: its effects were not captured in the training data, so the model had never seen them. In machine learning, this is called data drift. Here, the data distribution in March and April was quite different from the data distribution in 2019 and 2020. By ensuring that the data is fresh and up to date, we can retrain the model more regularly, or as soon as data drift is detected.
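To make the idea of comparing two data distributions concrete, here is a minimal sketch that uses only the Python standard library. It is not the alibi package's API. It assumes a two-sample Kolmogorov-Smirnov test as the drift check, and the function names `ks_statistic` and `drift_detected`, the threshold `alpha`, and the booking numbers are all hypothetical, chosen only for illustration.

```python
import math
import random
from bisect import bisect_right


def ks_statistic(reference, current):
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    ref_sorted = sorted(reference)
    cur_sorted = sorted(current)
    n, m = len(ref_sorted), len(cur_sorted)
    # The gap can only reach its maximum at one of the observed values.
    return max(
        abs(bisect_right(ref_sorted, x) / n - bisect_right(cur_sorted, x) / m)
        for x in ref_sorted + cur_sorted
    )


def drift_detected(reference, current, alpha=0.05):
    """Flag drift when the KS statistic exceeds the large-sample critical value."""
    n, m = len(reference), len(current)
    # Standard large-sample approximation of the KS critical value.
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical


# Made-up daily booking counts, for illustration only (not real data).
rng = random.Random(0)
bookings_2019_2020 = [rng.gauss(100, 10) for _ in range(500)]  # training period
bookings_jan_feb = [rng.gauss(100, 10) for _ in range(60)]     # same distribution
bookings_mar_apr = [rng.gauss(40, 15) for _ in range(60)]      # sharply shifted

print("Jan-Feb drift:", drift_detected(bookings_2019_2020, bookings_jan_feb))
print("Mar-Apr drift:", drift_detected(bookings_2019_2020, bookings_mar_apr))
```

With a shift as large as the simulated March and April one, the check flags drift. For the January and February sample, which is drawn from the same distribution as the training data, a flag would be a false positive, which happens at a rate of roughly `alpha`. In practice, a check like this would run on each feature whenever a new batch of data arrives, and a positive result would trigger retraining.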

To measure data drift, we will use the alibi Python package. However, there are more extensive Python...
