Data-Centric Machine Learning with Python

By: Jonas Christensen, Nakul Bajaj, Manmohan Gosada

4.6 (5)

Overview of this book

Data quality is pivotal to the success of machine learning and artificial intelligence projects. This guide provides an end-to-end overview of data-centric machine learning (DCML), along with hands-on applications of technical and non-technical approaches to generating deeper and more accurate datasets. It explains what data-centric ML/AI is and how it can help you realize the potential of ‘small data’. Starting with the building blocks of data-centric ML/AI, you’ll explore the human aspects of data labeling, tackle ambiguity in labeling, and understand the role of synthetic data. You’ll learn strategies for improving data collection and techniques for refining and augmenting datasets. Applied examples and insights for overcoming challenges provide a roadmap for implementing data-centric ML/AI in diverse applications in Python. By the end of this book, you’ll understand data-centric ML/AI and be able to integrate common data-centric approaches into the model development lifecycle, prioritizing data quality and reliability.

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Conventions used

Get in touch

Share Your Thoughts

Download a free PDF copy of this book

Part 1: What Data-Centric Machine Learning Is and Why We Need It

Chapter 1: Exploring Data-Centric Machine Learning

Understanding data-centric ML

Data-centric versus model-centric ML

The importance of quality data in ML

Summary

References

Chapter 2: From Model-Centric to Data-Centric – ML’s Evolution

Exploring why ML development ended up being mostly model-centric

Unlocking the opportunity for small data ML

Why we need data-centric AI more than ever

Summary

References

Part 2: The Building Blocks of Data-Centric ML

Chapter 3: Principles of Data-Centric ML

Sometimes, all you need is the right data

Principle 1 – data should be the center of ML development

Principle 2 – leverage annotators and SMEs effectively

Principle 3 – use ML to improve your data

Principle 4 – follow ethical, responsible, and well-governed ML practices

Summary

References

Chapter 4: Data Labeling Is a Collaborative Process

Understanding the benefits of diverse human labeling

Understanding common challenges arising from human labelers

Designing a framework for high-quality labels

Measuring labeling consistency

Summary

References

Part 3: Technical Approaches to Better Data

Chapter 5: Techniques for Data Cleaning

The six key dimensions of data quality

Installing the required packages

Introducing the dataset

Ensuring the data is consistent

Checking that the data is unique

Ensuring that the data is complete and not missing

Ensuring that the data is valid

Ensuring that the data is accurate

Ensuring that the data is fresh

Summary

Chapter 6: Techniques for Programmatic Labeling in Machine Learning

Technical requirements

Pattern matching

Database lookup

Boolean flags

Weak supervision

Semi-weak supervision

Slicing functions

Active learning

Transfer learning

Semi-supervised learning

Summary

Chapter 7: Using Synthetic Data in Data-Centric Machine Learning

Understanding synthetic data

Summary

References

Chapter 8: Techniques for Identifying and Removing Bias

The bias conundrum

Types of bias

The data-centric imperative

Case study

Summary

Chapter 9: Dealing with Edge Cases and Rare Events in Machine Learning

Importance of detecting rare events and edge cases in machine learning

Statistical methods

Anomaly detection

Data augmentation and resampling techniques

Cost-sensitive learning

Choosing evaluation metrics

Ensemble techniques

Summary

Part 4: Getting Started with Data-Centric ML

Chapter 10: Kick-Starting Your Journey in Data-Centric Machine Learning

Solving six common ML challenges

Making data everyone’s business – our own experience

Summary

References

Index

Why subscribe?

Other Books You May Enjoy

Packt is searching for authors like you

Share Your Thoughts

Download a free PDF copy of this book

Customer Reviews

4.6 (5)

5 star: 60%

4 star: 40%
