Debugging Machine Learning Models with Python

By: Ali Madani

Overview of this book

Debugging Machine Learning Models with Python is a comprehensive guide that navigates you through the entire spectrum of mastering machine learning, from foundational concepts to advanced techniques. It goes beyond the basics to arm you with the expertise essential for building reliable, high-performance models for industrial applications. Whether you're a data scientist, analyst, machine learning engineer, or Python developer, this book will empower you to design modular systems for data preparation, accurately train and test models, and seamlessly integrate them into larger technologies. By bridging the gap between theory and practice, you'll learn how to evaluate model performance, identify and address issues, and harness recent advancements in deep learning and generative modeling using PyTorch and scikit-learn. Your journey to developing high quality models in practice will also encompass causal and human-in-the-loop modeling and machine learning explainability. With hands-on examples and clear explanations, you'll develop the skills to deliver impactful solutions across domains such as healthcare, finance, and e-commerce.
Table of Contents (26 chapters)

Part 1: Debugging for Machine Learning Modeling
Part 2: Improving Machine Learning Models
Part 3: Low-Bug Machine Learning Development and Deployment
Part 4: Deep Learning Modeling
Part 5: Advanced Topics in Model Debugging

Types of machine learning modeling

Machine learning includes several modeling types, which differ in whether they rely on output data, in the variable type of the model output, and in whether they learn from prerecorded data or from experience. Although the examples in this book focus on supervised learning, we will review other types of modeling, including unsupervised learning, self-supervised learning, semi-supervised learning, reinforcement learning (RL), and generative machine learning, to cover the six major categories of machine learning modeling (Figure 1.2). We will also talk about techniques that cut across these categories rather than sitting alongside them, such as active learning, transfer learning, ensemble learning, and deep learning, and provide code examples for them:

Figure 1.2 – Types of machine learning modeling


Self-supervised and semi-supervised learning are sometimes considered sub-categories of supervised learning. However, we will separate them here so that we can establish the differences between the usual supervised learning models you are familiar with and these two types of modeling.

Supervised learning

Supervised learning is about identifying the relationship between inputs/features and the output for each data point. But what do input and output mean?

Imagine that we want to build a machine learning model to predict whether a person is likely to get breast cancer. The output of the model could be 1 for getting breast cancer and 0 for not getting it, and the inputs could be characteristics of each person, such as their age, weight, and whether they smoke. There could even be inputs that are measured using advanced technologies, such as the genetic information of each person. In this case, we want to use our machine learning model to predict which patients will get cancer in the future.

You can also design a machine learning model to estimate the price of houses in a city. Here, your model could use characteristics of houses, such as the number of bedrooms and size of the house, the neighborhood, and access to schools, to estimate house prices.

In both of these examples, we have models trying to identify patterns within input features, such as a high number of bedrooms but only one bathroom, and associate those with the output. Depending on the output variable type, your model can be categorized as a classification model, in which the output is categorical, such as getting or not getting cancer, or a regression model, in which the output is continuous, such as house prices.
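To make the distinction between classification and regression concrete, here is a minimal sketch using scikit-learn. The data is synthetic (generated with `make_classification` and `make_regression`), not real patient or housing data, and the choice of logistic and linear regression is only an assumption for illustration; any supervised model could take their place.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split

# Classification: the output is categorical (here, 0 or 1).
X_clf, y_clf = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X_clf, y_clf, test_size=0.25, random_state=0
)
classifier = LogisticRegression(max_iter=1000).fit(X_train, y_train)
class_predictions = classifier.predict(X_test)  # an array of class labels

# Regression: the output is continuous (for example, a price).
X_reg, y_reg = make_regression(
    n_samples=200, n_features=5, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X_reg, y_reg, test_size=0.25, random_state=0
)
regressor = LinearRegression().fit(X_train, y_train)
value_predictions = regressor.predict(X_test)  # an array of real numbers
```

In both cases the model is fitted on pairs of features and known outputs; only the type of the output variable changes.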

Unsupervised learning

We spend much of our lives, especially in childhood, using our five senses (eyesight, hearing, taste, touch, and smell) to collect information about our surroundings, food, and so on, without trying to find supervised learning-style relationships, such as whether a banana is ripe based on its color and shape. Similarly, in unsupervised learning, we are not seeking to identify the relationship between the features (inputs) and the output. Instead, the goal is to identify relationships between data points, as in clustering; extract new features (that is, embeddings or representations); and, if needed, reduce the dimensionality (that is, the number of features) of our data, all without using any output for the data points.
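As a rough illustration of two of these goals, the sketch below clusters synthetic data points and reduces their dimensionality with scikit-learn. The use of k-means and PCA, and the number of clusters and components, are assumptions made for the example; note that the labels returned by `make_blobs` are discarded, so no output is used.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Synthetic data with 4 features; the generated labels are thrown away.
X, _ = make_blobs(n_samples=150, n_features=4, centers=3, random_state=0)

# Clustering: group data points based only on their feature values.
cluster_ids = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Dimensionality reduction: represent each point with 2 new features.
X_reduced = PCA(n_components=2).fit_transform(X)
```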

Self-supervised learning

The third category of machine learning modeling is called self-supervised learning. In this category, the goal is still to identify the relationship between inputs and outputs, but the difference from supervised learning is the source of the outputs: they are derived from the input data itself rather than provided separately. For example, if the goal of a supervised machine learning model is to translate from English to French, the inputs come from English words and sentences and the outputs come from French words and sentences. However, we can have a self-supervised learning model that works within English sentences and tries to predict the next word or a missing word in a sentence. For example, let’s say we aim to recognize that “talking” is a good candidate to fill the gap in “Jack is ____ with Julie.” Self-supervised learning models have been used in recent years across different fields to identify new features. This is commonly called representation learning. We will talk about some examples of self-supervised learning in Chapter 14, Introduction to Recent Advancements in Machine Learning.
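The missing-word task can be sketched in a few lines of plain Python. The helper `make_masked_examples` below is a hypothetical name introduced for this illustration; it only shows how input/output pairs can be built from unlabeled text, not how a model is trained on them.

```python
def make_masked_examples(sentence, mask_token="____"):
    """Build (masked sentence, missing word) pairs from one sentence."""
    words = sentence.split()
    examples = []
    for i, target in enumerate(words):
        # Hide one word at a time; the hidden word becomes the output.
        masked = words[:i] + [mask_token] + words[i + 1:]
        examples.append((" ".join(masked), target))
    return examples


examples = make_masked_examples("Jack is talking with Julie")
print(examples[2])  # ('Jack is ____ with Julie', 'talking')
```

No one had to label the sentence: every output comes from the text itself, which is what separates this setup from the translation example.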

Semi-supervised learning

Sometimes, we have data points for which only the feature values are available and the output values are missing. Semi-supervised learning lets us benefit from supervised learning without throwing out those data points, by using data points both with and without outputs. One simple way to do so is to group data points that are similar to each other and use the known outputs in each group to assign outputs to the other data points of the same group that don’t have an output value.
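The group-then-assign process described above could look like the following sketch. It assumes k-means as the grouping method, a majority vote within each group, and `-1` as a marker for missing outputs; all three are choices made for this example, and the data is synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, y_true = make_blobs(n_samples=120, centers=2, random_state=0)

# Pretend only 12 data points have a known output; -1 marks "unknown".
rng = np.random.default_rng(0)
labeled_idx = rng.choice(len(X), size=12, replace=False)
y_partial = np.full_like(y_true, -1)
y_partial[labeled_idx] = y_true[labeled_idx]

# Step 1: group similar data points using features only.
groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Step 2: within each group, give unlabeled points the most common known output.
y_filled = y_partial.copy()
for group in np.unique(groups):
    in_group = groups == group
    known = y_partial[in_group & (y_partial != -1)]
    if known.size == 0:
        continue  # no known output in this group, so leave it unlabeled
    y_filled[in_group & (y_partial == -1)] = np.bincount(known).argmax()
```

The filled-in outputs can then be used to train an ordinary supervised model, with the caveat that any grouping mistakes are propagated into the labels.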

Reinforcement learning

In RL, a model is rewarded according to its experience in an environment (real or virtual). In other words, RL is about identifying relationships by adding examples piece by piece as the model gains experience, rather than learning from a fixed, prerecorded dataset. In RL, the data is not considered part of the model; it comes from the environment, which exists independently of the model itself. We will go through some details of RL in Chapter 14, Introduction to Recent Advancements in Machine Learning.
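A small way to see reward-driven, example-by-example learning is a multi-armed bandit with an epsilon-greedy strategy. This is one of the simplest RL settings and is used here only as an assumed stand-in for a full environment; the function name `run_bandit` and the reward probabilities are made up for the illustration.

```python
import random


def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Learn the value of each action from rewards, one step at a time."""
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    value_estimates = [0.0] * n_actions
    action_counts = [0] * n_actions
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore a random action
        else:
            # Exploit the action that currently looks best.
            action = max(range(n_actions), key=lambda a: value_estimates[a])
        # The environment returns a reward of 1 or 0 for the chosen action.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        action_counts[action] += 1
        # Incremental mean: update the estimate with this single experience.
        step_size = 1.0 / action_counts[action]
        value_estimates[action] += step_size * (reward - value_estimates[action])
    return value_estimates, action_counts


estimates, counts = run_bandit([0.2, 0.5, 0.8])
```

The environment (the reward probabilities) is separate from the model (the value estimates), and the model only ever sees the rewards for the actions it takes.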

Generative machine learning

Generative machine learning modeling helps us develop models that can generate images, text, or any other data point whose distribution is close to the probability distribution of the data provided in the training process. ChatGPT is one of the most famous tools built on top of a generative model; it generates realistic and meaningful text in response to user questions (https://openai.com/blog/chatgpt). We will go through more details about generative modeling and the available tools built on top of it in Chapter 14, Introduction to Recent Advancements in Machine Learning.
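The core idea, learning a distribution from training data and then sampling new data points from it, can be sketched with Python's standard library. The example below fits a single normal distribution to a handful of made-up measurements; this is far simpler than the deep generative models behind tools such as ChatGPT and is meant only to illustrate the principle.

```python
from statistics import NormalDist

# Made-up one-dimensional training data, for illustration only.
training_data = [4.9, 5.1, 5.0, 5.3, 4.7, 5.2, 4.8, 5.0]

# "Training": estimate the distribution's parameters from the data.
model = NormalDist.from_samples(training_data)

# "Generation": draw new data points from the learned distribution.
new_points = model.samples(5, seed=0)
```

The generated values are not copies of the training data, but they follow the distribution estimated from it.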

In this section, we provided a brief review of the basic components for building machine learning models and different types of modeling. But if you want to develop machine learning models for automation or discovery, for healthcare or any other application, with a low or high number of data points, on your laptop or the cloud, using a central processing unit (CPU) or graphics processing unit (GPU), you need to develop high-quality code that works as expected. Although this book is not a software debugging book, an overview of software debugging challenges and techniques could help you in developing your machine learning models.
