Sign In Start Free Trial
Account

Add to playlist

Create a Playlist

Modal Close icon
You need to login to use this feature.
  • Book Overview & Buying Deep Learning with fastai Cookbook
  • Table Of Contents Toc
  • Feedback & Rating feedback
Deep Learning with fastai Cookbook

Deep Learning with fastai Cookbook

By : Ryan
4.5 (15)
close
close
Deep Learning with fastai Cookbook

Deep Learning with fastai Cookbook

4.5 (15)
By: Ryan

Overview of this book

fastai is an easy-to-use deep learning framework built on top of PyTorch that lets you rapidly create complete deep learning solutions with as few as 10 lines of code. Both predominant low-level deep learning frameworks, TensorFlow and PyTorch, require a lot of code, even for straightforward applications. In contrast, fastai handles the messy details for you and lets you focus on applying deep learning to actually solve problems. The book begins by summarizing the value of fastai and showing you how to create a simple 'hello world' deep learning application with fastai. You'll then learn how to use fastai for all four application areas that the framework explicitly supports: tabular data, text data (NLP), recommender systems, and vision data. As you advance, you'll work through a series of practical examples that illustrate how to create real-world applications of each type. Next, you'll learn how to deploy fastai models, including creating a simple web application that predicts what object is depicted in an image. The book wraps up with an overview of the advanced features of fastai. By the end of this fastai book, you'll be able to create your own deep learning applications using fastai. You'll also have learned how to use fastai to prepare raw datasets, explore datasets, train deep learning models, and deploy trained models.
Table of Contents (10 chapters)
close
close

Cleaning up raw datasets with fastai

Now that we have explored a variety of datasets that are curated by fastai, there is one more topic left to cover in this chapter: how to clean up datasets with fastai. Cleaning up datasets includes dealing with missing values and converting categorical values into numeric identifiers. We need to apply these cleanup steps to datasets because deep learning models can only be trained with numeric data. If we try to train the model with datasets that contain non-numeric data, including missing values and alphanumeric identifiers in categorical columns, the training process will fail. In this section, we are going to review the facilities provided by fastai to make it easy to clean up datasets, and thus make the datasets ready to train deep learning models.

Getting ready

Ensure you have followed the steps in Chapter 1, Getting Started with fastai, to get a fastai environment set up. Confirm that you can open the cleaning_up_datasets.ipynb notebook in the ch2 directory of your repository.

How to do it…

In this section, you will be running through the cleaning_up_datasets.ipynb notebook to address missing values in the ADULT_SAMPLE dataset and replace categorical values with numeric identifiers.

Once you have the notebook open in your fastai environment, complete the following steps:

  1. Run the first two cells to import the necessary libraries and set up the notebook for fastai.
  2. Recall the Examining tabular datasets with fastai section of this chapter. When you checked to see which columns in the ADULT_SAMPLE dataset had missing values, you found that some columns did indeed have missing values. We are going to identify the columns in ADULT_SAMPLE that have missing values, and use the facilities of fastai to apply transformations to the dataset that deal with the missing values in those columns, and then replace those categorical values with numeric identifiers.
  3. First, let's ingest the ADULT_SAMPLE curated dataset again:
    path = untar_data(URLs.ADULT_SAMPLE)
  4. Now, create a pandas DataFrame for the dataset and check for the number of missing values in each column. Note which columns have missing values:
    df = pd.read_csv(path/'adult.csv')
    df.isnull().sum()
  5. To deal with these missing values (and prepare categorical columns), we will use the fastai TabularPandas class (https://docs.fast.ai/tabular.core.html#TabularPandas). To use this class, we need to prepare the following parameters:

    a) procs is the list of transformations that will be applied to TabularPandas. Here, we will specify that we want missing values to be filled (FillMissing) and that we will replace values in categorical columns with numeric identifiers (Categorify).

    b) dep_var specifies which column is the dependent variable; that is, the target that we want to ultimately predict with the model. In the case of ADULT_SAMPLE, the dependent variable is salary.

    c) cont and cat are lists of the columns in the dataset. They are continuous and categorical, respectively. Continuous columns contain numeric values, such as integers or floating-point values. Categorical values contain category identifiers, such as names of US states, days of the week, or colors. We use the cont_cat_split() (https://docs.fast.ai/tabular.core.html#cont_cat_split) function to automatically identify the continuous and categorical columns:

    procs = [FillMissing,Categorify]
    dep_var = 'salary'
    cont,cat = cont_cat_split(df, 1, dep_var=dep_var)
  6. Now, create a TabularPandas object called df_no_missing using these parameters. This object will contain the dataset with missing values replaced and the values in the categorical columns replaced with numeric identifiers:
    df_no_missing = TabularPandas(df, procs, cat, cont, y_names = dep_var)
  7. Apply the show API to df_no_missing to display samples of its contents. Note that the values in the categorical columns are maintained when the object is displayed using show(). What about replacing the categorical values with numeric identifiers? Don't worry – we'll see that result in the next step:
    Figure 2.21 – The first few records of df_no_missing

    Figure 2.21 – The first few records of df_no_missing

  8. Now, display some sample contents of df_no_missing using the items.head() API. This time, the categorical columns contain the numeric identifiers rather than the original values. This is an example of a benefit provided by fastai: the switch between the original categorical values and the numeric identifiers is handled elegantly. If you need to see the original values, you can use the show() API, which transforms the numeric values in categorical columns back into their original values, while the items.head() API shows the actual numeric identifiers in the categorical columns:
    Figure 2.22 – The first few records of df_no_missing with numeric identifiers in categorical columns

    Figure 2.22 – The first few records of df_no_missing with numeric identifiers in categorical columns

  9. Finally, let's confirm that the missing values were handled correctly. As you can see, the two columns that originally had missing values no longer have missing values in df_no_missing:
Figure 2.23 – Missing values in df_no_missing

Figure 2.23 – Missing values in df_no_missing

By following these steps, you have seen how fastai makes it easy to prepare a dataset to train a deep learning model. It does this by replacing missing values and converting the values in the categorical columns into numeric identifiers.

How it works…

In this section, you saw several ways that fastai makes it easy to perform common data preparation steps. The TabularPandas class provides a lot of value by making it easy to execute common steps to prepare a tabular dataset (including replacing missing values and dealing with categorical columns). The cont_cat_split() function automatically identifies continuous and categorical columns in your dataset. In conclusion, fastai makes the cleanup process easy and less error prone than it would be if you had to hand code all the functions required to accomplish these dataset cleanup steps.

Create a Note

Modal Close icon
You need to login to use this feature.
notes
bookmark search playlist download font-size

Change the font size

margin-width

Change margin width

day-mode

Change background colour

Close icon Search
Country selected

Close icon Your notes and bookmarks

Delete Bookmark

Modal Close icon
Are you sure you want to delete it?
Cancel
Yes, Delete

Delete Note

Modal Close icon
Are you sure you want to delete it?
Cancel
Yes, Delete

Edit Note

Modal Close icon
Write a note (max 255 characters)
Cancel
Update Note

Confirmation

Modal Close icon
claim successful

Buy this book with your credits?

Modal Close icon
Are you sure you want to buy this book with one of your credits?
Close
YES, BUY