Python Feature Engineering Cookbook

By: Galli

Overview of this book

Feature engineering is invaluable for developing and enriching your machine learning models. In this cookbook, you will work with the best tools to streamline your feature engineering pipelines and techniques, and to simplify and improve the quality of your code. Using Python libraries such as pandas, scikit-learn, Featuretools, and Feature-engine, you’ll learn how to work with both continuous and discrete datasets and be able to transform features from unstructured datasets. You will develop the skills necessary to select the best features as well as the most suitable extraction techniques. This book will cover Python recipes that will help you automate feature engineering to simplify complex processes. You’ll also get to grips with different feature engineering strategies, such as the Box-Cox transform, power transform, and log transform across machine learning, reinforcement learning, and natural language processing (NLP) domains. By the end of this book, you’ll have discovered tips and practical solutions to all of your feature engineering problems.
Table of Contents (13 chapters)

Distinguishing variable distribution

A probability distribution is a function that describes the likelihood of obtaining the possible values of a variable. There are many well-described variable distributions, such as the normal, binomial, or Poisson distributions. Some machine learning algorithms assume that the independent variables are normally distributed. Other models make no assumptions about the distribution of the variables, but a better spread of these values may improve their performance. In this recipe, we will learn how to create plots to distinguish the variable distributions in the entire dataset by using the Boston House Prices dataset from scikit-learn.

Getting ready

How to do it...

Let's begin by importing the necessary libraries:

  1. Import the required Python libraries and modules:
import pandas as pd
import matplotlib.pyplot as plt
  2. Load the Boston House Prices dataset from scikit-learn (load_boston() was removed in scikit-learn 1.2, so this step requires an earlier version):
from sklearn.datasets import load_boston
boston_dataset = load_boston()
boston = pd.DataFrame(boston_dataset.data,
                      columns=boston_dataset.feature_names)
  3. Visualize the variable distribution with histograms:
boston.hist(bins=30, figsize=(12,12), density=True)
plt.show()

The output of the preceding code is shown in the following screenshot:

Most of the numerical variables in the dataset are skewed.
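
Skewness can also be checked numerically rather than by eye. The following is a minimal sketch that uses pandas skew(), which is not part of this recipe, on synthetic data. The column names and the generated values are hypothetical and stand in for real variables.

```python
import numpy as np
import pandas as pd

# Synthetic, illustrative data (not the Boston dataset):
# one roughly symmetric variable and one right-skewed variable.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "symmetric": rng.normal(loc=0.0, scale=1.0, size=1000),
    "right_skewed": rng.exponential(scale=1.0, size=1000),
})

# skew() returns one sample skewness value per numerical column.
# Values near 0 suggest a symmetric distribution; large positive
# values suggest a long right tail.
skewness = df.skew()
print(skewness)
```

A value close to 0 is consistent with a symmetric histogram, whereas a clearly positive or negative value matches the long tails that are visible in skewed histograms.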

How it works...

In this recipe, we used pandas hist() to plot the distribution of all the numerical variables in the Boston House Prices dataset from scikit-learn. To load the data, we imported the dataset from scikit-learn datasets and then used load_boston() to load the data. Next, we captured the data into a dataframe using pandas DataFrame(), indicating that the data is stored in the data attribute and the variable names in the feature_names attribute.
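
The same data and feature_names pattern applies to other datasets bundled with scikit-learn. As a minimal sketch, the following uses load_diabetes(), which is an assumption made for illustration and is not the dataset used in this recipe.

```python
import pandas as pd
from sklearn.datasets import load_diabetes

# Stand-in dataset for illustration only; the recipe itself uses Boston House Prices.
dataset = load_diabetes()

# The values are stored in the data attribute and the
# variable names in the feature_names attribute.
df = pd.DataFrame(dataset.data, columns=dataset.feature_names)

print(df.shape)
print(df.columns.tolist())
```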

To display the histograms of all the numerical variables, we used pandas hist(), which calls matplotlib.pyplot.hist() on each variable in the dataframe, resulting in one histogram per variable. We indicated the number of intervals for the histograms using the bins argument, adjusted the figure size with figsize, and normalized the histogram by setting density to True. If the histogram is normalized, the total area of its bars is 1.
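
The effect of the normalization can be verified without plotting. The sketch below assumes NumPy's histogram() function, which accepts the same bins and density arguments, and uses synthetic values rather than the recipe's data.

```python
import numpy as np

# Synthetic, illustrative values (not taken from the Boston dataset).
rng = np.random.default_rng(42)
values = rng.exponential(scale=2.0, size=1000)

# With density=True, the bar heights are rescaled so that
# height * bin width, summed over all bins, equals 1.
heights, edges = np.histogram(values, bins=30, density=True)
bin_widths = np.diff(edges)
total_area = float(np.sum(heights * bin_widths))

print(total_area)
```

Note that the individual bar heights are densities, not probabilities, so a height can exceed 1 when the bins are narrow.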

See also
