Hands-On Ensemble Learning with Python

By: Kyriakides, Margaritis

Overview of this book

Ensembling is a technique of combining two or more similar or dissimilar machine learning algorithms to create a model that delivers superior predictive power. This book will demonstrate how you can use a variety of weak algorithms to make a strong predictive model. With its hands-on approach, you'll not only get up to speed with the basic theory but also the application of different ensemble learning techniques. Using examples and real-world datasets, you'll be able to produce better machine learning models to solve supervised learning problems such as classification and regression. In addition to this, you'll go on to leverage ensemble learning techniques such as clustering to produce unsupervised machine learning models. As you progress, the chapters will cover different machine learning algorithms that are widely used in the practical world to make predictions and classifications. You'll even get to grips with the use of Python libraries such as scikit-learn and Keras for implementing different ensemble models. By the end of this book, you will be well-versed in ensemble learning, and have the skills you need to understand which ensemble method is required for which problem, and successfully implement them in real-world scenarios.

Boosting

We now move on to generative methods, the first of which is boosting. We will start by classifying the dataset with AdaBoost. Because AdaBoost places more emphasis on misclassified instances in each round (by reweighting or resampling them), we expect it to handle our imbalanced dataset relatively well.
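To make the emphasis on misclassified instances concrete, the following is a minimal sketch of a single round of the classic (discrete) AdaBoost weight update, written with NumPy only. The function name `adaboost_reweight` and the toy labels are hypothetical, chosen for illustration; the sketch is not the book's implementation and covers the reweighting variant rather than resampling.

```python
import numpy as np


def adaboost_reweight(weights, y_true, y_pred):
    """One round of the discrete AdaBoost weight update (illustrative helper)."""
    miss = y_true != y_pred
    # Weighted error of the current weak learner
    err = np.sum(weights[miss]) / np.sum(weights)
    # Clip to keep the logarithm finite for perfect or useless learners
    err = np.clip(err, 1e-10, 1 - 1e-10)
    # The learner's vote: larger when its weighted error is smaller
    alpha = 0.5 * np.log((1 - err) / err)
    # Increase weights of misclassified instances, decrease the rest
    new_weights = weights * np.exp(np.where(miss, alpha, -alpha))
    return new_weights / new_weights.sum(), alpha


# Toy imbalanced data: 8 legitimate (0) and 2 fraudulent (1) transactions
y_true = np.array([0] * 8 + [1] * 2)
# A weak learner that always predicts the majority class
y_pred = np.zeros(10, dtype=int)
weights = np.full(10, 0.1)

new_weights, alpha = adaboost_reweight(weights, y_true, y_pred)
print(new_weights)
```

After one round, the two misclassified minority instances together hold half of the total weight, so the next weak learner can no longer score well by ignoring them.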

First, we must decide on the ensemble's size. To do so, we generate validation curves for a range of ensemble sizes, shown in the following figure:

Validation curves of various ensemble sizes for AdaBoost

The curves show that 70 base learners provide the best trade-off between bias and variance, so we will proceed with an ensemble of size 70.
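Curves of this kind can be produced with scikit-learn's `validation_curve`. The sketch below is only an illustration under stated assumptions: it uses a small synthetic imbalanced dataset from `make_classification` in place of the fraud dataset, and the candidate sizes, the 3-fold cross-validation, and the F1 scoring are choices made here, not settings taken from the book.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import validation_curve

# Synthetic, imbalanced stand-in data (assumption: not the book's dataset)
x, y = make_classification(n_samples=400, n_features=10,
                           weights=[0.9, 0.1], random_state=0)

# Candidate ensemble sizes to compare (assumed values)
sizes = [10, 30, 50, 70]

# Cross-validated train and validation scores for each ensemble size
train_scores, test_scores = validation_curve(
    AdaBoostClassifier(random_state=0), x, y,
    param_name='n_estimators', param_range=sizes,
    cv=3, scoring='f1')

# Average over the folds; these means are what a validation curve plots
for size, train, val in zip(sizes,
                            train_scores.mean(axis=1),
                            test_scores.mean(axis=1)):
    print(f'{size:>3} learners: train F1 = {train:.3f}, '
          f'validation F1 = {val:.3f}')
```

A widening gap between the training and validation scores as the size grows suggests increasing variance, while low scores on both suggest bias; the chosen size is the one that balances the two.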

The following code implements the training and evaluation for AdaBoost:

# --- SECTION 1 ---
# Libraries and data loading
import numpy as np
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection...
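
As a rough, self-contained illustration of the training and evaluation step (not the book's listing, which works on the fraud dataset), the following sketch fits a 70-learner AdaBoost ensemble on synthetic imbalanced data. The dataset, the 70/30 stratified split, and the choice of F1 and recall as metrics are assumptions made for this example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import f1_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (assumption: not the book's dataset)
x, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.95, 0.05], random_state=0)

# Stratify so both splits keep roughly the same class proportions
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, stratify=y, random_state=0)

# Ensemble of size 70, matching the size chosen in the text
ensemble = AdaBoostClassifier(n_estimators=70, random_state=0)
ensemble.fit(x_train, y_train)
predictions = ensemble.predict(x_test)

# Accuracy is misleading on imbalanced data, so report F1 and recall
f1 = f1_score(y_test, predictions)
recall = recall_score(y_test, predictions)
print(f'F1: {f1:.3f}, recall: {recall:.3f}')
```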
