Machine Learning Algorithms

Overview of this book

Machine learning has gained tremendous popularity for its powerful and fast predictions with large datasets. However, the true forces behind its powerful output are the complex algorithms, involving substantial statistical analysis, that churn through large datasets and generate valuable insight. This second edition of Machine Learning Algorithms walks you through the most prominent developments in machine learning algorithms, which constitute major contributions to the machine learning process, and helps you strengthen and master statistical interpretation across the areas of supervised, semi-supervised, and reinforcement learning. Once the core concepts of an algorithm have been covered, you'll explore real-world examples based on the most widely used libraries, such as scikit-learn, NLTK, TensorFlow, and Keras. You will discover new topics such as principal component analysis (PCA), independent component analysis (ICA), Bayesian regression, discriminant analysis, advanced clustering, and Gaussian mixtures. By the end of this book, you will have studied machine learning algorithms and be able to put them into production to make your machine learning applications more innovative.

Preface

Who this book is for

What this book covers

To get the most out of this book

Get in touch

A Gentle Introduction to Machine Learning

Introduction – classic and adaptive machines

Only learning matters

Beyond machine learning – deep learning and bio-inspired adaptive systems

Machine learning and big data

Summary

Important Elements in Machine Learning

Data formats

Learnability

Introduction to statistical learning concepts

Class balancing

Elements of information theory

Summary

Feature Selection and Feature Engineering

scikit-learn toy datasets

Creating training and test sets

Managing categorical data

Managing missing features

Data scaling and normalization

Feature selection and filtering

Principal Component Analysis

Independent Component Analysis

Atom extraction and dictionary learning

Visualizing high-dimensional datasets using t-SNE

Summary

Regression Algorithms

Linear models for regression

A bidimensional example

Linear regression with scikit-learn and higher dimensionality

Ridge, Lasso, and ElasticNet

Robust regression

Bayesian regression

Polynomial regression

Isotonic regression

Summary

Linear Classification Algorithms

Linear classification

Logistic regression

Implementation and optimizations

Stochastic gradient descent algorithms

Passive-aggressive algorithms

Finding the optimal hyperparameters through a grid search

Classification metrics

ROC curve

Summary

Naive Bayes and Discriminant Analysis

Bayes' theorem

Naive Bayes classifiers

Naive Bayes in scikit-learn

Discriminant analysis

Summary

Support Vector Machines

Linear SVM

SVMs with scikit-learn

Kernel-based classification

ν-Support Vector Machines

Support Vector Regression

Introducing semi-supervised Support Vector Machines (S3VM)

Summary

Decision Trees and Ensemble Learning

Binary Decision Trees

Decision Tree classification with scikit-learn

Decision Tree regression

Introduction to Ensemble Learning

Summary

Clustering Fundamentals

Clustering basics

k-NN

Gaussian mixture

K-means

Evaluation methods based on the ground truth

Summary

Advanced Clustering

DBSCAN

Spectral Clustering

Online Clustering

Biclustering

Summary

Hierarchical Clustering

Hierarchical strategies

Agglomerative Clustering

Summary

Introducing Recommendation Systems

Naive user-based systems

Content-based systems

Model-free (or memory-based) collaborative filtering

Model-based collaborative filtering

Summary

Introducing Natural Language Processing

NLTK and built-in corpora

The Bag-of-Words strategy

Part-of-Speech

A sample text classifier based on the Reuters corpus

Summary

Topic Modeling and Sentiment Analysis in NLP

Topic modeling

Introducing Word2vec with Gensim

Sentiment analysis

Summary

Introducing Neural Networks

Deep learning at a glance

MLPs with Keras

Summary

Advanced Deep Learning Models

Deep model layers

An example of a deep convolutional network with Keras

An example of an LSTM network with Keras

A brief introduction to TensorFlow

Summary

Creating a Machine Learning Architecture

Machine learning architectures

Scikit-learn tools for machine learning architectures

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think

Creating training and test sets

When a dataset is large enough, it's good practice to split it into training and test sets: the former is used to train the model and the latter to evaluate its performance. The following diagram shows a schematic representation of this process:

Training/test set split process schema

There are two main rules to follow when performing this operation:

Both sets must reflect the original distribution.
The original dataset must be randomly shuffled before the split, in order to avoid any correlation between consecutive elements.
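The two rules can be illustrated with a minimal sketch in plain NumPy. This is not scikit-learn's implementation: the helper name shuffle_split, the 25% test fraction, the seed, and the toy data are all assumptions made for the example. The key step is that samples and labels are permuted with the same random index array before the cut, so consecutive (for example, label-sorted) elements end up scattered across both sets.

```python
import numpy as np


def shuffle_split(X, Y, test_fraction=0.25, seed=1000):
    """Hypothetical helper (not a scikit-learn function): shuffle, then split."""
    X = np.asarray(X)
    Y = np.asarray(Y)
    if len(X) != len(Y):
        raise ValueError("X and Y must contain the same number of samples")
    rng = np.random.default_rng(seed)
    # One random permutation applied to both arrays keeps each sample
    # aligned with its label and breaks correlations between neighbors
    indices = rng.permutation(len(X))
    n_test = int(round(len(X) * test_fraction))
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]
    return X[train_idx], X[test_idx], Y[train_idx], Y[test_idx]


# Toy data sorted by label: an unshuffled split would put only class 1 in the test set
X = np.arange(200).reshape(100, 2)
Y = np.array([0] * 50 + [1] * 50)
X_train, X_test, Y_train, Y_test = shuffle_split(X, Y)
print(X_train.shape, X_test.shape)  # (75, 2) (25, 2)
```

A plain random shuffle reflects the original distribution only on average; when the class proportions must be preserved exactly, a stratified split is the usual choice.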

With scikit-learn, this can be achieved by using the train_test_split() function:

from sklearn.model_selection import train_test_split

# X (samples) and Y (labels) are assumed to be already defined
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=1000)

The test_size parameter (as well as train_size) allows you to specify the...
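As a sketch of how the first rule can be enforced with scikit-learn, train_test_split() also accepts a stratify parameter: passing the labels makes both subsets keep their class proportions. The imbalanced toy dataset below is hypothetical; test_size and random_state reuse the values from the call shown earlier. When test_size is given and train_size is left unset, the training set receives all the remaining samples.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced dataset: 80 samples of class 0 and 20 of class 1
X = np.random.default_rng(1000).normal(size=(100, 3))
Y = np.array([0] * 80 + [1] * 20)

# test_size=0.25 reserves 25% of the samples for testing;
# stratify=Y keeps the 80/20 class ratio in both subsets
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=1000, stratify=Y)

print(len(X_train), len(X_test))  # 75 25
# Per-class counts: 60/15 in the training set and 20/5 in the test set
print(np.bincount(Y_train), np.bincount(Y_test))
```

Without stratify, the proportion of class 1 in a 25-sample test set can drift noticeably from 20%, which matters most for small or strongly imbalanced datasets.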
