Mastering Machine Learning with scikit-learn

By: Gavin Hackeling

Overview of this book

Machine learning brings computer science and statistics together to build models that learn from data. Using the algorithms and techniques of machine learning, you can automate the building of analytical models. This book examines a variety of machine learning models, including popular algorithms such as k-nearest neighbors, logistic regression, naive Bayes, k-means, decision trees, and artificial neural networks. It discusses data preprocessing, hyperparameter optimization, and ensemble methods. You will build systems that classify documents, recognize images, detect ads, and more. You will learn to use scikit-learn’s API to extract features from categorical variables, text, and images; evaluate model performance; and develop an intuition for how to improve your model’s performance. By the end of this book, you will have a practical command of the scikit-learn concepts needed to build efficient models for advanced tasks.

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

The Fundamentals of Machine Learning

Defining machine learning

Learning from experience

Machine learning tasks

Training data, testing data, and validation data

Bias and variance

An introduction to scikit-learn

Installing scikit-learn

Installing pandas, Pillow, NLTK, and matplotlib

Summary

Simple Linear Regression

Simple linear regression

Evaluating the model

Summary

Classification and Regression with k-Nearest Neighbors

K-Nearest Neighbors

Lazy learning and non-parametric models

Classification with KNN

Regression with KNN

Summary

Feature Extraction

Extracting features from categorical variables

Standardizing features

Extracting features from text

Extracting features from images

Summary

From Simple Linear Regression to Multiple Linear Regression

Multiple linear regression

Polynomial regression

Regularization

Applying linear regression

Gradient descent

Summary

From Linear Regression to Logistic Regression

Binary classification with logistic regression

Spam filtering

Tuning models with grid search

Multi-class classification

Multi-label classification and problem transformation

Summary

Naive Bayes

Bayes' theorem

Generative and discriminative models

Naive Bayes

Naive Bayes with scikit-learn

Summary

Nonlinear Classification and Regression with Decision Trees

Decision trees

Training decision trees

Decision trees with scikit-learn

Summary

From Decision Trees to Random Forests and Other Ensemble Methods

Bagging

Boosting

Stacking

Summary

The Perceptron

The perceptron

Limitations of the perceptron

Summary

From the Perceptron to Support Vector Machines

Kernels and the kernel trick

Maximum margin classification and support vectors

Classifying characters in scikit-learn

Summary

From the Perceptron to Artificial Neural Networks

Nonlinear decision boundaries

Feed-forward and feedback ANNs

Multi-layer perceptrons

Training multi-layer perceptrons

Summary

K-means

Clustering

K-means

Evaluating clusters

Image quantization

Clustering to learn features

Summary

Dimensionality Reduction with Principal Component Analysis

Principal component analysis

Visualizing high-dimensional data with PCA

Face recognition with PCA

Summary

Summary

In this chapter, we defined machine learning as the design of programs that can improve their performance at a task by learning from experience. We discussed the spectrum of supervision in experience. At one end is supervised learning, in which a program learns from inputs that are labeled with their corresponding outputs. Unsupervised learning, in which the program must discover structure in only unlabeled inputs, is at the opposite end of the spectrum. Semi-supervised approaches make use of both labeled and unlabeled training data.

Next, we discussed common types of machine learning tasks and reviewed examples of each. In classification tasks, the program must predict the value of a discrete response variable from the observed explanatory variables. In regression tasks, the program must predict the value of a continuous response variable from the explanatory variables. Unsupervised learning tasks include clustering, in which observations are organized into groups according to some similarity measure, and dimensionality reduction, which reduces a set of explanatory variables to a smaller set of synthetic features that retain as much information as possible. We also reviewed the bias-variance trade-off and discussed common performance measures for different machine learning tasks.
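The difference between a discrete and a continuous response variable can be made concrete with a small sketch. The following uses only the Python standard library and a toy nearest-neighbor predictor; it is not scikit-learn's API, and the data (hours studied, pass/fail outcomes, exam scores) and the function names are hypothetical, chosen only for illustration.

```python
from collections import Counter
from statistics import mean


def k_nearest(train, query, k):
    """Return the responses of the k training instances closest to query."""
    ranked = sorted(train, key=lambda pair: abs(pair[0] - query))
    return [response for _, response in ranked[:k]]


def classify(train, query, k=3):
    """Classification: predict a discrete label by majority vote."""
    return Counter(k_nearest(train, query, k)).most_common(1)[0][0]


def regress(train, query, k=3):
    """Regression: predict a continuous value by averaging the neighbors."""
    return mean(k_nearest(train, query, k))


# Hypothetical toy data: one explanatory variable (hours studied).
hours = [1, 2, 3, 6, 7, 8]
outcomes = ["fail", "fail", "fail", "pass", "pass", "pass"]  # discrete response
scores = [40.0, 45.0, 50.0, 70.0, 75.0, 80.0]                # continuous response

labeled_outcomes = list(zip(hours, outcomes))
labeled_scores = list(zip(hours, scores))

predicted_label = classify(labeled_outcomes, 6.5)  # a category
predicted_score = regress(labeled_scores, 6.5)     # a number
print(predicted_label, predicted_score)
```

The same explanatory variable feeds both predictors; only the type of the response variable, and therefore the way the neighbors' responses are combined, differs.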

We also discussed the history, goals, and advantages of scikit-learn. Finally, we prepared our development environment by installing scikit-learn and other libraries that are commonly used in conjunction with it. In the next chapter, we will discuss a simple model for regression tasks and build our first machine learning model with scikit-learn.
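One way to confirm that the environment is ready is to check whether each library can be located by Python. The sketch below assumes the usual import names for the libraries listed in this chapter (`sklearn` for scikit-learn, `PIL` for Pillow, `nltk` for NLTK); the helper name `check_environment` is hypothetical.

```python
from importlib.util import find_spec

# Assumed import names for the libraries installed in this chapter.
REQUIRED_MODULES = {
    "scikit-learn": "sklearn",
    "pandas": "pandas",
    "Pillow": "PIL",
    "NLTK": "nltk",
    "matplotlib": "matplotlib",
}


def check_environment(modules=REQUIRED_MODULES):
    """Map each library name to True if its module can be found, else False."""
    return {name: find_spec(module) is not None for name, module in modules.items()}


status = check_environment()
for name, installed in status.items():
    print(f"{name}: {'installed' if installed else 'missing'}")
```

The check only looks up each module without importing it, so it reports a missing library instead of raising an error.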
