Hands-On Computer Vision with TensorFlow 2

By: Benjamin Planche, Eliot Andres

3.3 (12)

Overview of this book

Computer vision solutions are becoming increasingly common, making their way into fields such as healthcare, automotive, social media, and robotics. This book will help you explore TensorFlow 2, the brand new version of Google's open-source framework for machine learning. You will understand how to benefit from using convolutional neural networks (CNNs) for visual tasks. Hands-On Computer Vision with TensorFlow 2 starts with the fundamentals of computer vision and deep learning, teaching you how to build a neural network from scratch. You will discover the features that have made TensorFlow the most widely used AI library, along with its intuitive Keras interface. You'll then move on to building, training, and deploying CNNs efficiently. Complete with concrete code examples, the book demonstrates how to classify images with modern solutions, such as Inception and ResNet, and extract specific content using You Only Look Once (YOLO), Mask R-CNN, and U-Net. You will also build generative adversarial networks (GANs) and variational autoencoders (VAEs) to create and edit images, and long short-term memory networks (LSTMs) to analyze videos. In the process, you will acquire advanced insights into transfer learning, data augmentation, domain adaptation, and mobile and web deployment, among other key concepts. By the end of the book, you will have both the theoretical understanding and practical skills to solve advanced computer vision problems with TensorFlow 2.

Preface

Who this book is for

What this book covers

To get the most out of this book

Download and run the example code files

Download the code files

Study and run the experiments

Study the Jupyter notebooks online

Run the Jupyter notebooks on your machine

Run the Jupyter notebooks in Google Colab

Download the color images

Conventions used

Get in touch

Reviews

Section 1: TensorFlow 2 and Deep Learning Applied to Computer Vision

Computer Vision and Neural Networks

Technical requirements

Computer vision in the wild

Introducing computer vision

Main tasks and their applications

Content recognition

Object classification

Object identification

Object detection and localization

Object and instance segmentation

Pose estimation

Video analysis

Instance tracking

Action recognition

Motion estimation

Content-aware image edition

Scene reconstruction

A brief history of computer vision

First steps to initial successes

Underestimating the perception task

Hand-crafting local features

Adding some machine learning on top

Rise of deep learning

Early attempts and failures

Rise and fall of the perceptron

Too heavy to scale

Reasons for the comeback

The internet – the new El Dorado of data science

More power than ever

Deep learning or the rebranding of artificial neural networks

What makes learning deep?

Deep learning era

Getting started with neural networks

Building a neural network

Imitating neurons

Biological inspiration

Mathematical model

Implementation

Layering neurons together

Mathematical model

Implementation

Applying our network to classification

Setting up the task

Implementing the network

Training a neural network

Learning strategies

Supervised learning

Unsupervised learning

Reinforcement learning

Teaching time

Evaluating the loss

Backpropagating the loss

Teaching our network to classify

Training considerations – underfitting and overfitting

Summary

Questions

Further reading

TensorFlow Basics and Training a Model

Technical requirements

Getting started with TensorFlow 2 and Keras

Introducing TensorFlow

TensorFlow's main architecture

Introducing Keras

A simple computer vision model using Keras

Preparing the data

Building the model

Training the model

Model performance

TensorFlow 2 and Keras in detail

Core concepts

Introducing tensors

TensorFlow graphs

Comparing lazy execution to eager execution

Creating graphs in TensorFlow 2

Introducing TensorFlow AutoGraph and tf.function

Backpropagating errors using the gradient tape

Keras models and layers

Sequential and functional APIs

Callbacks

Advanced concepts

How tf.function works

Variables in TensorFlow 2

Distribution strategies

Using the Estimator API

Available pre-made Estimators

Training a custom Estimator

The TensorFlow ecosystem

TensorBoard

TensorFlow Addons and TensorFlow Extended

TensorFlow Lite and TensorFlow.js

Where to run your model

On a local machine

On a remote machine

On Google Cloud

Summary

Questions

Modern Neural Networks

Technical requirements

Discovering convolutional neural networks

Neural networks for multidimensional data

Problems with fully connected networks

An explosive number of parameters

A lack of spatial reasoning

Introducing CNNs

CNN operations

Convolutional layers

Concept

Properties

Hyperparameters

TensorFlow/Keras methods

Pooling layers

Concept and hyperparameters

TensorFlow/Keras methods

Fully connected layers

Usage in CNNs

TensorFlow/Keras methods

Effective receptive field

Definitions

Formula

CNNs with TensorFlow

Implementing our first CNN

LeNet-5 architecture

TensorFlow and Keras implementations

Application to MNIST

Refining the training process

Modern network optimizers

Gradient descent challenges

Training velocity and trade-off

Suboptimal local minima

A single hyperparameter for heterogeneous parameters

Advanced optimizers

Momentum algorithms

The Ada family

Regularization methods

Early stopping

L1 and L2 regularization

Principles

TensorFlow and Keras implementations

Dropout

Definition

TensorFlow and Keras methods

Batch normalization

Definition

TensorFlow and Keras methods

Summary

Questions

Further reading

Section 2: State-of-the-Art Solutions for Classic Recognition Problems

Influential Classification Tools

Technical requirements

Understanding advanced CNN architectures

VGG – a standard CNN architecture

Overview of the VGG architecture

Motivation

Architecture

Contributions – standardizing CNN architectures

Replacing large convolutions with multiple smaller ones

Increasing the depth of the feature maps

Augmenting data with scale jittering

Replacing fully connected layers with convolutions

Implementations in TensorFlow and Keras

The TensorFlow model

The Keras model

GoogLeNet and the inception module

Overview of the GoogLeNet architecture

Motivation

Architecture

Contributions – popularizing larger blocks and bottlenecks

Capturing various details with inception modules

Using 1 x 1 convolutions as bottlenecks

Pooling instead of fully connecting

Fighting vanishing gradient with intermediary losses

Implementations in TensorFlow and Keras

Inception module with the Keras Functional API

TensorFlow model and TensorFlow Hub

The Keras model

ResNet – the residual network

Overview of the ResNet architecture

Motivation

Architecture

Contributions – forwarding the information more deeply

Estimating a residual function instead of a mapping

Going ultra-deep

Implementations in TensorFlow and Keras

Residual blocks with the Keras Functional API

The TensorFlow model and TensorFlow Hub

The Keras model

Leveraging transfer learning

Overview

Definition

Human inspiration

Motivation

Transferring CNN knowledge

Use cases

Similar tasks with limited training data

Similar tasks with abundant training data

Dissimilar tasks with abundant training data

Dissimilar tasks with limited training data

Transfer learning with TensorFlow and Keras

Model surgery

Removing layers

Grafting layers

Selective training

Restoring pretrained parameters

Freezing layers

Summary

Questions

Further reading

Object Detection Models

Technical requirements

Introducing object detection

Background

Applications

Brief history

Evaluating the performance of a model

Precision and recall

Precision-recall curve

Average precision and mean average precision

Average precision threshold

A fast object detection algorithm – YOLO

Introducing YOLO

Strengths and limitations of YOLO

YOLO's main concepts

Inferring with YOLO

The YOLO backbone

YOLO's layers output

Introducing anchor boxes

How YOLO refines anchor boxes

Post-processing the boxes

NMS

YOLO inference summarized

Training YOLO

How the YOLO backbone is trained

YOLO loss

Bounding box loss

Object confidence loss

Classification loss

Full YOLO loss

Training techniques

Faster R-CNN – a powerful object detection model

Faster R-CNN's general architecture

Stage 1 – Region proposals

Stage 2 – Classification

Faster R-CNN architecture

RoI pooling

Training Faster R-CNN

Training the RPN

The RPN loss

Fast R-CNN loss

Training regimen

TensorFlow Object Detection API

Using a pretrained model

Training on a custom dataset

Summary

Questions

Further reading

Enhancing and Segmenting Images

Technical requirements

Transforming images with encoders-decoders

Introduction to encoders-decoders

Encoding and decoding

Auto-encoding

Purpose

Basic example – image denoising

Simplistic fully connected AE

Application to image denoising

Convolutional encoders-decoders

Unpooling, transposing, and dilating

Transposed convolution (deconvolution)

Unpooling

Upsampling and resizing

Dilated/atrous convolution

Example architectures – FCN and U-Net

Fully convolutional networks

U-Net

Intermediary example – image super-resolution

FCN implementation

Application to upscaling images

Understanding semantic segmentation

Object segmentation with encoders-decoders

Overview

Decoding as label maps

Training with segmentation losses and metrics

Post-processing with conditional random fields

Advanced example – image segmentation for self-driving cars

Task presentation

Exemplary solution

The more difficult case of instance segmentation

From object segmentation to instance segmentation

Respecting boundaries

Post-processing into instance masks

From object detection to instance segmentation – Mask R-CNN

Applying semantic segmentation to bounding boxes

Building an instance segmentation model with Faster-RCNN

Summary

Questions

Further reading

Section 3: Advanced Concepts and New Frontiers of Computer Vision

Training on Complex and Scarce Datasets

Technical requirements

Efficient data serving

Introducing the TensorFlow Data API

Intuition behind the TensorFlow Data API

Feeding fast and data-hungry models

Inspiration from lazy structures

Structure of TensorFlow data pipelines

Extract, Transform, Load

API interface

Setting up input pipelines

Extracting (from tensors, text files, TFRecord files, and more)

From NumPy and TensorFlow data

From files

From other inputs (generator, SQL database, range, and others)

Transforming the samples (parsing, augmenting, and more)

Parsing images and labels

Parsing TFRecord files

Editing samples

Transforming the datasets (shuffling, zipping, parallelizing, and more)

Structuring datasets

Merging datasets

Optimizing and monitoring input pipelines

Following best practices for optimization

Parallelizing and prefetching

Fusing operations

Passing options to ensure global properties

Monitoring and reusing datasets

Aggregating performance statistics

Caching and reusing datasets

How to deal with data scarcity

Augmenting datasets

Overview

Why augment datasets?

Considerations

Augmenting images with TensorFlow

TensorFlow Image module

Example – augmenting images for our autonomous driving application

Rendering synthetic datasets

Overview

Rise of 3D databases

Benefits of synthetic data

Generating synthetic images from 3D models

Rendering from 3D models

Post-processing synthetic images

Problem – realism gap

Leveraging domain adaptation and generative models (VAEs and GANs)

Training models to be robust to domain changes

Supervised domain adaptation

Unsupervised domain adaptation

Domain randomization

Generating larger or more realistic datasets with VAEs and GANs

Discriminative versus generative models

VAEs

GANs

Augmenting datasets with conditional GANs

Summary

Questions

Further reading

Video and Recurrent Neural Networks

Technical requirements

Introducing RNNs

Basic formalism

General understanding of RNNs

Learning RNN weights

Backpropagation through time

Truncated backpropagation

Long short-term memory cells

LSTM general principles

LSTM inner workings

Classifying videos

Applying computer vision to video

Classifying videos with an LSTM

Extracting features from videos

Training the LSTM

Defining the model

Loading the data

Training the model

Summary

Questions

Further reading

Optimizing Models and Deploying on Mobile Devices

Technical requirements

Optimizing computational and disk footprints

Measuring inference speed

Measuring latency

Using tracing tools to understand computational performance

Improving model inference speed

Optimizing for hardware

Optimizing on CPUs

Optimizing on GPUs

Optimizing on specialized hardware

Optimizing input

Optimizing post-processing

When the model is still too slow

Interpolating and tracking

Model distillation

Reducing model size

Quantization

Channel pruning and weight sparsification

On-device machine learning

Considerations of on-device machine learning

Benefits of on-device ML

Latency

Privacy

Cost

Limitations of on-device ML

Practical on-device computer vision

On-device computer vision particularities

Generating a SavedModel

Generating a frozen graph

Importance of preprocessing

Example app – recognizing facial expressions

Introducing MobileNet

Deploying models on-device

Running on iOS devices using Core ML

Converting from TensorFlow or Keras

Loading the model

Using the model

Running on Android using TensorFlow Lite

Converting the model from TensorFlow or Keras

Loading the model

Using the model

Running in the browser using TensorFlow.js

Converting the model to the TensorFlow.js format

Using the model

Running on other devices

Summary

Questions

Migrating from TensorFlow 1 to TensorFlow 2

Automatic migration

Migrating TensorFlow 1 code

Sessions

Placeholders

Variable management

Layers and models

Other concepts

References

Chapter 1: Computer Vision and Neural Networks

Chapter 2: TensorFlow Basics and Training a Model

Chapter 3: Modern Neural Networks

Chapter 4: Influential Classification Tools

Chapter 5: Object Detection Models

Chapter 6: Enhancing and Segmenting Images

Chapter 7: Training on Complex and Scarce Datasets

Chapter 8: Video and Recurrent Neural Networks

Chapter 9: Optimizing Models and Deploying on Mobile Devices

Assessments

Answers

Chapter 1

Chapter 2

Chapter 3

Chapter 4

Chapter 5

Chapter 6

Chapter 7

Chapter 8

Chapter 9

Other Books You May Enjoy

Leave a review - let other readers know what you think

Customer Reviews

3.3 (12)

5 star: 33.3%
4 star: 25%
3 star: 8.3%
2 star: 8.3%
1 star: 25%
