The approach described in this section (preprocessing images into batches of files) relies on the ND4J FileBatch class (https://static.javadoc.io/org.nd4j/nd4j-common/1.0.0-beta3/org/nd4j/api/loader/FileBatch.html), which is available starting from release 1.0.0-beta3 of that library. This class stores the raw content of multiple files as byte arrays (one per file), together with their original paths. A FileBatch object can be saved to disk in ZIP format. This can reduce both the number of disk reads required (because there are fewer files) and the network transfers needed when reading from remote storage (because of the ZIP compression). Typically, the original image files used to train a CNN already use a compression format that is efficient in terms of space and network bandwidth (such as JPEG or PNG). On a cluster, however, there is a need to minimize disk reads...
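To make the idea concrete, here is a minimal, self-contained sketch that uses only `java.util.zip`. It is not ND4J's FileBatch implementation: the class `SimpleFileBatch`, its methods, and its ZIP layout (an `originalPaths.txt` index entry plus one `file_<i>.bin` entry per file) are hypothetical choices made for illustration. It groups a list of files into fixed-size batches, keeps each file's raw (still JPEG/PNG-encoded) bytes alongside its original path, and writes or reads a whole batch as a single ZIP stream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical illustration of the FileBatch idea; NOT ND4J's actual implementation. */
final class SimpleFileBatch {
    // Hypothetical ZIP layout: one index entry holding the paths, one entry per file.
    private static final String PATHS_ENTRY = "originalPaths.txt";
    private static final String FILE_PREFIX = "file_";
    private static final String FILE_SUFFIX = ".bin";

    private final List<byte[]> fileBytes;
    private final List<String> originalPaths;

    SimpleFileBatch(List<byte[]> fileBytes, List<String> originalPaths) {
        if (fileBytes.size() != originalPaths.size()) {
            throw new IllegalArgumentException("Expected exactly one original path per file");
        }
        this.fileBytes = List.copyOf(fileBytes);
        this.originalPaths = List.copyOf(originalPaths);
    }

    List<byte[]> fileBytes() { return fileBytes; }
    List<String> originalPaths() { return originalPaths; }

    /** Splits a list (e.g. of image paths) into consecutive batches of at most batchSize items. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }

    /** Loads the raw, undecoded bytes of each file and records where it came from. */
    static SimpleFileBatch forFiles(List<Path> files) throws IOException {
        List<byte[]> bytes = new ArrayList<>();
        List<String> paths = new ArrayList<>();
        for (Path file : files) {
            bytes.add(Files.readAllBytes(file));
            paths.add(file.toString());
        }
        return new SimpleFileBatch(bytes, paths);
    }

    /** Writes the whole batch as one ZIP stream (ZipOutputStream uses DEFLATE by default). */
    void writeAsZip(OutputStream out) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry(PATHS_ENTRY));
            // Assumes paths are non-empty and contain no newline characters.
            zip.write(String.join("\n", originalPaths).getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            for (int i = 0; i < fileBytes.size(); i++) {
                zip.putNextEntry(new ZipEntry(FILE_PREFIX + i + FILE_SUFFIX));
                zip.write(fileBytes.get(i));
                zip.closeEntry();
            }
        }
    }

    /** Reads a batch previously written by writeAsZip. */
    static SimpleFileBatch readFromZip(InputStream in) throws IOException {
        List<String> paths = new ArrayList<>();
        Map<Integer, byte[]> bytesByIndex = new TreeMap<>(); // sorted by index, so file order is preserved
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                byte[] content = zip.readAllBytes(); // reads only the current entry
                String name = entry.getName();
                if (name.equals(PATHS_ENTRY)) {
                    String text = new String(content, StandardCharsets.UTF_8);
                    if (!text.isEmpty()) {
                        paths.addAll(List.of(text.split("\n", -1)));
                    }
                } else {
                    String index = name.substring(FILE_PREFIX.length(), name.length() - FILE_SUFFIX.length());
                    bytesByIndex.put(Integer.parseInt(index), content);
                }
            }
        }
        return new SimpleFileBatch(new ArrayList<>(bytesByIndex.values()), paths);
    }
}
```

In a typical run you would call `partition` on the list of image paths, build one batch per sublist with `forFiles`, and write each batch to its own ZIP file, so a cluster reads a few large files instead of many small ones. With ND4J itself you would use the FileBatch class linked above instead; check its Javadoc for the exact factory and ZIP read/write methods.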

Hands-On Deep Learning with Apache Spark
Overview of this book
Deep learning is a subset of machine learning in which datasets with several layers of complexity can be processed. Hands-On Deep Learning with Apache Spark addresses both the complexity of the technical and analytical aspects and the speed at which deep learning solutions can be implemented on Apache Spark.
The book starts with the fundamentals of Apache Spark and deep learning. You will set up Spark for deep learning, learn principles of distributed modeling, and understand different types of neural nets. You will then implement deep learning models, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and long short-term memory (LSTM) on Spark.
As you progress through the book, you will gain hands-on experience of what it takes to understand the complex datasets you are dealing with. Along the way, you will use popular deep learning frameworks, such as TensorFlow, Deeplearning4j, and Keras, to train your distributed models.
By the end of this book, you'll have gained experience implementing your models for a variety of use cases.
Table of Contents (19 chapters)
Preface
The Apache Spark Ecosystem
Deep Learning Basics
Extract, Transform, Load
Streaming
Convolutional Neural Networks
Recurrent Neural Networks
Training Neural Networks with Spark
Monitoring and Debugging Neural Network Training
Interpreting Neural Network Output
Deploying on a Distributed System
NLP Basics
Textual Analysis and Deep Learning
Convolution
Image Classification
What's Next for Deep Learning?
Other Books You May Enjoy
Appendix A: Functional Programming in Scala
Appendix B: Image Data Preparation for Spark