Mastering Java for Data Science

By: Alexey Grigorev

Overview of this book

Java is the most popular programming language, according to the TIOBE index, and it is a typical choice for running production systems in many companies, both in the startup world and among large enterprises. Not surprisingly, it is also a common choice for creating data science applications: it is fast and has a great set of data processing tools, both built-in and external. What is more, choosing Java for data science allows you to easily integrate solutions with existing software and bring data science into production with less effort. This book will teach you how to create data science applications with Java. First, we will review the most important things to consider when starting a data science application, and then brush up on the basics of Java and machine learning before diving into more advanced topics. We start by going over the existing libraries for data processing and the libraries with machine learning algorithms. After that, we cover topics such as classification and regression, dimensionality reduction and clustering, information retrieval and natural language processing, and deep learning and big data. Finally, we finish the book by talking about the ways to deploy a model and evaluate it in production settings.
Table of Contents (11 chapters)

1. Data Science Using Java
3. Exploratory Data Analysis
4. Supervised Learning - Classification and Regression
5. Unsupervised Learning - Clustering and Dimensionality Reduction
6. Working with Text - Natural Language Processing and Information Retrieval
7. Extreme Gradient Boosting
8. Deep Learning with DeepLearning4J
9. Scaling Data Science
10. Deploying Data Science Models

What this book covers

Chapter 1, Data Science Using Java, provides an overview of the existing tools available in Java and introduces CRISP-DM, the methodology for approaching data science projects. In this chapter, we also introduce our running example, building a search engine.

Chapter 2, Data Processing Toolbox, reviews the standard Java library: the Collections API for storing data in memory, the IO API for reading and writing data, and the Stream API for a convenient way of organizing data processing pipelines. We will look at extensions to the standard libraries such as Apache Commons Lang, Apache Commons IO, Google Guava, and AOL Cyclops React. Then, we will cover the most common ways of storing data (text and CSV files, HTML, JSON, and SQL databases) and discuss how we can get data from these sources. We will finish this chapter by talking about the ways we can collect data for the running example, the search engine, and how we prepare that data.
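
As a rough illustration of the kind of data processing pipeline the standard library supports, the sketch below counts word occurrences across a few lines of text with `java.util.stream`. The class name `WordCountPipeline`, the method `countWords`, and the tokenization rule (lowercase, split on non-word characters) are assumptions made for this example; they are not taken from the book.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not code from the book.
final class WordCountPipeline {

    // Splits each line into lowercase tokens and counts how often each token occurs.
    // The result is a TreeMap, so the keys come back in alphabetical order.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                .filter(token -> !token.isEmpty()) // split can yield a leading empty token
                .collect(Collectors.groupingBy(token -> token, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Made-up input lines.
        List<String> lines = Arrays.asList("Java and data", "data science with Java");
        System.out.println(countWords(lines));
    }
}
```

Each stage (tokenize, filter, aggregate) is a separate step in the chain, which is what makes this style convenient for assembling pipelines.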

Chapter 3, Exploratory Data Analysis, performs the initial analysis of data with Java: we look at how to calculate common statistics such as the minimum and maximum values, the average value, and the standard deviation. We also talk a bit about interactive analysis and look at the tools that allow us to visually inspect the data before building models. To illustrate this chapter, we use the data we collect for the search engine.
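
The statistics named above can be computed with the standard library alone. The following is a minimal sketch, assuming the data fits in a `double[]`; the class name `SummaryStats`, the helper `sampleStdDev`, and the sample values are hypothetical and chosen only for illustration. The standard deviation here is the sample version (dividing by n - 1), which is one of two common conventions.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;

// Hypothetical helper for illustration only; not code from the book.
final class SummaryStats {

    // Sample standard deviation (divides by n - 1).
    // Returns NaN when fewer than two values are given.
    static double sampleStdDev(double[] values) {
        if (values.length < 2) {
            return Double.NaN;
        }
        double mean = Arrays.stream(values).average().orElse(Double.NaN);
        double sumOfSquares = Arrays.stream(values)
                .map(v -> (v - mean) * (v - mean))
                .sum();
        return Math.sqrt(sumOfSquares / (values.length - 1));
    }

    public static void main(String[] args) {
        // Made-up values standing in for some numeric column.
        double[] values = {2, 4, 4, 4, 5, 5, 7, 9};

        // min, max, and average come from a single pass over the data.
        DoubleSummaryStatistics stats = Arrays.stream(values).summaryStatistics();
        System.out.println("min  = " + stats.getMin());
        System.out.println("max  = " + stats.getMax());
        System.out.println("mean = " + stats.getAverage());
        System.out.println("std  = " + sampleStdDev(values));
    }
}
```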

Chapter 4, Supervised Learning - Classification and Regression, starts with an introduction to machine learning, and then looks at the models for performing supervised learning in Java. Among others, we look at the following libraries: Smile, JSAT, LIBSVM, LIBLINEAR, and Encog, and we see how we can use them to solve classification and regression problems. We use two examples here. First, we use the search engine data to predict whether a URL will appear on the first page of results, which illustrates the classification problem. Second, we predict how much time it takes to multiply two matrices on certain hardware given its characteristics, which illustrates the regression problem.

Chapter 5, Unsupervised Learning - Clustering and Dimensionality Reduction, explores the methods for dimensionality reduction available in Java, and we learn how to apply PCA and Random Projection to reduce the dimensionality of data. This is illustrated with the hardware performance dataset from the previous chapter. We also look at different ways to cluster data, including Agglomerative Clustering, K-Means, and DBSCAN, and we use a dataset of customer complaints as an example.

Chapter 6, Working with Text - Natural Language Processing and Information Retrieval, looks at how to use text in data science applications, and we learn how to extract more useful features for our search engine. We also look at Apache Lucene, a library for full-text indexing and searching, and Stanford CoreNLP, a library for natural language processing. Next, we look at how we can represent words as vectors, and we learn how to build such embeddings from co-occurrence matrices and how to use existing ones such as GloVe. We also look at how we can use machine learning for text, and we illustrate it with a sentiment analysis problem where we apply LIBLINEAR to classify whether a review is positive or negative.

Chapter 7, Extreme Gradient Boosting, covers how to use XGBoost in Java and applies it to two problems from earlier chapters: classifying whether a URL appears on the first page of results and predicting the time to multiply two matrices. Additionally, we look at how to solve the learning-to-rank problem with XGBoost and again use our search engine example as an illustration.

Chapter 8, Deep Learning with DeepLearning4j, covers deep neural networks and DeepLearning4j, a library for building and training these networks in Java. In particular, we talk about convolutional neural networks and see how we can use them for image recognition: predicting whether a picture shows a dog or a cat. Additionally, we discuss data augmentation, a way to generate more training data, and mention how we can speed up training using GPUs. We finish the chapter by describing how to rent a GPU server on Amazon AWS.

Chapter 9, Scaling Data Science, talks about the big data tools available in Java: Apache Hadoop and Apache Spark. We illustrate them by looking at how we can process Common Crawl, a copy of the Internet, and calculate the TF-IDF of each document there. Additionally, we look at the graph processing tools available in Apache Spark and build a recommendation system for scientists that recommends a coauthor for the next possible paper.
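
To make the TF-IDF calculation concrete, here is a small in-memory sketch in plain Java. It is not Hadoop or Spark code and is not taken from the book. It assumes one common variant of the weighting, tf(t, d) × ln(N / df(t)), where tf is the raw count of term t in document d, N is the number of documents, and df is the number of documents containing t; other variants normalize or smooth these terms. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory illustration; a real Common Crawl job would be distributed.
final class TfIdf {

    // Document frequency: in how many documents each term appears at least once.
    static Map<String, Integer> documentFrequencies(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) { // count each term once per document
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    // TF-IDF weights for one document, using raw counts and idf = ln(N / df).
    // Assumes the document is part of the corpus that df was computed from,
    // so every term of the document has a document frequency.
    static Map<String, Double> tfIdf(List<String> doc, Map<String, Integer> df, int numDocs) {
        Map<String, Integer> tf = new HashMap<>();
        for (String term : doc) {
            tf.merge(term, 1, Integer::sum);
        }
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> entry : tf.entrySet()) {
            double idf = Math.log((double) numDocs / df.get(entry.getKey()));
            weights.put(entry.getKey(), entry.getValue() * idf);
        }
        return weights;
    }

    public static void main(String[] args) {
        // Made-up, already tokenized documents.
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("java", "data", "java"),
                Arrays.asList("data", "science"));
        Map<String, Integer> df = documentFrequencies(docs);
        System.out.println(tfIdf(docs.get(0), df, docs.size()));
    }
}
```

With this variant, a term that occurs in every document gets a weight of zero, which is the intended effect: such terms do not help distinguish one document from another.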

Chapter 10, Deploying Data Science Models, looks at how we can expose the models to the rest of the world in a way that makes them usable. Here we cover Spring Boot and talk about how we can use the search engine model we developed to rank the articles from Common Crawl. We finish by discussing the ways to evaluate the performance of models in online settings and talk about A/B tests and Multi-Armed Bandits.
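
For readers unfamiliar with multi-armed bandits, the sketch below shows one simple strategy, epsilon-greedy: with probability epsilon it tries a random variant (exploration), otherwise it picks the variant with the best average reward so far (exploitation). The choice of epsilon-greedy, the class name `EpsilonGreedyBandit`, and its methods are assumptions made for illustration; the chapter summary does not say which bandit strategy the book uses.

```java
import java.util.Random;

// Hypothetical illustration of one bandit strategy; not code from the book.
final class EpsilonGreedyBandit {
    private final double epsilon;       // probability of exploring a random arm
    private final int[] pulls;          // how many times each arm was updated
    private final double[] meanRewards; // running average reward per arm
    private final Random random;

    EpsilonGreedyBandit(int arms, double epsilon, long seed) {
        this.epsilon = epsilon;
        this.pulls = new int[arms];
        this.meanRewards = new double[arms];
        this.random = new Random(seed);
    }

    // Chooses an arm: a random one with probability epsilon,
    // otherwise the arm with the highest average reward so far
    // (the lowest index wins ties).
    int selectArm() {
        if (random.nextDouble() < epsilon) {
            return random.nextInt(pulls.length);
        }
        int best = 0;
        for (int arm = 1; arm < meanRewards.length; arm++) {
            if (meanRewards[arm] > meanRewards[best]) {
                best = arm;
            }
        }
        return best;
    }

    // Records an observed reward (for example, 1.0 for a click, 0.0 otherwise)
    // and updates the running average incrementally.
    void update(int arm, double reward) {
        pulls[arm]++;
        meanRewards[arm] += (reward - meanRewards[arm]) / pulls[arm];
    }

    double estimate(int arm) {
        return meanRewards[arm];
    }
}
```

In an online setting, each arm would correspond to one model variant: `selectArm()` decides which variant serves the next request, and `update()` feeds back the observed outcome. Unlike a fixed-split A/B test, the traffic share shifts toward the better-performing variant as evidence accumulates.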
