
Scala for Machine Learning, Second Edition

Before getting your hands dirty, you need to download and deploy the minimum set of tools and libraries; there is no need to reinvent the wheel, after all. A few key components have to be installed in order to compile and run the source code described throughout this book. We will focus on open source and commonly available libraries, although you are invited to experiment with the equivalent tools of your choice. The learning curve for the frameworks described here is minimal.
The code described in the book has been tested with JDK 1.7.0_45 and JDK 1.8.0_25 on Windows x64 and MacOS X x64. You need to install the Java Development Kit if you have not already done so. Finally, the environment variables JAVA_HOME, PATH, and CLASSPATH have to be updated accordingly.
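As an illustration of the environment setup on a Unix-like shell, the following minimal sketch sets JAVA_HOME and adds the JDK's bin directory to PATH. The install location /opt/jdk1.8.0_25 is a hypothetical placeholder, not a path given in the book; substitute the directory where your JDK is actually installed.

```shell
#!/bin/sh
# Hypothetical install location (an assumption); keep an existing JAVA_HOME if one is set.
JAVA_HOME="${JAVA_HOME:-/opt/jdk1.8.0_25}"
export JAVA_HOME

# Prepend the JDK's bin directory to PATH only if it is not already listed.
case ":$PATH:" in
  *":$JAVA_HOME/bin:"*) ;;                 # already present, nothing to do
  *) PATH="$JAVA_HOME/bin:$PATH" ;;
esac
export PATH

echo "JAVA_HOME=$JAVA_HOME"
```

Placing these lines in your shell profile makes the settings persistent across sessions; on Windows, the same variables are edited through the environment variables dialog instead.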
The code has been tested with Scala 2.11.4 and 2.11.8. We recommend using Scala version 2.11.4 or higher with SBT 0.13.1 or higher. Let's assume that the Scala runtime (REPL) and libraries have been properly installed and that the environment variables SCALA_HOME and PATH have been updated.
The Scala standard library can be downloaded as binaries or as part of the Typesafe Activator tool by visiting http://www.scala-lang.org/download/.
The description and installation instructions for the Eclipse Scala IDE version 4.0 and higher are available at http://scala-ide.org/docs/user/gettingstarted.html.
You can also download the IntelliJ IDEA Scala plugin version 13 or higher from the JetBrains website at http://confluence.jetbrains.com/display/SCA/.
The ubiquitous Simple Build Tool (SBT) will be our primary build engine. It can be downloaded as part of the Typesafe Activator or directly from http://www.scala-sbt.org/download.html.
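The syntax of the build file sbt/build.sbt conforms to version 0.13 and is used to compile and assemble the source code presented throughout this book. To build Scala for Machine Learning, do the following:
Set the JVM memory options (for example, -Xmx4096m -Xms512m -XX:MaxPermSize=512m).
Build the library and publish it to the local repository: $(ROOT)/sbt clean publish-local
Package the library as a JAR file: $(ROOT)/sbt clean package
Generate the Scaladoc for the main sources: $(ROOT)/sbt doc
Generate the Scaladoc for the test sources: $(ROOT)/sbt test:doc
Check the code style with Scalastyle: $(ROOT)/sbt scalastyle
Compile the test sources: $(ROOT)/sbt test:compile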
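One way to pass the JVM memory options to the build is through the SBT_OPTS environment variable. The sketch below assumes that the sbt script under $(ROOT) honors SBT_OPTS, as the standard sbt launcher script does. The helper run_sbt and the DRY_RUN switch are hypothetical names introduced here for illustration; by default the helper only prints the command it would run.

```shell
#!/bin/sh
# JVM options quoted in the text, exported for the sbt launcher (assumed to read SBT_OPTS).
SBT_OPTS="-Xmx4096m -Xms512m -XX:MaxPermSize=512m"
export SBT_OPTS

# ROOT is the project root directory; defaults to the current directory (an assumption).
ROOT="${ROOT:-.}"

# run_sbt ARGS...: hypothetical helper. Prints the sbt command when DRY_RUN is 1
# (the default here), and executes it otherwise.
run_sbt() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$ROOT/sbt $*"
  else
    "$ROOT/sbt" "$@"
  fi
}

run_sbt clean package
run_sbt test:compile
```

Set DRY_RUN=0 to execute the commands for real. Note that JDK 8 ignores -XX:MaxPermSize and prints a warning, because the permanent generation was removed in that release; the option only has an effect on JDK 7.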
Apache Commons Math is a Java library for numerical processing, algebra, statistics, and optimization [1:6].
This is a lightweight library that provides developers with a foundation of small, ready-to-use Java classes that can be easily woven into a machine learning application. The examples used throughout the book require version 3.5 or higher.
The math library supports the following:
For more information, visit http://commons.apache.org/proper/commons-math.
The library is distributed under the Apache License 2.0; the terms are available at https://www.apache.org/licenses/LICENSE-2.0.
The installation and deployment of the Apache Commons Math library are quite simple. The steps are as follows:
1. Download the .jar files in the binary section, commons-math3-3.6-bin.zip (for version 3.6, for instance).
2. Unzip the archive and install the .jar file.
3. Add commons-math3-3.6.jar to the CLASSPATH, as follows:
   On MacOS X: export CLASSPATH=$CLASSPATH:/Commons_Math_path/commons-math3-3.6.jar
   On Windows: go to System properties | Advanced system settings | Advanced | Environment variables and then edit the CLASSPATH variable.
4. Add the commons-math3-3.6.jar file to your IDE environment if needed:
   Eclipse Scala IDE: Project | Properties | Java Build Path | Libraries | Add External JARs
   IntelliJ IDEA: File | Project Structure | Project Settings | Libraries
5. You can also download the source commons-math3-3.6-src.zip from the source section.
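The export command in the steps above appends the JAR unconditionally, which adds a duplicate entry each time it is re-run and leaves an empty leading entry when CLASSPATH was previously unset. A minimal sketch of a more defensive variant follows; the function name add_to_classpath and the directory /opt/libs are hypothetical and stand in for your own Commons_Math_path.

```shell
#!/bin/sh
# add_to_classpath JAR: append JAR to CLASSPATH unless it is already listed.
# (Hypothetical helper, not part of the book's source code.)
add_to_classpath() {
  jar="$1"
  case ":${CLASSPATH:-}:" in
    *":$jar:"*) ;;                                   # already present, nothing to do
    *) CLASSPATH="${CLASSPATH:+$CLASSPATH:}$jar" ;;  # no leading ':' when CLASSPATH is empty
  esac
  export CLASSPATH
}

# Hypothetical install directory; substitute your own Commons_Math_path.
add_to_classpath "/opt/libs/commons-math3-3.6.jar"
echo "$CLASSPATH"
```

The same helper works for any other JAR, so it can be called once per library from a shell profile without the variable growing on every new session.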
JFreeChart is an open source charting and plotting Java library widely used in the Java programmer community. It was originally created by David Gilbert [1:8].
The library supports a variety of configurable plots and charts (scatter, dial, pie, area, bar, box and whisker, stacked, and 3D). We use JFreeChart to display the output of data processing and algorithms throughout the book, but you are encouraged to explore this great library on your own, as time permits.
It is distributed under the terms of the GNU Lesser General Public License (LGPL), which permits its use in proprietary applications.
To install and deploy JFreeChart, perform the following steps:
1. Download the library, then unzip and install the .jar file.
2. Add jfreechart-1.0.17.jar (for version 1.0.17) to the CLASSPATH, as follows:
   On MacOS X: export CLASSPATH=$CLASSPATH:/JFreeChart_path/jfreechart-1.0.17.jar
   On Windows: go to System properties | Advanced system settings | Advanced | Environment variables and then edit the CLASSPATH variable.
3. Add the jfreechart-1.0.17.jar file to your IDE environment:
   Eclipse Scala IDE: Project | Properties | Java Build Path | Libraries | Add External JARs
   IntelliJ IDEA: File | Project Structure | Project Settings | Libraries | +
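Once the libraries have been added to the CLASSPATH, a quick sanity check is to confirm that every entry actually exists on disk. The following sketch assumes a Unix-style CLASSPATH separated by colons (Windows uses semicolons); the function name check_classpath is hypothetical.

```shell
#!/bin/sh
# check_classpath: print each CLASSPATH entry that does not exist on disk and
# return non-zero if any entry is missing. The body runs in a subshell so that
# the IFS and globbing changes do not leak into the calling shell.
check_classpath() (
  IFS=":"      # split on the Unix classpath separator
  set -f       # disable globbing; wildcard entries such as lib/* are tested literally
  missing=0
  for entry in ${CLASSPATH:-}; do
    [ -n "$entry" ] || continue          # skip empty entries
    if [ ! -e "$entry" ]; then
      echo "missing: $entry"
      missing=1
    fi
  done
  exit "$missing"
)

check_classpath || echo "Fix the CLASSPATH entries listed above."
```

Because wildcard entries are tested literally, an entry such as lib/* is reported as missing even when the directory contains JAR files; treat that report as a prompt to inspect the directory by hand.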
Libraries and tools that are specific to a single chapter are introduced along with the topic. Scalable frameworks are presented in the last chapter along with instructions for downloading them. Libraries related to the conditional random fields and support vector machines are described in their respective chapters.
Why aren't we using Scala algebra and Scala numerical libraries?
Libraries such as Breeze, ScalaNLP, and Algebird are interesting Scala frameworks for linear algebra, numerical analysis, and machine learning. They provide even the most seasoned Scala programmer with a high-quality layer of abstraction. However, this book is designed as a tutorial that allows developers to write algorithms from the ground up using existing or legacy Java libraries [1:9].