
Modern Scala Projects

This section lays out the implementation infrastructure for Chapter 4, Building a Spam Classification Pipeline. Its goal is to begin developing a data pipeline that analyzes the flight on-time dataset. Before any implementation, the first step is to set up the prerequisites, which the next subsection covers.
The following prerequisites, or prerequisite checks, are recommended. MongoDB is a new addition to this list:
build.sbt file
We start by detailing the steps to increase the memory available to the Spark application. Why would we want to do that? This and other points related to Java heap space memory are explored in the following topic.
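As a rough illustration of where such a memory setting can live, the following build.sbt fragment is a minimal sketch, not the book's own configuration. It assumes the Spark application is launched with sbt run in local mode, where the driver shares the JVM that sbt starts. The heap sizes shown are placeholder values chosen for illustration and should be adjusted to the machine at hand.

```scala
// build.sbt (sketch): heap values below are illustrative assumptions, not recommendations

// javaOptions are only honored when sbt runs the application in a separate (forked) JVM
fork := true

// Initial and maximum Java heap size for the forked JVM
javaOptions ++= Seq(
  "-Xms1G", // assumed initial heap
  "-Xmx4G"  // assumed maximum heap
)
```

Without forking, the application runs inside the sbt JVM itself, and these options have no effect; in that case the heap would have to be raised for sbt as a whole.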
Flight on-time records, compiled over a period of time, say, month by month, become big or...