
Building Big Data Pipelines with Apache Beam

In this chapter, we will introduce some elementary pipelines written using Beam's Java Software Development Kit (SDK).
We will use the code located in the GitHub repository for this book: https://github.com/PacktPublishing/Building-Big-Data-Pipelines-with-Apache-Beam.
We will also need the following tools to be installed:
Java, with JAVA_HOME set appropriately

Important note
Although many of the tools in this book can be run from the Windows shell, we will focus on Bash scripting only. We hope Windows users will be able to run Bash through virtualization, Windows Subsystem for Linux (WSL), or a similar technology.
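Before building, it can help to confirm that the prerequisites are in place. The following is a minimal Bash sketch, assuming that JAVA_HOME should point at a JDK whose launcher lives at $JAVA_HOME/bin/java. The function names check_java_home and check_tool are hypothetical and are not part of the book's repository.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check prerequisites before building.
# Assumption: JAVA_HOME points at a JDK whose launcher is $JAVA_HOME/bin/java.
# check_java_home and check_tool are hypothetical helper names.

check_java_home() {
  # Fail if JAVA_HOME is unset or empty.
  if [[ -z "${JAVA_HOME:-}" ]]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  # Fail if there is no executable java launcher under JAVA_HOME.
  if [[ ! -x "$JAVA_HOME/bin/java" ]]; then
    echo "No executable java found under $JAVA_HOME/bin" >&2
    return 1
  fi
  echo "Using Java from $JAVA_HOME"
}

check_tool() {
  # command -v succeeds only if the named tool can be found on the PATH.
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "Missing required tool: $1" >&2
    return 1
  fi
}
```

For example, check_java_home && check_tool git runs both checks and stops at the first one that fails.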
First of all, we need to clone the repository:
$ git clone https://github.com/PacktPublishing/Building-Big-Data-Pipelines-with-Apache-Beam.git
This results in a new directory, Building-Big-Data-Pipelines-with-Apache-Beam, being created in the working directory. We then run the following command in this newly created directory:
$ ./mvnw clean install
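The clone and build steps can be combined into one script. This is a sketch rather than a script shipped with the repository: repo_dir_for and clone_and_build are hypothetical names, and repo_dir_for only mirrors how git names the clone directory for a URL of this shape (the last path component, without the .git suffix).

```shell
#!/usr/bin/env bash
# Sketch: clone the book's repository and build it with the bundled Maven wrapper.
# repo_dir_for and clone_and_build are hypothetical helper names.

REPO_URL="https://github.com/PacktPublishing/Building-Big-Data-Pipelines-with-Apache-Beam.git"

repo_dir_for() {
  # For a URL like REPO_URL, git names the new directory after the last
  # path component with the ".git" suffix removed; basename does the same.
  basename "$1" .git
}

clone_and_build() {
  local url="$1"
  local dir
  dir="$(repo_dir_for "$url")"
  # Clone, then run the wrapper from the top-level directory of the clone.
  # The subshell keeps the caller's working directory unchanged.
  git clone "$url" && (cd "$dir" && ./mvnw clean install)
}
```

Calling clone_and_build "$REPO_URL" performs both steps and returns a non-zero status if either the clone or the build fails.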
Throughout this book, the $ character denotes a Bash shell prompt. Therefore, $ ./mvnw clean install means running the ./mvnw command in the top-level directory of the git clone (that is, Building-Big-Data-Pipelines-with-Apache-Beam). By chapter1$ ../mvnw clean install, we mean running the specified command in the subdirectory called chapter1.
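The prompt convention can be made concrete with a small helper. The sketch below assumes we start in the top-level directory of the clone; run_in_dir is a hypothetical helper, not part of the repository, that runs a command inside a subdirectory without changing the caller's working directory.

```shell
#!/usr/bin/env bash
# Sketch of the prompt convention, starting from the top-level directory:
#   $ ./mvnw clean install            -> run here, in the top-level directory
#   chapter1$ ../mvnw clean install   -> run inside the chapter1 subdirectory
# run_in_dir is a hypothetical helper, not part of the repository.

run_in_dir() {
  local dir="$1"
  shift
  # Run the remaining arguments as a command inside "$dir".
  # The subshell means the caller's working directory is left unchanged.
  (cd "$dir" && "$@")
}

# Equivalent of "chapter1$ ../mvnw clean install" from the top level:
#   run_in_dir chapter1 ../mvnw clean install
```

Here ../mvnw resolves to the Maven wrapper in the top-level directory, because chapter1 sits one level below it.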