
Building Big Data Pipelines with Apache Beam

We will reuse the familiar examples from Chapter 2, Implementing, Testing, and Deploying Basic Pipelines, and Chapter 3, Implementing Pipelines Using Stateful Processing, which so far have mostly been implemented using the Java SDK. We will also build on our knowledge from Chapter 4, Structuring Code for Reusability, regarding the use of user-defined PTransforms for better reusability and testing.
Our first complete task is the one we implemented as Task 2 in Chapter 2, Implementing, Testing, and Deploying Basic Pipelines, but as always, we will restate the problem here for clarity.
Given an input data stream of lines of text, calculate the longest word seen in this stream. Start with an empty word; whenever a longer word appears, output the newly found candidate.
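Before looking at the pipeline itself, the intended semantics can be sketched in plain Python (this is only an illustration of the task's logic, not Beam code; the function name and the sample input are ours):

```python
def longest_word_candidates(lines):
    """Given an iterable of text lines, yield each new longest-word
    candidate in the order it is discovered."""
    longest = ""  # start with an empty word
    for line in lines:
        for word in line.split():
            if len(word) > len(longest):
                longest = word
                yield longest  # emit the newly found candidate


# Example: each yielded value is a new longest word seen so far.
stream = ["a bb ccc", "dd eeee", "f"]
print(list(longest_word_candidates(stream)))  # ['a', 'bb', 'ccc', 'eeee']
```

Note that, unlike a batch "find the maximum" computation, the streaming formulation produces an output every time the candidate changes, which is exactly the behavior we will want from the pipeline.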
From a logical perspective, this problem is the same as in the case of Task 2. So, let's focus...