
Building Big Data Pipelines with Apache Beam

This task will be a reimplementation of Task 5 from Chapter 2, Implementing, Testing, and Deploying Basic Pipelines. Again, for clarity, let's restate the problem definition.
Given an input data stream of quadruples (workoutId, gpsLatitude, gpsLongitude, and timestamp), calculate the current speed and total tracked distance. The data comes from a GPS tracker that sends data only when the user starts a sports activity. We can assume that workoutId is unique and contains userId in it.
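To make the core computation concrete before looking at the Beam code, here is a minimal, framework-free sketch of what has to happen for a single workout. It is not the book's SportTrackerCalc or computeMetrics code. It assumes coordinates in degrees, timestamps in milliseconds, and a spherical Earth with haversine distance, and workout_metrics and haversine_m are hypothetical helper names. The sketch sorts one workout's points by timestamp, accumulates the distance between consecutive points, and reports the speed over the most recent segment.

```python
import math
from typing import Iterable, List, Tuple

# Assumed record layout: (workout_id, latitude_deg, longitude_deg, timestamp_ms)
Point = Tuple[str, float, float, int]

EARTH_RADIUS_M = 6_371_000.0  # assumption: mean Earth radius in meters


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (lat, lon) pairs given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def workout_metrics(points: Iterable[Point]) -> List[Tuple[str, float, float]]:
    """Hypothetical helper: for each point of ONE workout, emit
    (workout_id, total_distance_m, current_speed_m_per_s)."""
    # GPS data may arrive out of order, so process points in timestamp order.
    ordered = sorted(points, key=lambda p: p[3])
    results = []
    total = 0.0
    prev = None  # previous point as (workout_id, lat, lon, ts)
    for workout_id, lat, lon, ts in ordered:
        speed = 0.0  # the first point has no preceding segment
        if prev is not None:
            segment = haversine_m(prev[1], prev[2], lat, lon)
            total += segment
            elapsed_s = (ts - prev[3]) / 1000.0
            # Guard against duplicate timestamps to avoid division by zero.
            speed = segment / elapsed_s if elapsed_s > 0 else 0.0
        results.append((workout_id, total, speed))
        prev = (workout_id, lat, lon, ts)
    return results
```

For example, two points 0.001° of longitude apart on the equator and 60 seconds apart yield a total distance of about 111.2 m and a current speed of about 1.85 m/s. In a streaming pipeline the points would first have to be keyed by workoutId, and the running total kept in per-key state rather than recomputed from a sorted list, which is the part the Beam implementation has to handle.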
The caveats of the implementation are the same as those we discussed in the original Task 5, so we'll skip right to its Python SDK implementation.
The complete implementation can be found in the source code of this chapter, in chapter6/src/main/python/sport_tracker.py. The logic is concentrated in two functions, SportTrackerCalc and computeMetrics: