
Building Big Data Pipelines with Apache Beam

Let's suppose that our RPC server works best when it processes about 100 input words in a batch. A real-world requirement would probably look different and would be the result of measurements rather than an arbitrary number. However, for the present discussion, let's suppose that this performance characteristic is given. We can then summarize the task as follows.
Use a given RPC service to augment data in an input stream using batched RPCs with batches of about K elements. Also, send out a batch after a time of at most T, so that elements in a small batch do not wait (possibly) forever for the batch to fill.
As we can see, we extended the definition of the problem with a new parameter, T, which bounds how long we can buffer elements while waiting for more data.
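To make the roles of K and T concrete, the flush rule can be sketched in plain Java, outside of any pipeline. This is only an illustration under stated assumptions: the class `BatchBuffer` and its methods `add` and `onTimer` are hypothetical names invented for this sketch, they are not part of Apache Beam, and time is passed in explicitly as milliseconds so that the behavior is deterministic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical helper (not a Beam API): buffers elements and releases a batch
 * when it holds K elements or when its oldest element has waited at least T.
 */
class BatchBuffer<E> {
  private final int maxSize;        // K: target batch size
  private final long maxWaitMillis; // T: maximum buffering time
  private final List<E> buffer = new ArrayList<>();
  private long firstElementMillis;  // arrival time of the oldest buffered element

  BatchBuffer(int maxSize, long maxWaitMillis) {
    this.maxSize = maxSize;
    this.maxWaitMillis = maxWaitMillis;
  }

  /** Buffers an element seen at nowMillis; returns a full batch once K is reached. */
  List<E> add(E element, long nowMillis) {
    if (buffer.isEmpty()) {
      firstElementMillis = nowMillis; // start the clock for this batch
    }
    buffer.add(element);
    return buffer.size() >= maxSize ? flush() : Collections.<E>emptyList();
  }

  /** Called as time advances; returns the partial batch once it has waited T. */
  List<E> onTimer(long nowMillis) {
    boolean expired =
        !buffer.isEmpty() && nowMillis - firstElementMillis >= maxWaitMillis;
    return expired ? flush() : Collections.<E>emptyList();
  }

  /** Hands out the buffered elements and resets the buffer. */
  private List<E> flush() {
    List<E> batch = new ArrayList<>(buffer);
    buffer.clear();
    return batch;
  }
}
```

With K = 3 and T = 1000 ms, adding three words returns them as one batch on the third call to `add`, while a single buffered word is returned by `onTimer` once 1000 ms have passed since it arrived. A non-empty result is what would be sent to the RPC service in one call.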
As already mentioned, we cannot...