
Hadoop Beginner's Guide

In discussions about integrating Hadoop with other systems, it is easy to think of the integration as a one-to-one pattern: data comes out of one system, gets processed in Hadoop, and is then passed on to a third system.
Things may be like that on day one, but the reality is more often a series of collaborating components with data flows passing back and forth between them. How we build this complex network in a maintainable fashion is the focus of this chapter.
For the sake of this discussion, we will divide data into two broad categories:
Network traffic, where data is generated by a system and sent across a network connection
File data, where data is generated by a system and written to files on a filesystem somewhere
We don't assume these data categories are different in any way other than how the data is retrieved.
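The point that the two categories differ only in how the data is retrieved can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names (count_records, read_from_file, read_from_network) are hypothetical, the "network" is a local socket pair standing in for a real remote connection, and the "file" is a temporary file. Only the retrieval step differs between the two paths; the downstream processing is shared.

```python
import os
import socket
import tempfile


def count_records(lines):
    """Downstream processing: indifferent to where the lines came from."""
    return sum(1 for line in lines if line.strip())


def read_from_file(path):
    """File data: the producer wrote to a filesystem; we read it back."""
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read().splitlines()


def read_from_network(conn):
    """Network traffic: the producer sent bytes over a connection."""
    chunks = []
    while True:
        chunk = conn.recv(4096)
        if not chunk:  # an empty read means the sender closed the connection
            break
        chunks.append(chunk)
    return b"".join(chunks).decode("utf-8").splitlines()


# Hypothetical sample payload, identical for both retrieval paths.
payload = "alpha\nbeta\ngamma\n"

# Path 1: file data, written to and read from a temporary file.
with tempfile.NamedTemporaryFile(
    "w", suffix=".log", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(payload)
file_lines = read_from_file(tmp.name)
os.remove(tmp.name)

# Path 2: network traffic, simulated with a connected local socket pair.
sender, receiver = socket.socketpair()
sender.sendall(payload.encode("utf-8"))
sender.close()
network_lines = read_from_network(receiver)
receiver.close()

# The same processing step runs on data from either source.
file_count = count_records(file_lines)
network_count = count_records(network_lines)
```

Keeping retrieval separate from processing in this way is one possible design for the kind of maintainable data flow the chapter describes. It is not presented here as the book's own approach.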
When we say network data, we mean things like information retrieved from a web server via an HTTP connection, database...