
Learning Hadoop 2

In the previous chapters, we explored a number of APIs for data processing. MapReduce, Spark, Tez and Samza are rather low-level, and writing non-trivial business logic with them often requires significant Java development. Moreover, different users will have different needs. It might be impractical for an analyst to write MapReduce code or build a DAG of inputs and outputs to answer some simple queries. At the same time, a software engineer or a researcher might want to prototype ideas and algorithms using high-level abstractions before jumping into low-level implementation details.
In this chapter and the following one, we will explore some tools that provide a way to process data on HDFS using higher-level abstractions. This chapter focuses on Apache Pig and, in particular, covers the following topics: