
Apache Hive Cookbook
By :

Hive provides the following set of analytical functions:
RANK
DENSE_RANK
ROW_NUMBER
PERCENT_RANK
CUME_DIST
NTILE
Common and useful sets of analytical functions are ranking functions where rows from resultset are ranked according to a scheme.
Let's analyze each function in detail. We will be using the same sales
dataset and applying analytical functions to it:
ROW_NUMBER
: This function will provide a unique number to each row in resultset based on the ORDER BY
clause within the PARTITION
. For example, if we want to assign row_number
to each fname
, which is also partitioned by IP address in the sales
dataset, the query would be:
hive> select fname,ip,ROW_NUMBER() OVER (ORDER BY ip ) as rownum from sales;
RANK
: It is similar to ROW_NUMBER
, but the equal rows are ranked with the same number. For example, if we use RANK
in the previous query instead of ROW_NUM
:
hive> select fname,ip,RANK() OVER (ORDER BY ip) as ranknum, RANK() OVER (PARTITION BY ip order...
Change the font size
Change margin width
Change background colour