
Serverless Analytics with Amazon Athena

In Chapter 1, Your First Query, we used TABLESAMPLE to run a query that allowed us to get familiar with our data by viewing an evenly distributed sampling of rows from across the entire table. TABLESAMPLE enables you to approximate the results of any query by sampling the underlying data. Athena also supports more targeted forms of approximation that offer bounded error. For example, the approx_distinct function should produce results with a standard error of 2.3% but completes its execution 97% faster while also using less peak memory than its completely accurate counterpart, COUNT(DISTINCT x). We'll learn more about these and several other approximate query tools by exploring our NYC taxi ride tables.
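To make the contrast concrete, here is a minimal sketch of the three query shapes mentioned above. The table name nyc_taxi and the columns pickup_location_id and payment_type are hypothetical placeholders, not names taken from this chapter, so substitute the ones from your own taxi ride tables. The sampling percentage of 10 is likewise just an illustrative choice.

```sql
-- NOTE: nyc_taxi, pickup_location_id, and payment_type are assumed,
-- illustrative names; replace them with your own table and columns.

-- Exact distinct count: fully accurate, but must track every distinct value.
SELECT COUNT(DISTINCT pickup_location_id) AS exact_pickup_locations
FROM nyc_taxi;

-- Approximate distinct count: a bounded-error estimate of the same quantity.
SELECT approx_distinct(pickup_location_id) AS approx_pickup_locations
FROM nyc_taxi;

-- TABLESAMPLE: sample roughly 10% of the input rows, then run an ordinary
-- aggregation over the sample. Counts here reflect the sample, not the
-- full table.
SELECT payment_type,
       COUNT(*) AS sampled_rides
FROM nyc_taxi TABLESAMPLE BERNOULLI (10)
GROUP BY payment_type;
```

Running the first two queries back to back and comparing their results and runtimes is a simple way to see the accuracy and speed trade-off on your own data. The third query illustrates the different mechanism behind TABLESAMPLE: the sampling happens on the input, so the rest of the statement is plain SQL.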
TABLESAMPLE is a somewhat generic technique for running approximate queries. Unlike the other methods we discuss in this section, TABLESAMPLE works by sampling the input data. This allows you to use it in conjunction with any other SQL features supported...