Serverless Analytics with Amazon Athena

By: Virtuoso, Mert Turkay Hocanin, Wishnick

4.9 (9)

Overview of this book

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using SQL, without needing to manage any infrastructure. This book begins with an overview of the serverless analytics experience offered by Athena and teaches you how to build and tune an S3 data lake using Athena, including how to structure your tables using open-source file formats like Parquet. You’ll learn how to build, secure, and connect to a data lake with Athena and Lake Formation. Next, you’ll cover key tasks such as ad hoc data analysis, working with ETL pipelines, monitoring and alerting on KPI breaches using CloudWatch Metrics, running customizable connectors with AWS Lambda, and more. Moving on, you’ll work through easy integrations, troubleshoot and tune common Athena issues, and examine the most common reasons for query failure. You will also review tips to help diagnose and correct failing queries in your pursuit of operational excellence. Finally, you’ll explore advanced concepts such as Athena Query Federation and Athena ML to generate powerful insights without needing to touch a single server. By the end of this book, you’ll be able to build and use a data lake with Amazon Athena to add data-driven features to your app and perform the kind of ad hoc data analysis that precedes many of today’s ML modeling exercises.

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Share Your Thoughts

Section 1: Fundamentals Of Amazon Athena

Chapter 1: Your First Query

Technical requirements

What is Amazon Athena?

Obtaining and preparing sample data

Running your first query

Summary

Chapter 2: Introduction to Amazon Athena

Technical requirements

Getting to know Amazon Athena

What is Presto?

Understanding scale and latency

Metering and billing

Connecting and securing

Determining when to use Amazon Athena

Summary

Further reading

Chapter 3: Key Features, Query Types, and Functions

Technical requirements

Running ETL queries

Running approximate queries

Organizing workloads with WorkGroups and saved queries

Using Athena's APIs

Summary

Section 2: Building and Connecting to Your Data Lake

Chapter 4: Metastores, Data Sources, and Data Lakes

Technical requirements

What is a metastore?

What is a data source?

Registering S3 datasets in your metastore

Discovering your datasets on S3 using AWS Glue Crawlers

Designing a data lake architecture

Summary

Chapter 5: Securing Your Data

Technical requirements

General best practices to protect your data on AWS

Encrypting your data and metadata in Glue Data Catalog

Enabling coarse-grained access controls with IAM resource policies for data on S3

Enabling FGACs with Lake Formation for data on S3

Auditing with CloudTrail and S3 access logs

Summary

Further reading

Chapter 6: AWS Glue and AWS Lake Formation

Technical requirements

Summary

Further reading

Section 3: Using Amazon Athena

Chapter 7: Ad Hoc Analytics

Technical requirements

Understanding the ad hoc analytics hype

Building an ad hoc analytics strategy

Using QuickSight with Athena

Using Jupyter Notebooks with Athena

Summary

Chapter 8: Querying Unstructured and Semi-Structured Data

Technical requirements

Why isn't all data structured to begin with?

Querying JSON data

Querying arbitrary log data

Summary

Further reading

Chapter 9: Serverless ETL Pipelines

Technical requirements

Understanding the uses of ETL

Deciding whether to ETL or query in place

Designing ETL queries for Athena

Using Lambda as an orchestrator

Triggering ETL queries with S3 notifications

Summary

Chapter 10: Building Applications with Amazon Athena

Technical requirements

Connecting to Athena

Best practices for connecting to Athena

Securing your application

Optimizing for performance and cost

Summary

Chapter 11: Operational Excellence – Monitoring, Optimization, and Troubleshooting

Technical requirements

Section 4: Advanced Topics

Chapter 12: Athena Query Federation

Technical requirements

What is Query Federation?

How Athena Connectors work

Using pre-built Connectors

Building a custom connector

Summary

Chapter 13: Athena UDFs and ML

Technical requirements

What are UDFs?

Writing a new UDF

Using built-in ML UDFs

Summary

Chapter 14: Lake Formation – Advanced Topics

Reinforcing your data perimeter with Lake Formation

Understanding the benefits of governed tables

Summary

Why subscribe?

Other Books You May Enjoy

Packt is searching for authors like you

Share Your Thoughts

Discovering your datasets on S3 using AWS Glue Crawlers

How do AWS Glue Crawlers work?
