
Essential PySpark for Scalable Data Analytics

Once the data in the data lake is clean and integrated, and machine learning models have been trained and built at scale, the final step is to convey actionable insights to business owners in a meaningful way so that they can make business decisions. This section covers the business intelligence (BI) and SQL analytics part of data analytics. It starts with various data visualization techniques using notebooks. It then introduces you to Spark SQL for performing business analytics at scale and shows techniques for connecting BI and SQL analysis tools to Apache Spark clusters. The section ends with an introduction to the Data Lakehouse paradigm, which bridges the gap between data warehouses and data lakes by providing a single, unified, scalable storage layer that caters to all aspects of data analytics, including data engineering, data science, and business analytics.
This section includes the following chapters: