Cloud Scale Analytics with Azure Data Services

By: Borosch
4.9 (7)

Overview of this book

Azure Data Lake, the modern data warehouse architecture, and related data services on Azure enable organizations to build a customized analytical platform that fits any analytical requirement in terms of volume, speed, and quality. This book is your guide to the features and capabilities of Azure data services for storing, processing, and analyzing data (structured, unstructured, and semi-structured) of any size. You will explore key techniques for ingesting and storing data, and perform batch, streaming, and interactive analytics. The book also shows you how to overcome challenges and complexities relating to productivity and scaling. Next, you will develop and run massive data workloads. Using a cloud-based setup that combines big data, a modern data warehouse, and analytics, you will also be able to build secure, scalable data estates for enterprises. Finally, you will learn not only how to develop a data warehouse but also how to create enterprise-grade security and auditing for big data programs. By the end of this Azure book, you will have learned how to develop a powerful and efficient analytical platform to meet enterprise needs.
Table of Contents (20 chapters)
1
Section 1: Data Warehousing and Considerations Regarding Cloud Computing
4
Section 2: The Storage Layer
7
Section 3: Cloud-Scale Data Integration and Data Transformation
14
Section 4: Data Presentation, Dashboarding, and Distribution

Integrating data with Synapse Spark pools

If you are a Spark developer and want to use Synapse Spark to wrangle your data and load it into your dedicated SQL pools, this is easy to accomplish.

JDBC was, and still is, the way to establish the connection and exchange the data. There is one caveat regarding the use of JDBC: it interacts only with the dedicated SQL pools, and there it talks only to the control node of your dedicated pool. This is suboptimal, because both Spark and dedicated SQL pools have a lot of parallelism to offer.
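As a rough illustration of the plain JDBC path described above, the following Python sketch assembles the options that Spark's generic `jdbc` data source expects. The workspace, database, table, and credential values are placeholders, and the helper names `synapse_jdbc_options` and `read_via_jdbc` are hypothetical; the endpoint pattern `<workspace>.sql.azuresynapse.net` is assumed to be the dedicated SQL endpoint of your workspace.

```python
from typing import Dict


def synapse_jdbc_options(workspace: str, database: str, table: str,
                         user: str, password: str) -> Dict[str, str]:
    """Build options for Spark's generic JDBC data source (hypothetical helper).

    Assumption: the dedicated SQL pool is reachable on port 1433 at
    <workspace>.sql.azuresynapse.net.
    """
    url = (
        f"jdbc:sqlserver://{workspace}.sql.azuresynapse.net:1433;"
        f"database={database};encrypt=true;loginTimeout=30"
    )
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        # Class name of the Microsoft JDBC driver for SQL Server
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }


def read_via_jdbc(spark, options: Dict[str, str]):
    """Read a table over plain JDBC using an existing SparkSession.

    All rows travel through a single connection to the control node,
    which is the bottleneck the text refers to.
    """
    return spark.read.format("jdbc").options(**options).load()


# Placeholder values for illustration only
opts = synapse_jdbc_options("myworkspace", "salesdw", "dbo.Sales",
                            "loader", "<secret>")
```

In a notebook with a running session, `read_via_jdbc(spark, opts)` would return a DataFrame. In practice the password would come from a secret store rather than a literal.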

Microsoft adjusted the JDBC driver slightly so that it benefits from the parallel workers available on both sides. The JDBC driver establishes a connection between the control node of the dedicated SQL pool and the driver node of the Spark cluster. The Spark engine issues CETAS (CREATE EXTERNAL TABLE AS SELECT) statements and sends filters and projections over this channel. The data itself, however, is exchanged using PolyBase and the Data Lake storage that is attached to...
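To make the idea of sending projections and filters in a CETAS statement more concrete, here is a minimal Python sketch that assembles such a statement as text. It is not the SQL that the connector actually generates; the function `build_cetas` and every object name used with it (external table, data source, file format, location) are hypothetical and serve only to show the shape of the statement.

```python
from typing import Optional, Sequence


def build_cetas(external_table: str, source_table: str,
                columns: Sequence[str], location: str,
                data_source: str, file_format: str,
                predicate: Optional[str] = None) -> str:
    """Assemble a CREATE EXTERNAL TABLE AS SELECT statement (illustrative).

    The column list is the projection and the predicate is the filter;
    both are evaluated inside the SQL pool, so only the required data is
    written to the staging location. Inputs are not sanitized here.
    """
    projection = ", ".join(f"[{c}]" for c in columns) if columns else "*"
    where = f" WHERE {predicate}" if predicate else ""
    return (
        f"CREATE EXTERNAL TABLE {external_table}\n"
        f"WITH (LOCATION = '{location}', DATA_SOURCE = {data_source}, "
        f"FILE_FORMAT = {file_format})\n"
        f"AS SELECT {projection} FROM {source_table}{where};"
    )


# Hypothetical object names for illustration only
statement = build_cetas(
    external_table="ext.sales_export",
    source_table="dbo.Sales",
    columns=["Region", "Amount"],
    location="/staging/sales/",
    data_source="staging_lake",
    file_format="parquet_format",
    predicate="Amount > 100",
)
print(statement)
```

Only this short statement would need to cross the control-node connection; the exported files can then be written and read in parallel.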
