Optimizing Databricks Workloads

By: Anirudh Kala, Bhatnagar, Sarbahi
4.1 (13)

Overview of this book

Databricks is an industry-leading, cloud-based platform for data analytics, data science, and data engineering that supports thousands of organizations across the world in their data journey. It is a fast, easy, and collaborative Apache Spark-based big data analytics platform in the cloud. In Optimizing Databricks Workloads, you will start with a brief introduction to Azure Databricks and quickly move on to the important optimization techniques. The book covers how to select the optimal Spark cluster configuration for running big data processing workloads in Databricks, useful optimization techniques for Spark DataFrames, best practices for optimizing Delta Lake, and techniques to optimize Spark jobs through Spark core. It also presents real-world scenarios in which optimizing Databricks workloads has helped organizations increase performance and save costs across various domains. By the end of this book, you will have the toolkit you need to speed up your Spark jobs and process your data more efficiently.
Table of Contents (13 chapters)

  • Section 1: Introduction to Azure Databricks
  • Section 2: Optimization Techniques
  • Section 3: Real-World Scenarios

Learning about ML components in Databricks

The Databricks workspace is broadly divided into two personas: Data Science and Engineering, and Machine Learning. We've already looked at the Data Science and Engineering persona. In this chapter, we will work with the elements of the Machine Learning persona. This persona adds tabs to the left pane: Experiments, Feature Store, and Models. To switch to the Machine Learning workspace, click on Data Science and Engineering in the left pane and select Machine Learning. This brings up the new persona, as illustrated in the following screenshot:

Figure 3.1 – Machine Learning workspace in Databricks

Let's now discuss the three most important elements of this workspace, as follows:

  • Experiments: This section gives access to all the MLflow experiments across the workspace. We will work with this section once we start learning about MLflow.
  • Feature...