Essential PySpark for Scalable Data Analytics
As part of the broader Hadoop ecosystem, Spark has traditionally been Hive-compliant. While the Hive query language diverges considerably from the ANSI SQL standard, starting with Spark 3.0, Spark SQL can be made ANSI SQL-compliant via the spark.sql.ansi.enabled configuration. With this configuration enabled, Spark SQL uses an ANSI SQL-compliant dialect instead of the Hive dialect.
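As a minimal sketch (not code from the book), the configuration can be set when building a PySpark session; the application name here is illustrative:

```python
# Enable ANSI SQL mode for a PySpark session.
# Assumption: a local PySpark environment is available.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("ansi-sql-demo")  # hypothetical app name
    .config("spark.sql.ansi.enabled", "true")  # ANSI dialect instead of Hive dialect
    .getOrCreate()
)

# With ANSI mode on, certain invalid operations raise errors instead of
# silently returning NULL; for example, an overflowing cast such as
# spark.sql("SELECT CAST(2147483648 AS INT)") fails rather than yielding NULL.
```

The setting can also be toggled at runtime with SET spark.sql.ansi.enabled = true in a SQL session.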
Even with this setting enabled, Spark SQL does not entirely conform to the ANSI SQL standard. In this section, we will explore some of the prominent DDL and DML syntax of Spark SQL.
The syntax for creating a database and a table using Spark SQL is presented as follows:
CREATE DATABASE IF NOT EXISTS feature_store;

CREATE TABLE IF NOT EXISTS feature_store.retail_features
USING DELTA
LOCATION '/FileStore/shared_uploads/delta/retail_features.delta';
In the previous code block, we do the following: