
Data Engineering with AWS

As we covered in Chapter 3, The AWS Data Engineer's Toolkit, there are a number of AWS services that can be used for data transformation. Make sure to review that chapter for the details of those services; in this section, we will look more broadly at the different types of data transformation engines.
Apache Spark is an in-memory engine for working with large datasets. It splits a dataset into partitions that are distributed among multiple nodes in a cluster, so that each node processes its share of the data in parallel. Spark is an extremely popular engine for processing and transforming big datasets, and there are multiple ways to run Spark jobs within AWS, including Amazon EMR and AWS Glue.
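To make the "split the dataset among nodes" idea concrete, here is a toy sketch in plain Python. It does not use Spark or any Spark API. The helper names (`split_into_partitions`, `count_words_in_partition`, `distributed_word_count`) are hypothetical and exist only for illustration, and local threads stand in for the worker nodes of a real cluster. The sketch shows the general pattern: partition the data, run the same work on each partition independently, then combine the partial results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List


def split_into_partitions(records: List[str], num_partitions: int) -> List[List[str]]:
    """Round-robin records into num_partitions lists (hypothetical stand-in for Spark partitioning)."""
    partitions: List[List[str]] = [[] for _ in range(num_partitions)]
    for i, record in enumerate(records):
        partitions[i % num_partitions].append(record)
    return partitions


def count_words_in_partition(partition: Iterable[str]) -> Counter:
    """Per-partition work: each 'node' counts words only in its own slice of the data."""
    counts: Counter = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def distributed_word_count(records: List[str], num_partitions: int = 3) -> Counter:
    """Partition the records, count words per partition in parallel, then merge the results."""
    partitions = split_into_partitions(records, num_partitions)
    # Assumption for illustration: threads play the role of worker nodes.
    # In Spark, this work would run in executors on separate machines.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_counts = list(pool.map(count_words_in_partition, partitions))
    # Combine step: merge the per-partition results into a single answer.
    total: Counter = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


if __name__ == "__main__":
    lines = [
        "Spark splits data",
        "data is processed in memory",
        "spark processes data in parallel",
    ]
    print(distributed_word_count(lines).most_common(2))
```

Because each partition is counted independently and only the small per-partition results are merged, adding more partitions (and, in a real cluster, more nodes) lets the heavy per-record work scale out. Spark applies the same principle, while also handling scheduling, data movement between nodes, and fault tolerance.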
With Apache Spark, you can either process data in batches (such as on a daily basis or every few hours) or process near real-time streaming data using Spark Streaming. In addition, you can use Spark SQL to process data using standard SQL and Spark...