Resilient Distributed Datasets (RDDs) are the fundamental abstraction in Apache Spark. RDDs are immutable collections representing datasets, with built-in support for reliability and failure recovery. Because RDDs are immutable, transformations always produce new RDDs rather than modifying existing ones, while actions compute results from an RDD and return them to the driver. Every RDD also stores its lineage, the chain of transformations that produced it, which is used to recover from failures. We have already seen in the previous chapter how RDDs can be created and what kinds of operations can be applied to them.
The following figure shows a simple example of an RDD lineage:

(Figure: RDD lineage example)
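Lineage can also be inspected directly from the shell: every RDD has a toDebugString method that prints its lineage graph. The following is a minimal sketch in a fresh shell (the val names are illustrative, and the partition count, RDD numbers, and console line numbers in the output depend on your environment):

scala> val base = sc.parallelize(Seq(1, 2, 3))
base: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:24

scala> val doubled = base.map(_ * 2)
doubled: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[1] at map at <console>:25

scala> doubled.toDebugString
res0: String =
(8) MapPartitionsRDD[1] at map at <console>:25 []
 |  ParallelCollectionRDD[0] at parallelize at <console>:24 []

Each indented line is a parent RDD: if a partition of doubled is lost, Spark can rebuild it by re-reading the corresponding partition of base and reapplying the map.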
Let's start by looking at the simplest RDD again, creating an RDD from a sequence of numbers:
scala> val rdd_one = sc.parallelize(Seq(1,2,3,4,5,6))
rdd_one: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[28] at parallelize at <console>:25
scala> rdd_one.take(100)
res45: Array[Int] = Array(1, 2, 3, 4, 5, 6)
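Although we asked take for 100 elements, only the six that exist are returned. To make the transformation-versus-action distinction concrete, the session can be continued as follows: map is a transformation, so it returns a new RDD immediately without computing anything, whereas count is an action that triggers a job and returns a plain value to the driver (the RDD and res numbers shown simply continue the session above and will differ in your shell):

scala> val rdd_two = rdd_one.map(_ + 1)
rdd_two: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[29] at map at <console>:26

scala> rdd_two.count()
res46: Long = 6

Note that rdd_one is untouched; rdd_two is a new RDD that remembers it was derived from rdd_one via map, which is exactly the lineage Spark relies on for recovery.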