
Learning Apache Spark 2

Having seen GraphX over the course of this chapter, have you wondered what happened to DataFrame? If you are reading this book cover to cover, you might be asking yourself why we keep switching between the RDD and DataFrame APIs. We saw that DataFrame has become the primary API for Spark, and that you can benefit from the new optimizations only if you use the DataFrame API, so why is there no DataFrame API for GraphX?
Well, the reality is that there is a lot of focus on GraphFrames, the DataFrame-based API for graphs in Spark. There are several motivations for a DataFrame-based graph API, and some of them stem from shortcomings of GraphX.
GraphX poses certain challenges, for example: