
Learning Apache Spark 2

A pipeline is a sequence of stages, and each stage is either a Transformer or an Estimator. The stages run in order, and the input DataFrame is transformed as it passes through each stage of the process:

Transformer stages call the transform() method on the DataFrame.
Estimator stages call the fit() method on the DataFrame, producing a Transformer.
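The contract between the two stage types can be sketched as follows. This is an illustrative fragment, not code from the book: the column names (sentence, words) and the DataFrames df, trainingData, and testData are assumed to exist, and a SparkSession is required to run it.

```scala
import org.apache.spark.ml.feature.Tokenizer
import org.apache.spark.ml.classification.{LogisticRegression, LogisticRegressionModel}

// A Transformer maps one DataFrame to another via transform():
val tokenizer = new Tokenizer().setInputCol("sentence").setOutputCol("words")
val tokenized = tokenizer.transform(df)  // adds a "words" column to df

// An Estimator learns from a DataFrame via fit(), returning a Transformer:
val lr = new LogisticRegression().setMaxIter(10)
val model: LogisticRegressionModel = lr.fit(trainingData)

// The fitted model is itself a Transformer and can score new data:
val predictions = model.transform(testData)
```

Note that fit() never mutates the Estimator; it returns a new model object, which is why a fitted pipeline can be applied to any number of DataFrames afterwards.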
A pipeline is created by declaring its stages, configuring the appropriate parameters, and then chaining them in a Pipeline object. For example, to create a simple classification pipeline we would tokenize the raw text into words, use the hashing term frequency feature extractor to turn those words into feature vectors, and then train a logistic regression model.
Please ensure that the Apache Spark ML JAR is either on the classpath or declared as a dependency in your initial build.
This pipeline can be built as follows using the Scala API:
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg...
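Since the listing above is truncated, here is one way the complete pipeline might look. The column names (text, label), the parameter values, and the training and test DataFrames are illustrative assumptions; only the three-stage structure (tokenizer, hashing feature extractor, logistic regression) comes from the text.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

// Stage 1: split the raw text into words (a Transformer)
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")

// Stage 2: hash the words into term-frequency feature vectors (a Transformer)
val hashingTF = new HashingTF()
  .setInputCol(tokenizer.getOutputCol)
  .setOutputCol("features")

// Stage 3: the logistic regression classifier (an Estimator)
val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.001)

// Chain the stages; the Pipeline is itself an Estimator
val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))

// Fitting runs each stage in order and returns a PipelineModel (a Transformer)
val model = pipeline.fit(training)   // training: DataFrame with "text" and "label"
val predictions = model.transform(test)
```

Because the fitted PipelineModel is a Transformer, the same tokenization and hashing steps learned at training time are replayed automatically on any DataFrame passed to transform().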