Follow these steps to complete this recipe:
- Import the libraries:
```python
import pandas as pd
import numpy as np
from sklearn import neighbors, metrics
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
```
- Import the data:
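```python
# Assumes a notebook (e.g., Databricks) where the SparkSession `spark` is already defined
df = spark.sql("select * from ChemicalSensor")
# toPandas() collects every row to the driver, so the table must fit in driver memory
pdf = df.toPandas()
```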
- Encode the class labels as one-hot vectors:
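```python
# Map each class name to an integer code (0 .. n_classes - 1)
label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(pdf['classification'])
# OneHotEncoder expects a 2-D array: one row per sample, one column
integer_encoded = integer_encoded.reshape(len(integer_encoded), 1)
# sparse_output=False returns a dense NumPy array (scikit-learn >= 1.2;
# older releases name this parameter `sparse`)
onehot_encoder = OneHotEncoder(sparse_output=False)
onehot_encoded = onehot_encoder.fit_transform(integer_encoded)
```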
- Split the data into training and test sets:
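```python
# feature_cols is assumed to be defined earlier as the list of sensor feature columns
X = pdf[feature_cols]
y = onehot_encoded
# Hold out 20% of the rows for testing; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=5)
```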
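A quick way to check what the split step does is to run it on a small synthetic frame. This sketch uses a hypothetical 10-row DataFrame (the columns `sensor_a` and `sensor_b` are made up) instead of the ChemicalSensor data. It shows the 80/20 row counts, that the same `random_state` reproduces the same split, and that feature rows and one-hot label rows are shuffled together.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 10 samples, 2 sensor features, 3 one-hot classes
X = pd.DataFrame({"sensor_a": np.arange(10.0), "sensor_b": np.arange(10.0) * 2})
y = np.eye(3)[np.arange(10) % 3]  # one-hot rows cycling through classes 0, 1, 2

# Hold out 20% of the rows; the fixed random_state makes the shuffle repeatable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=5)
print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
print(y_train.shape, y_test.shape)  # (8, 3) (2, 3)

# The same random_state selects the same test rows on a second call
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.2, random_state=5)
print(X_test.index.equals(X_test_again.index))  # True

# Features and labels are permuted together, so rows stay paired
print(np.array_equal(y_test, y[X_test.index.to_numpy()]))  # True
```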
- Train and predict:
```python
# A 2-D one-hot y makes the tree a multi-output classifier
clf = DecisionTreeClassifier()
clf = clf.fit(X_train, y_train)
y_pred...
```