Data Science with Python

Solution:
After the glass dataset has been imported, shuffled, and standardized (see Exercise 58):
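import pandas as pd
from sklearn.cluster import KMeans

# One column of cluster labels per fitted model
labels_df = pd.DataFrame()
for i in range(0, 100):
    model = KMeans(n_clusters=2)
    model.fit(scaled_features)
    labels = model.labels_
    labels_df['Model_{}_Labels'.format(i+1)] = labels
# mode(axis=1) returns a DataFrame (with extra columns when a row has tied modes),
# so keep only its first column as the final label
row_mode = labels_df.mode(axis=1)[0]
labels_df['row_mode'] = row_mode
print(labels_df.head(5))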
By iterating through 100 models, saving the labels from each iteration, and taking the row-wise mode of those labels as the final prediction, we make the result less dependent on any single random initialization. However, all of these models used a predetermined number of clusters. Unless we know the number of clusters a priori, we will want to discover the optimal number of clusters for segmenting our observations.
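One detail worth checking when taking a mode across k-means runs is that cluster numbering is arbitrary: two runs can find the same partition but swap which group is called 0 and which is called 1. The sketch below is not part of the book's solution. It uses made-up toy labels and a hypothetical helper, `align_to_reference`, that only handles the two-cluster case, to show one way the labels could be aligned to a reference run before the row-wise mode is taken.

```python
import numpy as np
import pandas as pd

def align_to_reference(labels, reference):
    """Flip a 0/1 labeling if it agrees with the reference on fewer than half the rows.

    Hypothetical helper for illustration; it only makes sense for two clusters.
    """
    labels = np.asarray(labels)
    reference = np.asarray(reference)
    agreement = np.mean(labels == reference)
    return labels if agreement >= 0.5 else 1 - labels

# Toy labels from three imaginary runs over five observations (not real results)
runs = [
    [0, 0, 1, 1, 0],
    [1, 1, 0, 0, 1],  # same partition as the first run, numbering swapped
    [0, 0, 1, 0, 0],
]
reference = runs[0]

labels_df = pd.DataFrame({
    'Model_{}_Labels'.format(i + 1): align_to_reference(run, reference)
    for i, run in enumerate(runs)
})

# mode(axis=1) returns a DataFrame; its first column is the per-row mode
labels_df['row_mode'] = labels_df.mode(axis=1)[0]
print(labels_df)
```

Using an odd number of runs, as here, avoids tied modes when there are only two possible labels.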
Solution:
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
import numpy as np

# Reduce the standardized features with PCA
# (best_n_components is assumed to be defined already)
model = PCA(n_components=best_n_components)
df_pca = model.fit_transform(scaled_features)

mean_inertia_list_PCA = []
# Outer loop: x is the number of clusters to try, from 1 through 10
for x in range(1, 11):
    inertia_list = []
    # Inner loop: fit 100 models for this value of x and record each inertia
    for i in range(100):
        model = KMeans(n_clusters=x)
        model.fit(df_pca)
        inertia = model.inertia_
        inertia_list.append(inertia)
    # Average the 100 inertia values for this number of clusters
    mean_inertia = np.mean(inertia_list)
    mean_inertia_list_PCA.append(mean_inertia)
print(mean_inertia_list_PCA)
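A list of mean inertia values is easier to read when each value is compared with the one before it. The following is a minimal sketch, assuming the list is ordered by increasing number of clusters starting at 1. The helper name `relative_drops` is hypothetical, and the input numbers are made up purely to show the shape of the output; they are not results from the glass dataset.

```python
def relative_drops(mean_inertias):
    """Fractional decrease in mean inertia from each cluster count to the next."""
    return [
        (prev - curr) / prev
        for prev, curr in zip(mean_inertias, mean_inertias[1:])
    ]

# Made-up numbers, used only to illustrate the output format
example = [100.0, 50.0, 40.0]

# The first drop is the move from 1 cluster to 2, so numbering starts at k=2
for k, drop in enumerate(relative_drops(example), start=2):
    print('k={}: mean inertia fell by {:.0%}'.format(k, drop))
```

Passing `mean_inertia_list_PCA` in place of `example` would give one value per added cluster, which may help when judging where further clusters stop paying off.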