
Hands-On Data Analysis with Pandas

Practice building and evaluating machine learning models in scikit-learn
with the following exercises:
a) Combine the red and white wine datasets (data/winequality-red.csv and data/winequality-white.csv, respectively) and add a column for the kind of wine (red or white).
b) Perform some initial EDA.
c) Build and fit a pipeline that scales the data and then uses k-means clustering to make two clusters. Be sure not to use the quality column.
d) Use the Fowlkes-Mallows Index (the fowlkes_mallows_score() function is in sklearn.metrics) to evaluate how well k-means distinguishes between red and white wine.
e) Find the center of each cluster.
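
One possible way to approach parts a) and c) through e) is sketched below. This is only a starting point, not a reference solution. The function names combine_wines() and cluster_wines(), the column name kind, and the n_init and random_state settings are assumptions made for illustration. The sketch also assumes every column other than kind and quality is numeric.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import fowlkes_mallows_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def combine_wines(red: pd.DataFrame, white: pd.DataFrame) -> pd.DataFrame:
    """Stack both datasets and record which one each row came from.

    The column name 'kind' is a hypothetical choice for this sketch.
    """
    return pd.concat(
        [red.assign(kind="red"), white.assign(kind="white")],
        ignore_index=True,
    )


def cluster_wines(wine: pd.DataFrame, seed: int = 0):
    """Scale the features, fit two k-means clusters, and score the result."""
    # Drop the true label ('kind') and the 'quality' column, which the
    # exercise says not to use; everything else is treated as a feature.
    X = wine.drop(columns=["kind", "quality"])
    y = wine["kind"]

    pipeline = Pipeline([
        ("scale", StandardScaler()),
        # n_init and random_state are assumed settings for reproducibility.
        ("kmeans", KMeans(n_clusters=2, n_init=10, random_state=seed)),
    ])
    labels = pipeline.fit_predict(X)

    # The Fowlkes-Mallows Index compares two partitions of the rows, so it
    # does not matter which cluster id ends up matching which kind of wine.
    fmi = fowlkes_mallows_score(y, labels)

    # The centers are expressed in the scaled feature space, because
    # k-means only ever sees the output of the scaler.
    centers = pd.DataFrame(
        pipeline.named_steps["kmeans"].cluster_centers_,
        columns=X.columns,
    )
    return pipeline, fmi, centers
```

With the two CSV files loaded into dataframes (for example with pd.read_csv(), possibly with a sep argument if the files are not comma-delimited), combine_wines(red, white) gives the combined data for the EDA in part b), and cluster_wines() returns the fitted pipeline, the score, and the cluster centers. To read the centers in the original units, the scaler's inverse_transform() method can be applied to them.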
a) Using the data/stars.csv file, perform some initial EDA and then build a linear regression model of all the numeric columns to predict the temperature of the star.
b) Train the model on 75% of...