In this recipe, we used logistic regression. Logistic regression is a technique that can be used for traditional statistics as well as machine learning. Due to its simplicity and power, many data scientists use logistic regression as their first model and use it as a benchmark to beat. Logistic regression is a binary classifier, meaning it can classify something as true or false. In our case, the classifications are benign or malignant.
First, we import koalas for data manipulation and sklearn for our model and analysis. Next, we import data from our data table and put it into a Pandas DataFrame. Then we split the data into testing and training datasets. Next, we create a formula that will describe for the model the data columns being used. Next, we give the model the formula, the training dataset, and the algorithm it will use. We then output a model that we can use to evaluate new data. We now create a DataFrame called predictions_nominal, which we can use to compare...