
The Data Science Workshop

In the previous section, we looked at undersampling, where we downsized the majority class to balance the dataset. However, undersampling reduces the size of the dataset, and in many circumstances this can hurt the predictive power of the classifier. An effective way to counter this loss of data is to oversample the minority class instead. Oversampling generates new synthetic data points similar to those of the minority class, thereby balancing the dataset without discarding any examples.
Two very popular methods for generating such synthetic points are:
The SMOTE algorithm generates synthetic data by looking at the neighborhood of each minority-class point and creating new data points within that neighborhood:
Figure 13.20: Dataset with two classes
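The neighborhood idea can be sketched in a few lines of NumPy. This is a minimal illustration, not the book's implementation: it assumes Euclidean distance, picks one of the `k` nearest minority neighbors at random, and places each new point at a random position on the line segment between a minority point and that neighbor. The function name `smote_sketch` and its parameters are hypothetical, as is the toy data. In practice, a library implementation such as `SMOTE` from `imblearn.over_sampling` would normally be used instead.

```python
import numpy as np

def smote_sketch(X_minority, n_new, k=5, seed=0):
    """Hypothetical helper: create n_new synthetic minority points by interpolation."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X_minority, dtype=float)
    n = len(X)
    k = min(k, n - 1)  # cannot have more neighbors than other points

    # Pairwise Euclidean distances between all minority points
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor

    # Indices of the k nearest minority neighbors of each point
    neighbors = np.argsort(dists, axis=1)[:, :k]

    # For each new point: pick a base point and one of its neighbors
    base = rng.integers(0, n, size=n_new)
    picked = neighbors[base, rng.integers(0, k, size=n_new)]

    # Interpolate a random fraction of the way from base to neighbor
    gap = rng.random((n_new, 1))
    return X[base] + gap * (X[picked] - X[base])

# Toy minority class with four points (made-up values for illustration)
X_min = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]])
synthetic = smote_sketch(X_min, n_new=6, k=2)
print(synthetic.shape)
```

Because every synthetic point lies between two existing minority points, the new data stays inside the region the minority class already occupies rather than duplicating existing rows.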
Let's explain the concept of generating...