
Python Machine Learning By Example

Typical tasks in this stage can be summarized into two major categories: data preprocessing and feature engineering.
To begin, data preprocessing usually involves categorical feature encoding, feature scaling, feature selection, and dimensionality reduction.
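As a rough illustration of one of these steps, feature scaling, the sketch below rescales a small made-up column in two common ways using only the Python standard library. The function names `min_max_scale` and `standardize` and the sample values are assumptions made for this example; they do not come from the original text.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # Same guard as above for a constant column.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical ages, made up for this example.
ages = [20, 30, 40, 50, 60]
scaled = min_max_scale(ages)
zscores = standardize(ages)
```

Min-max scaling keeps the original shape of the distribution within fixed bounds, while standardization centers the values around zero; which one fits better depends on the downstream model.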
In general, categorical features are easy to spot, as they convey qualitative information, such as risk level, occupation, and interests. However, it gets tricky if the feature takes on a discrete and countable (limited) number of numerical values, for instance, 1 to 12 representing months of the year, and 1 and 0 indicating true and false.
The key to identifying whether such a feature is categorical or numerical is whether it provides a mathematical or ranking implication; if it does, it is a numerical feature, such as a product rating from 1 to 5; otherwise, it is categorical, such...
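The rule of thumb above can be sketched in code. In this minimal example, the month number is treated as categorical and one-hot encoded, while the 1-to-5 product rating is kept as a number because its order is meaningful. The helper `one_hot`, the record layout, and the sample values are hypothetical and serve only as an illustration.

```python
def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


MONTHS = list(range(1, 13))  # 1..12 label the months; they are not quantities

# Hypothetical records: (month of purchase, product rating from 1 to 5).
records = [(1, 5), (12, 2), (7, 4)]

features = []
for month, rating in records:
    # Month is categorical: encode it so December is not "12 times" January.
    # Rating is numerical: keep it as one value, since 5 > 4 is meaningful.
    features.append(one_hot(month, MONTHS) + [rating])
```

Each resulting row has 13 values: 12 indicator columns for the month followed by the rating itself.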