Machine Learning with LightGBM and Python

In this section, we compare the performance of LightGBM, XGBoost, and TabTransformers on two different datasets. We also look at further data preparation techniques for handling unbalanced classes, missing values, and categorical data.
The first dataset we use is the Census Income dataset, where the task is to predict whether a person’s income exceeds $50,000 per year based on attributes such as education, marital status, and occupation [4]. The dataset has 48,842 instances and, as we’ll see, contains some missing values and unbalanced classes.
The dataset is available from the following URL: https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data. The data has already been split into a training set and a test set. Once loaded, we can sample the data:
train_data.sample(5)[["age", "education", "marital_status", "hours_per_week", "income_bracket"]]
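For the sampling call above to work, train_data must already be a pandas DataFrame with named columns. The following is a minimal sketch of one way to load it, not necessarily the approach used in the book. It assumes the raw file has no header row, separates fields with a comma followed by a space, and marks missing values with "?". Only the five column names in the sampling call come from the text; the other names and the load_census helper are hypothetical. The second inline row is made up for illustration so that the sketch runs without a network connection.

```python
import io

import pandas as pd

# Assumed column names: the raw file is taken to have no header row.
# Only age, education, marital_status, hours_per_week and income_bracket
# appear in the text; the remaining names are illustrative.
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education_num",
    "marital_status", "occupation", "relationship", "race", "gender",
    "capital_gain", "capital_loss", "hours_per_week", "native_country",
    "income_bracket",
]


def load_census(source):
    """Hypothetical helper: read census rows from a path, URL, or buffer."""
    return pd.read_csv(
        source,
        names=COLUMNS,          # supply the header ourselves
        header=None,
        skipinitialspace=True,  # fields are separated by ", "
        na_values="?",          # assumed missing-value marker
    )


# Two rows in the assumed layout; the second is invented for illustration.
sample_rows = io.StringIO(
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
    "54, ?, 180211, Some-college, 10, Married-civ-spouse, ?, "
    "Husband, White, Male, 0, 0, 60, United-States, >50K\n"
)
train_data = load_census(sample_rows)

# Count missing values per column and inspect the class balance.
missing_counts = train_data.isna().sum()
class_share = train_data["income_bracket"].value_counts(normalize=True)
print(missing_counts[missing_counts > 0])
print(class_share)
```

Passing the dataset URL given above to load_census instead of the inline buffer should load the full training file under the same assumptions. The missing-value counts and class shares give a first look at the two issues mentioned earlier.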
The data...