
Hands-On Gradient Boosting with XGBoost and scikit-learn

It's challenging to find real-world datasets that are best suited to linear models. Real data is often messy, and more complex models such as tree ensembles tend to produce better scores. In other cases, however, linear models may generalize better.
The success of a machine learning algorithm ultimately depends on how it performs on real-world data. In the next section, we will apply gblinear first to the Diabetes dataset and then to a dataset that is linear by construction.
The Diabetes dataset is a regression dataset of 442 diabetes patients provided by scikit-learn. The predictor columns are age, sex, BMI (body mass index), BP (blood pressure), and six blood serum measurements. The target column is a quantitative measure of disease progression 1 year after baseline. You can read about the dataset in the original paper here: http://web.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf.
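A quick way to confirm the dataset's shape and column names, assuming only that scikit-learn is installed, is to load it with load_diabetes and inspect the returned object. This is a small sketch for orientation, not the book's own loading code.

```python
# Sketch: load scikit-learn's bundled Diabetes dataset and inspect its columns.
# Assumes scikit-learn is installed; the data ships with the library,
# so no download is needed.
from sklearn.datasets import load_diabetes

diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target

# 442 patients (rows) and 10 predictor columns; one target value per patient.
print(X.shape, y.shape)  # (442, 10) (442,)

# age, sex, bmi, bp, then the six blood serum measurements s1 through s6.
print(diabetes.feature_names)
```

By default, scikit-learn returns these features already mean-centered and scaled, so their raw magnitudes are not in the original clinical units.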
Scikit-learn's datasets are already split into predictor...