Developing Kaggle Notebooks

By: Gabriel Preda
Overview of this book

Developing Kaggle Notebooks introduces you to data analysis, with a focus on using Kaggle Notebooks to simultaneously achieve mastery in this field and rise to the top of the Kaggle Notebooks tier. The book is structured as a seven-step data analysis journey, exploring the features available in Kaggle Notebooks alongside various data analysis techniques. For each topic, we provide one or more notebooks, developing reusable analysis components through Kaggle's Utility Scripts feature, introduced progressively: initially as part of a notebook, and later extracted for use across future notebooks to enhance code reusability on Kaggle. This approach makes the notebooks' code more structured, easier to maintain, and more readable.

Although the focus of this book is on data analytics, some examples will guide you in preparing a complete machine learning pipeline using Kaggle Notebooks. Starting from initial data ingestion and data quality assessment, you'll move on to preliminary data analysis, advanced data exploration, feature qualification to build a model baseline, and feature engineering. You'll also delve into hyperparameter tuning to iteratively refine your model and prepare for submission in Kaggle competitions. Additionally, the book touches on developing notebooks that leverage the power of generative AI using Kaggle Models.

Building a baseline model

From the original temporal data, through feature engineering, we generated time-aggregated features for each time segment in the training data, each segment equal in duration to one test set. For the baseline model demonstrated in this competition, we chose LGBMRegressor, one of the best-performing algorithms at the time of the competition, which in many cases performed similarly to XGBoost. The training data is split with KFold into five folds, and for each fold we run training and validation until we reach the final number of iterations or until the validation error stops improving for a specified number of rounds (given by the patience parameter). For each fold, we then also run the prediction for the test set with the best model, that is, the model trained on the current training split for that fold (4/5 of the training set). At the end, we average the test predictions obtained across the folds. We can use this cross-validation...
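The sketch below illustrates this cross-validation scheme. It is a minimal, self-contained example, not the competition notebook's exact code: the synthetic train and test DataFrames, the feature and target column names, and the LGBMRegressor hyperparameters and patience value are all illustrative assumptions.

# Minimal sketch of the five-fold CV baseline described above. The data,
# column names, and hyperparameters are illustrative assumptions, not the
# competition notebook's exact settings.
import numpy as np
import pandas as pd
import lightgbm as lgb
from sklearn.model_selection import KFold

# Synthetic stand-in data so the sketch runs end to end.
rng = np.random.default_rng(42)
features = [f"f{i}" for i in range(5)]
train = pd.DataFrame(rng.normal(size=(500, 5)), columns=features)
train["target"] = train.sum(axis=1) + rng.normal(scale=0.1, size=500)
test = pd.DataFrame(rng.normal(size=(100, 5)), columns=features)

kf = KFold(n_splits=5, shuffle=True, random_state=42)
test_preds = np.zeros(len(test))

for fold, (trn_idx, val_idx) in enumerate(kf.split(train)):
    X_trn, y_trn = train.iloc[trn_idx][features], train.iloc[trn_idx]["target"]
    X_val, y_val = train.iloc[val_idx][features], train.iloc[val_idx]["target"]

    model = lgb.LGBMRegressor(n_estimators=5000, learning_rate=0.05)
    model.fit(
        X_trn, y_trn,
        eval_set=[(X_val, y_val)],
        # Stop when the validation error has not improved for 100 rounds
        # (the "patience" mentioned in the text).
        callbacks=[lgb.early_stopping(stopping_rounds=100)],
    )
    # Predict the test set with this fold's best model (trained on 4/5 of the
    # training data) and accumulate the average over the five folds.
    test_preds += model.predict(test[features]) / kf.n_splits

After the loop, test_preds holds the fold-averaged test predictions, which is what would be used for the baseline submission.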
