
Hyperparameter Tuning Guide: Updated Practice and Reference Note

This guide provides a practical reference of hyperparameters for tree-based models, organized by model family, to support effective tuning in machine learning projects.

1. General Workflow for Hyperparameter Tuning

  1. Understand Your Model Type:
    • Bagging models (e.g., Random Forest) focus on independent tree creation.
    • Boosting models (e.g., XGBoost, LightGBM, CatBoost) involve sequential learning.
  2. Key Objectives:
    • Prevent overfitting (when the model is too complex).
    • Prevent underfitting (when the model is too simple).
    • Balance model complexity, generalization, and training time.
  3. Steps:
    • Start with default hyperparameters.
    • Identify key parameters to tune based on model type and dataset size.
    • Use GridSearchCV or RandomizedSearchCV for systematic exploration (see the sketch below).
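
A minimal sketch of this workflow, assuming scikit-learn, a RandomForestClassifier, and synthetic data; the dataset, estimator, and the small grid below are illustrative rather than prescriptive:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Illustrative data; replace with your own features and labels.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Deliberately coarse grid to start near the defaults.
param_distributions = {
    'n_estimators': [100, 200, 500],
    'max_depth': [5, 10, None],
    'min_samples_leaf': [1, 2, 4],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=10,            # number of sampled parameter combinations
    cv=5,                 # 5-fold cross-validation
    scoring='accuracy',
    random_state=42,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)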

2. Parameters for Each Model Type

A. Random Forest / Extra Trees / Bagging Models

These models do not use learning_rate. Focus on parameters related to tree complexity and sampling.

Parameter | Purpose | Example Values
n_estimators | Number of trees in the ensemble. | [10, 50, 100, 200]
max_depth | Maximum depth of each tree, to control overfitting. | [3, 5, 10, None]
min_samples_split | Minimum number of samples required to split a node. | [2, 5, 10]
min_samples_leaf | Minimum number of samples required at a leaf node. | [1, 2, 4, 10]
max_features | Number of features considered for each split ('sqrt', 'log2', None, or a fraction). | ['sqrt', 'log2', None]
bootstrap | Whether to use bootstrap samples. | [True, False]
criterion | Splitting criterion ('gini' or 'entropy' for classification; 'squared_error', formerly 'mse', for regression). | ['gini', 'entropy']

B. Gradient Boosting Models (e.g., XGBoost, LightGBM, CatBoost)

These models use learning_rate and require careful balancing between tree complexity, step size, and regularization.

Parameter | Purpose | Example Values
n_estimators | Number of boosting iterations (trees). | [100, 200, 300, 500]
learning_rate | Shrinkage factor applied to each tree's contribution (step size). | [0.01, 0.05, 0.1, 0.2]
max_depth | Maximum depth of individual trees. | [3, 5, 7, 10]
subsample | Fraction of samples used to train each tree (helps avoid overfitting). | [0.6, 0.8, 1.0]
colsample_bytree | Fraction of features considered for each tree. | [0.6, 0.8, 1.0]
colsample_bylevel | Fraction of features considered at each depth level within a tree. | [0.6, 0.8, 1.0]
colsample_bynode | Fraction of features considered at each split node (XGBoost). | [0.6, 0.8, 1.0]
gamma | Minimum loss reduction required to make a further split (regularization). | [0, 1, 5]
reg_alpha | L1 regularization term on weights (lasso). | [0, 0.1, 1]
reg_lambda | L2 regularization term on weights (ridge). | [0, 0.1, 1]
min_child_weight | Minimum sum of instance weights needed in a child node (regularization). | [1, 5, 10]
tree_method | Tree construction algorithm ('auto', 'exact', 'approx', 'hist', 'gpu_hist'). | ['auto', 'hist']
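
To illustrate how learning_rate and the number of boosting rounds interact, here is a hedged sketch using XGBoost's native API with early stopping; the dataset and the specific values are assumptions for demonstration only:

import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=0)

dtrain = xgb.DMatrix(X_train, label=y_train)
dvalid = xgb.DMatrix(X_valid, label=y_valid)

params = {
    'objective': 'binary:logistic',
    'learning_rate': 0.05,   # a smaller step size...
    'max_depth': 5,
    'subsample': 0.8,
    'eval_metric': 'logloss',
}

# ...usually needs more boosting rounds; early stopping picks the round count.
booster = xgb.train(
    params,
    dtrain,
    num_boost_round=1000,
    evals=[(dvalid, 'valid')],
    early_stopping_rounds=50,
)
print('best iteration:', booster.best_iteration)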

C. LightGBM-Specific Parameters

LightGBM has its own parameters, such as num_leaves and the feature_fraction/bagging_fraction settings.

Parameter | Purpose | Example Values
num_leaves | Maximum number of leaves in a tree (controls model complexity). | [15, 31, 63, 127]
max_depth | Maximum depth of the tree (-1 means no limit). | [3, 5, 10, -1]
min_data_in_leaf | Minimum number of samples in a leaf, to prevent overfitting. | [20, 50, 100]
feature_fraction | Fraction of features used for each tree. | [0.6, 0.8, 1.0]
bagging_fraction | Fraction of data used for bagging. | [0.6, 0.8, 1.0]
bagging_freq | Frequency of bagging (0 means no bagging). | [0, 5, 10]
lambda_l1 | L1 regularization term on weights. | [0, 0.1, 1]
lambda_l2 | L2 regularization term on weights. | [0, 0.1, 1]
boosting_type | Boosting algorithm ('gbdt', 'dart', 'goss'). | ['gbdt', 'dart']
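
As a rule of thumb, num_leaves should stay below 2^max_depth, otherwise the depth limit has no effect. A small illustrative sketch with LGBMClassifier (the values are assumptions, not recommendations):

from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# With max_depth=5 a tree can have at most 2**5 = 32 leaves,
# so num_leaves=31 keeps the two limits consistent.
model = LGBMClassifier(
    n_estimators=200,
    learning_rate=0.05,
    num_leaves=31,
    max_depth=5,
    min_child_samples=20,   # scikit-learn-API alias for min_data_in_leaf
    subsample=0.8,          # alias for bagging_fraction
    subsample_freq=5,       # alias for bagging_freq
    colsample_bytree=0.8,   # alias for feature_fraction
)
model.fit(X, y)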

D. CatBoost-Specific Parameters

CatBoost has specialized handling for categorical features and its own hyperparameters.

Parameter | Purpose | Example Values
depth | Depth of the tree (similar to max_depth). | [3, 6, 10]
iterations | Number of trees (similar to n_estimators). | [100, 200, 500]
learning_rate | Step size for gradient boosting. | [0.01, 0.1, 0.2]
l2_leaf_reg | L2 regularization term on leaf values. | [3, 5, 10]
border_count | Number of splits (bins) for numerical features. | [32, 64, 128]
bagging_temperature | Controls the intensity of Bayesian bootstrap sampling (0 disables weight randomization; larger values add more randomness). | [0, 1, 5]
one_hot_max_size | Maximum cardinality for categorical features to use one-hot encoding. | [2, 5, 10]
random_strength | Amount of randomness added to split scores, to improve generalization. | [1, 5, 10]
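
Because CatBoost handles categorical features natively, they are passed by column name (or index) at fit time rather than being one-hot encoded upfront. A hedged sketch, where the column layout and values are purely illustrative:

from catboost import CatBoostClassifier
import pandas as pd

# Toy frame; the column names and values are illustrative only.
df = pd.DataFrame({
    'city': ['NY', 'SF', 'NY', 'LA'] * 25,
    'plan': ['basic', 'pro', 'pro', 'basic'] * 25,
    'usage': range(100),
})
y = [0, 1, 1, 0] * 25

model = CatBoostClassifier(
    iterations=200,
    depth=6,
    learning_rate=0.1,
    l2_leaf_reg=3,
    verbose=0,              # silence per-iteration output
)
# Categorical columns are identified explicitly at fit time.
model.fit(df, y, cat_features=['city', 'plan'])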

3. Example Param Grids

Random Forest

param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [5, 10, 15],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'max_features': ['sqrt', 'log2', None]
}
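
This grid can be plugged directly into GridSearchCV; a minimal sketch, assuming a classification task and synthetic data:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Exhaustive search over the param_grid defined just above
# (3*3*3*3*3 = 243 combinations per CV fold).
grid_search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5, n_jobs=-1)
grid_search.fit(X, y)
print(grid_search.best_params_)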

XGBoost


param_grid = {
    'n_estimators': [100, 200, 300],
    'learning_rate': [0.01, 0.1, 0.2],
    'max_depth': [3, 5, 7],
    'subsample': [0.6, 0.8, 1.0],
    'colsample_bytree': [0.6, 0.8, 1.0],
    'gamma': [0, 1, 5]
}
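
The same pattern works with the scikit-learn wrapper; a sketch using RandomizedSearchCV to sample from the grid above, assuming the xgboost package is installed:

from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Randomized search samples from the param_grid defined just above,
# which is cheaper than an exhaustive grid for this many combinations.
search = RandomizedSearchCV(XGBClassifier(), param_grid, n_iter=20, cv=3, random_state=42)
search.fit(X, y)
print(search.best_params_)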

LightGBM


param_grid = {
    'n_estimators': [100, 200, 300],
    'learning_rate': [0.01, 0.05, 0.1],
    'num_leaves': [31, 63, 127],
    'feature_fraction': [0.6, 0.8, 1.0],
    'bagging_fraction': [0.6, 0.8, 1.0],
    'lambda_l1': [0, 0.1, 1],
    'lambda_l2': [0, 0.1, 1]
}

CatBoost

param_grid = {
    'iterations': [100, 200, 500],
    'learning_rate': [0.01, 0.1, 0.2],
    'depth': [3, 6, 10],
    'l2_leaf_reg': [3, 5, 10],
    'random_strength': [1, 5, 10]
}

4. Summary

  1. Start Simple: Tune the most impactful parameters first (n_estimators, max_depth, learning_rate).
  2. Regularization: Use min_samples_split, min_child_weight, and lambda_l1/lambda_l2 to control overfitting.
  3. Iterate Gradually: Expand your param grid once you identify promising ranges.
  4. Specialized Models:
    • Random Forest: Focus on max_depth, n_estimators, and bootstrap.
    • Boosting Models: Balance learning_rate, n_estimators, and regularization.

This guide consolidates the major parameters needed for effective hyperparameter tuning of tree-based models.
