
Hyperparameter Tuning Guide: Updated Practice and Reference Note

This guide provides a practical reference of hyperparameters for tree-based models, organized by model family, to support effective tuning in machine learning projects.

1. General Workflow for Hyperparameter Tuning

  1. Understand Your Model Type:
    • Bagging models (e.g., Random Forest) focus on independent tree creation.
    • Boosting models (e.g., XGBoost, LightGBM, CatBoost) involve sequential learning.
  2. Key Objectives:
    • Prevent overfitting (when the model is too complex).
    • Prevent underfitting (when the model is too simple).
    • Balance model complexity, generalization, and training time.
  3. Steps:
    • Start with default hyperparameters.
    • Identify key parameters to tune based on model type and dataset size.
    • Use GridSearchCV or RandomizedSearchCV for systematic exploration (see the sketch below).
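
A minimal sketch of this workflow, assuming scikit-learn, a RandomForestClassifier, and synthetic data; the dataset, estimator, and the small grid below are illustrative rather than prescriptive:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Illustrative data; replace with your own features and labels.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Deliberately coarse grid to start near the defaults.
param_distributions = {
    'n_estimators': [100, 200, 500],
    'max_depth': [5, 10, None],
    'min_samples_leaf': [1, 2, 4],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=10,            # number of sampled parameter combinations
    cv=5,                 # 5-fold cross-validation
    scoring='accuracy',
    random_state=42,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)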

2. Parameters for Each Model Type

A. Random Forest / Extra Trees / Bagging Models

These models do not use learning_rate. Focus on parameters related to tree complexity and sampling.

Parameter | Purpose | Example Values
n_estimators | Number of trees in the ensemble. | [10, 50, 100, 200]
max_depth | Maximum depth of each tree, to control overfitting. | [3, 5, 10, None]
min_samples_split | Minimum number of samples required to split a node. | [2, 5, 10]
min_samples_leaf | Minimum number of samples required at a leaf node. | [1, 2, 4, 10]
max_features | Number of features considered for each split ('sqrt', 'log2', None, or a fraction). | ['sqrt', 'log2', None]
bootstrap | Whether to use bootstrap samples. | [True, False]
criterion | Splitting criterion ('gini' or 'entropy' for classification; 'squared_error', formerly 'mse', for regression). | ['gini', 'entropy']

B. Gradient Boosting Models (e.g., XGBoost, LightGBM, CatBoost)

These models use learning_rate and require careful balancing between tree complexity, step size, and regularization.

Parameter | Purpose | Example Values
n_estimators | Number of boosting iterations (trees). | [100, 200, 300, 500]
learning_rate | Shrinkage factor applied to each tree's contribution (step size). | [0.01, 0.05, 0.1, 0.2]
max_depth | Maximum depth of individual trees. | [3, 5, 7, 10]
subsample | Fraction of samples used to train each tree (helps avoid overfitting). | [0.6, 0.8, 1.0]
colsample_bytree | Fraction of features considered for each tree. | [0.6, 0.8, 1.0]
colsample_bylevel | Fraction of features considered at each depth level within a tree. | [0.6, 0.8, 1.0]
colsample_bynode | Fraction of features considered at each split node (XGBoost). | [0.6, 0.8, 1.0]
gamma | Minimum loss reduction required to make a further split (regularization). | [0, 1, 5]
reg_alpha | L1 regularization term on weights (lasso). | [0, 0.1, 1]
reg_lambda | L2 regularization term on weights (ridge). | [0, 0.1, 1]
min_child_weight | Minimum sum of instance weights needed in a child node (regularization). | [1, 5, 10]
tree_method | Tree construction algorithm ('auto', 'exact', 'approx', 'hist', 'gpu_hist'). | ['auto', 'hist']
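
To illustrate how learning_rate and the number of boosting rounds interact, here is a hedged sketch using XGBoost's native API with early stopping; the dataset and the specific values are assumptions for demonstration only:

import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=0)

dtrain = xgb.DMatrix(X_train, label=y_train)
dvalid = xgb.DMatrix(X_valid, label=y_valid)

params = {
    'objective': 'binary:logistic',
    'learning_rate': 0.05,   # a smaller step size...
    'max_depth': 5,
    'subsample': 0.8,
    'eval_metric': 'logloss',
}

# ...usually needs more boosting rounds; early stopping picks the round count.
booster = xgb.train(
    params,
    dtrain,
    num_boost_round=1000,
    evals=[(dvalid, 'valid')],
    early_stopping_rounds=50,
)
print('best iteration:', booster.best_iteration)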

C. LightGBM-Specific Parameters

LightGBM has its own parameters, such as num_leaves and the feature_fraction/bagging_fraction settings.

Parameter | Purpose | Example Values
num_leaves | Maximum number of leaves in a tree (controls model complexity). | [15, 31, 63, 127]
max_depth | Maximum depth of the tree (-1 means no limit). | [3, 5, 10, -1]
min_data_in_leaf | Minimum number of samples in a leaf, to prevent overfitting. | [20, 50, 100]
feature_fraction | Fraction of features used for each tree. | [0.6, 0.8, 1.0]
bagging_fraction | Fraction of data used for bagging. | [0.6, 0.8, 1.0]
bagging_freq | Frequency of bagging (0 means no bagging). | [0, 5, 10]
lambda_l1 | L1 regularization term on weights. | [0, 0.1, 1]
lambda_l2 | L2 regularization term on weights. | [0, 0.1, 1]
boosting_type | Boosting algorithm ('gbdt', 'dart', 'goss'). | ['gbdt', 'dart']
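
As a rule of thumb, num_leaves should stay below 2^max_depth, otherwise the depth limit has no effect. A small illustrative sketch with LGBMClassifier (the values are assumptions, not recommendations):

from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# With max_depth=5 a tree can have at most 2**5 = 32 leaves,
# so num_leaves=31 keeps the two limits consistent.
model = LGBMClassifier(
    n_estimators=200,
    learning_rate=0.05,
    num_leaves=31,
    max_depth=5,
    min_child_samples=20,   # scikit-learn-API alias for min_data_in_leaf
    subsample=0.8,          # alias for bagging_fraction
    subsample_freq=5,       # alias for bagging_freq
    colsample_bytree=0.8,   # alias for feature_fraction
)
model.fit(X, y)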

D. CatBoost-Specific Parameters

CatBoost has specialized handling for categorical features and its own hyperparameters.

Parameter | Purpose | Example Values
depth | Depth of the tree (similar to max_depth). | [3, 6, 10]
iterations | Number of trees (similar to n_estimators). | [100, 200, 500]
learning_rate | Step size for gradient boosting. | [0.01, 0.1, 0.2]
l2_leaf_reg | L2 regularization term on leaf values. | [3, 5, 10]
border_count | Number of splits (bins) for numerical features. | [32, 64, 128]
bagging_temperature | Controls the intensity of Bayesian bootstrap sampling (0 disables weight randomization; larger values add more randomness). | [0, 1, 5]
one_hot_max_size | Maximum cardinality for categorical features to use one-hot encoding. | [2, 5, 10]
random_strength | Amount of randomness added to split scores, to improve generalization. | [1, 5, 10]
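
Because CatBoost handles categorical features natively, they are passed by column name (or index) at fit time rather than being one-hot encoded upfront. A hedged sketch, where the column layout and values are purely illustrative:

from catboost import CatBoostClassifier
import pandas as pd

# Toy frame; the column names and values are illustrative only.
df = pd.DataFrame({
    'city': ['NY', 'SF', 'NY', 'LA'] * 25,
    'plan': ['basic', 'pro', 'pro', 'basic'] * 25,
    'usage': range(100),
})
y = [0, 1, 1, 0] * 25

model = CatBoostClassifier(
    iterations=200,
    depth=6,
    learning_rate=0.1,
    l2_leaf_reg=3,
    verbose=0,              # silence per-iteration output
)
# Categorical columns are identified explicitly at fit time.
model.fit(df, y, cat_features=['city', 'plan'])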

3. Example Param Grids

Random Forest

param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [5, 10, 15],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'max_features': ['sqrt', 'log2', None]
}
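
This grid can be plugged directly into GridSearchCV; a minimal sketch, assuming a classification task and synthetic data:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Exhaustive search over the param_grid defined just above
# (3*3*3*3*3 = 243 combinations per CV fold).
grid_search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5, n_jobs=-1)
grid_search.fit(X, y)
print(grid_search.best_params_)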

XGBoost


param_grid = {
    'n_estimators': [100, 200, 300],
    'learning_rate': [0.01, 0.1, 0.2],
    'max_depth': [3, 5, 7],
    'subsample': [0.6, 0.8, 1.0],
    'colsample_bytree': [0.6, 0.8, 1.0],
    'gamma': [0, 1, 5]
}
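
The same pattern works with the scikit-learn wrapper; a sketch using RandomizedSearchCV to sample from the grid above, assuming the xgboost package is installed:

from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Randomized search samples from the param_grid defined just above,
# which is cheaper than an exhaustive grid for this many combinations.
search = RandomizedSearchCV(XGBClassifier(), param_grid, n_iter=20, cv=3, random_state=42)
search.fit(X, y)
print(search.best_params_)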

LightGBM


param_grid = {
    'n_estimators': [100, 200, 300],
    'learning_rate': [0.01, 0.05, 0.1],
    'num_leaves': [31, 63, 127],
    'feature_fraction': [0.6, 0.8, 1.0],
    'bagging_fraction': [0.6, 0.8, 1.0],
    'lambda_l1': [0, 0.1, 1],
    'lambda_l2': [0, 0.1, 1]
}

CatBoost

param_grid = {
    'iterations': [100, 200, 500],
    'learning_rate': [0.01, 0.1, 0.2],
    'depth': [3, 6, 10],
    'l2_leaf_reg': [3, 5, 10],
    'random_strength': [1, 5, 10]
}

4. Summary

  1. Start Simple: Tune the most impactful parameters first (n_estimators, max_depth, learning_rate).
  2. Regularization: Use min_samples_split, min_child_weight, and lambda_l1/lambda_l2 to control overfitting.
  3. Iterate Gradually: Expand your param grid once you identify promising ranges.
  4. Specialized Models:
    • Random Forest: Focus on max_depth, n_estimators, and bootstrap.
    • Boosting Models: Balance learning_rate, n_estimators, and regularization.

This guide consolidates the major parameters needed for effective hyperparameter tuning of tree-based models.
