Hyperparameter Tuning Guide: Updated Practice and Reference Note
This guide provides a reference list of hyperparameters for tree-based models, organized by model family and purpose, with practical guidance for tuning them in machine learning projects.
1. General Workflow for Hyperparameter Tuning
- Understand Your Model Type:
  - Bagging models (e.g., Random Forest) build trees independently on bootstrap samples and average their predictions.
  - Boosting models (e.g., XGBoost, LightGBM, CatBoost) build trees sequentially, with each new tree correcting the errors of the previous ones.
- Key Objectives:
  - Prevent overfitting (the model is too complex and memorizes noise).
  - Prevent underfitting (the model is too simple to capture the signal).
  - Balance model complexity, generalization, and training time.
- Steps:
  - Start with default hyperparameters.
  - Identify the key parameters to tune based on model type and dataset size.
  - Use GridSearchCV or RandomizedSearchCV for systematic exploration (see the sketch below).
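For example, a minimal sketch of this workflow with scikit-learn's GridSearchCV might look like the following (the toy dataset and random forest are placeholders; any estimator and param grid from the sections below can be substituted):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Toy dataset as a stand-in for your own features and target.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Start small: a coarse grid over the most impactful parameters.
param_grid = {
    "n_estimators": [100, 200],
    "max_depth": [5, 10, None],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,                # 5-fold cross-validation
    scoring="accuracy",  # pick a metric appropriate for your problem
    n_jobs=-1,           # use all available CPU cores
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Held-out accuracy:", search.best_estimator_.score(X_test, y_test))
```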
2. Parameters for Each Model Type
A. Random Forest / Extra Trees / Bagging Models
These models do not use learning_rate. Focus on parameters that control tree complexity and sampling.
| Parameter | Purpose | Example Values |
| --- | --- | --- |
| n_estimators | Number of trees in the ensemble. | [10, 50, 100, 200] |
| max_depth | Maximum depth of each tree, to control overfitting. | [3, 5, 10, None] |
| min_samples_split | Minimum number of samples required to split a node. | [2, 5, 10] |
| min_samples_leaf | Minimum number of samples required at a leaf node. | [1, 2, 4, 10] |
| max_features | Number of features considered at each split ('sqrt', 'log2', or a fraction). | ['sqrt', 'log2', None] |
| bootstrap | Whether to train each tree on a bootstrap sample. | [True, False] |
| criterion | Splitting criterion ('gini' or 'entropy' for classification; 'squared_error' replaces the older 'mse' for regression). | ['gini', 'entropy'] |
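To show how these fit together, here is a minimal, illustrative sketch of a scikit-learn RandomForestClassifier configured with the parameters above; the values are reasonable starting points, not tuned results:

```python
from sklearn.ensemble import RandomForestClassifier

# Illustrative starting values; tune them via the grids in Section 3.
rf = RandomForestClassifier(
    n_estimators=200,       # number of trees in the ensemble
    max_depth=10,           # cap tree depth to limit overfitting
    min_samples_split=5,    # need at least 5 samples to split a node
    min_samples_leaf=2,     # need at least 2 samples in each leaf
    max_features="sqrt",    # consider sqrt(n_features) at each split
    bootstrap=True,         # train each tree on a bootstrap sample
    criterion="gini",       # impurity measure for classification
    random_state=42,
)
# rf.fit(X_train, y_train)  # X_train / y_train: your training data
```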
B. Gradient Boosting Models (e.g., XGBoost, LightGBM, CatBoost)
These models use learning_rate and require careful balancing between tree complexity, step size, and regularization.
| Parameter | Purpose | Example Values |
| --- | --- | --- |
| n_estimators | Number of boosting iterations (trees). | [100, 200, 300, 500] |
| learning_rate | Step size that shrinks each tree's contribution. | [0.01, 0.05, 0.1, 0.2] |
| max_depth | Maximum depth of individual trees. | [3, 5, 7, 10] |
| subsample | Fraction of samples used to train each tree (to reduce overfitting). | [0.6, 0.8, 1.0] |
| colsample_bytree | Fraction of features considered for each tree. | [0.6, 0.8, 1.0] |
| colsample_bylevel | Fraction of features considered at each depth level within a tree. | [0.6, 0.8, 1.0] |
| colsample_bynode | Fraction of features considered at each split/node (XGBoost only). | [0.6, 0.8, 1.0] |
| gamma | Minimum loss reduction required to make a further split (regularization). | [0, 1, 5] |
| reg_alpha | L1 regularization term on weights (lasso). | [0, 0.1, 1] |
| reg_lambda | L2 regularization term on weights (ridge). | [0, 0.1, 1] |
| min_child_weight | Minimum sum of instance weights needed in a child node (regularization). | [1, 5, 10] |
| tree_method | Tree construction algorithm ('auto', 'exact', 'approx', 'hist', 'gpu_hist'). | ['auto', 'hist'] |
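As a sketch, these parameters map directly onto XGBoost's scikit-learn wrapper (assuming the xgboost package is installed); the values below are illustrative starting points rather than recommendations:

```python
from xgboost import XGBClassifier

# Illustrative starting values; a smaller learning_rate usually needs
# a correspondingly larger n_estimators.
xgb = XGBClassifier(
    n_estimators=300,
    learning_rate=0.05,    # shrink each tree's contribution
    max_depth=5,
    subsample=0.8,         # row sampling per tree
    colsample_bytree=0.8,  # column sampling per tree
    gamma=1,               # minimum loss reduction to split further
    reg_alpha=0.1,         # L1 regularization
    reg_lambda=1.0,        # L2 regularization
    min_child_weight=5,
    tree_method="hist",    # fast histogram-based tree construction
    random_state=42,
)
# xgb.fit(X_train, y_train)  # X_train / y_train: your training data
```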
C. LightGBM-Specific Parameters
LightGBM has unique parameters such as num_leaves and its own feature/row bagging controls.
| Parameter | Purpose | Example Values |
| --- | --- | --- |
| num_leaves | Maximum number of leaf nodes per tree (the main complexity control). | [15, 31, 63, 127] |
| max_depth | Maximum depth of the tree (-1 means no limit). | [3, 5, 10, -1] |
| min_data_in_leaf | Minimum number of samples in a leaf, to prevent overfitting. | [20, 50, 100] |
| feature_fraction | Fraction of features used for each tree. | [0.6, 0.8, 1.0] |
| bagging_fraction | Fraction of data sampled for bagging. | [0.6, 0.8, 1.0] |
| bagging_freq | Perform bagging every k iterations (0 disables bagging). | [0, 5, 10] |
| lambda_l1 | L1 regularization term on weights. | [0, 0.1, 1] |
| lambda_l2 | L2 regularization term on weights. | [0, 0.1, 1] |
| boosting_type | Boosting algorithm ('gbdt', 'dart', 'goss'). | ['gbdt', 'dart'] |
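The sketch below passes these parameters through LightGBM's native params dictionary and lgb.train (assuming the lightgbm package is installed); in the scikit-learn wrapper (LGBMClassifier), feature_fraction and bagging_fraction correspond to colsample_bytree and subsample:

```python
import lightgbm as lgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Toy dataset as a stand-in for your own features and target.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=42)

# Illustrative parameter dictionary for LightGBM's native training API.
params = {
    "objective": "binary",
    "boosting_type": "gbdt",
    "learning_rate": 0.05,
    "num_leaves": 63,         # main complexity control in LightGBM
    "max_depth": -1,          # -1 = no explicit depth limit
    "min_data_in_leaf": 50,
    "feature_fraction": 0.8,  # column sampling per tree
    "bagging_fraction": 0.8,  # row sampling
    "bagging_freq": 5,        # re-sample rows every 5 iterations
    "lambda_l1": 0.1,
    "lambda_l2": 0.1,
}

train_set = lgb.Dataset(X_train, label=y_train)
valid_set = lgb.Dataset(X_valid, label=y_valid, reference=train_set)

booster = lgb.train(params, train_set, num_boost_round=300, valid_sets=[valid_set])
```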
D. CatBoost-Specific Parameters
CatBoost has specialized handling for categorical features and its own hyperparameters.
| Parameter | Purpose | Example Values |
| --- | --- | --- |
| depth | Depth of the tree (equivalent to max_depth). | [3, 6, 10] |
| iterations | Number of trees (equivalent to n_estimators). | [100, 200, 500] |
| learning_rate | Step size for gradient boosting. | [0.01, 0.1, 0.2] |
| l2_leaf_reg | L2 regularization term on leaf values. | [3, 5, 10] |
| border_count | Number of splits (bins) used to discretize numerical features. | [32, 64, 128] |
| bagging_temperature | Controls the randomness of the Bayesian bootstrap weights (0 gives uniform weights; larger values give more aggressive resampling). | [0, 1, 5] |
| one_hot_max_size | Maximum cardinality at which categorical features are one-hot encoded. | [2, 5, 10] |
| random_strength | Amount of randomness added to split scoring, to improve generalization. | [1, 5, 10] |
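As a minimal sketch (assuming the catboost package is installed), a CatBoostClassifier could be configured with these parameters as follows; cat_features, passed at fit time, tells CatBoost which columns to treat as categorical:

```python
from catboost import CatBoostClassifier

# Illustrative starting values; CatBoost handles categorical columns natively
# when their indices or names are passed via cat_features at fit time.
model = CatBoostClassifier(
    iterations=500,
    learning_rate=0.05,
    depth=6,
    l2_leaf_reg=5,
    border_count=64,
    bagging_temperature=1,
    one_hot_max_size=5,
    random_strength=5,
    verbose=False,          # silence per-iteration logging
)
# model.fit(X_train, y_train, cat_features=[0, 3])  # placeholder column indices
```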
3. Example Param Grids
Random Forest
```python
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [5, 10, 15],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'max_features': ['sqrt', 'log2', None]
}
```
XGBoost
```python
param_grid = {
    'n_estimators': [100, 200, 300],
    'learning_rate': [0.01, 0.1, 0.2],
    'max_depth': [3, 5, 7],
    'subsample': [0.6, 0.8, 1.0],
    'colsample_bytree': [0.6, 0.8, 1.0],
    'gamma': [0, 1, 5]
}
```
LightGBM
```python
param_grid = {
    'n_estimators': [100, 200, 300],
    'learning_rate': [0.01, 0.05, 0.1],
    'num_leaves': [31, 63, 127],
    'feature_fraction': [0.6, 0.8, 1.0],
    'bagging_fraction': [0.6, 0.8, 1.0],
    'lambda_l1': [0, 0.1, 1],
    'lambda_l2': [0, 0.1, 1]
}
```
CatBoost
```python
param_grid = {
    'iterations': [100, 200, 500],
    'learning_rate': [0.01, 0.1, 0.2],
    'depth': [3, 6, 10],
    'l2_leaf_reg': [3, 5, 10],
    'random_strength': [1, 5, 10]
}
```
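Any of these grids can be plugged into the same search loop shown in Section 1. As a sketch of the randomized variant, which evaluates a fixed number of sampled combinations rather than the full grid, the CatBoost grid above might be used like this (X_train and y_train are placeholders for your training data):

```python
from catboost import CatBoostClassifier
from sklearn.model_selection import RandomizedSearchCV

param_grid = {
    'iterations': [100, 200, 500],
    'learning_rate': [0.01, 0.1, 0.2],
    'depth': [3, 6, 10],
    'l2_leaf_reg': [3, 5, 10],
    'random_strength': [1, 5, 10]
}

search = RandomizedSearchCV(
    CatBoostClassifier(verbose=False),  # silence per-iteration logging
    param_distributions=param_grid,
    n_iter=20,        # sample 20 of the 243 possible combinations
    cv=5,
    random_state=42,
)
# search.fit(X_train, y_train)  # X_train / y_train: your training data
```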
4. Summary
- Start Simple: Begin with the most impactful parameters (n_estimators, max_depth, learning_rate).
- Regularization: Use min_samples_split, min_child_weight, and lambda_l1/lambda_l2 to control overfitting.
- Iterate Gradually: Expand your param grid once you identify promising ranges.
- Specialized Models:
  - Random Forest: Focus on max_depth, n_estimators, and bootstrap.
  - Boosting Models: Balance learning_rate, n_estimators, and regularization.
This guide brings together the major parameters of each model family to support effective hyperparameter tuning.