Hyperparameter search space for Catboost? #144

Closed
stepthom opened this issue Jul 30, 2021 · 3 comments

Comments

@stepthom
Collaborator

The search space for Catboost is rather limited; it only includes early_stopping_rounds and learning_rate:

FLAML/flaml/model.py, lines 620 to 633 in 072e9e4:

```python
def search_space(cls, data_size, **params):
    upper = max(min(round(1500000 / data_size), 150), 11)
    return {
        'early_stopping_rounds': {
            'domain': tune.qloguniform(lower=10, upper=upper, q=1),
            'init_value': 10,
            'low_cost_init_value': 10,
        },
        'learning_rate': {
            'domain': tune.loguniform(lower=.005, upper=.2),
            'init_value': 0.1,
        },
    }
```
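For reference, here is how that `upper` bound behaves at a few dataset sizes (a minimal sketch that just reproduces the formula from the snippet above):

```python
# Reproduces the early_stopping_rounds upper bound from the FLAML snippet above.
def upper_bound(data_size):
    return max(min(round(1500000 / data_size), 150), 11)

print(upper_bound(10_000))     # 150 -- small datasets get the widest range
print(upper_bound(100_000))    # 15
print(upper_bound(1_000_000))  # 11  -- large datasets are capped at the floor of 11
```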

Is there a reason why other hyperparameters are not searched? I was thinking it might be interesting to include:

  • l2_leaf_reg
  • subsample
  • bagging_temperature
  • mvs_reg
  • random_strength
  • max_leaves
  • fold_len_multiplier
  • model_shrink_rate

https://catboost.ai/docs/concepts/python-reference_parameters-list.html
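For anyone who wants to experiment with this, here is a minimal sketch of how a wider space could be plugged in through FLAML's custom-learner mechanism (`AutoML.add_learner`). The subclass name and all parameter ranges below are illustrative assumptions, not tuned defaults:

```python
from flaml import AutoML, tune
from flaml.model import CatBoostEstimator

class WideCatBoostEstimator(CatBoostEstimator):  # hypothetical subclass
    @classmethod
    def search_space(cls, data_size, **params):
        # Start from FLAML's built-in CatBoost space and widen it.
        space = super().search_space(data_size, **params)
        space.update({
            # Ranges are illustrative guesses, not tuned defaults.
            "l2_leaf_reg": {
                "domain": tune.loguniform(lower=1, upper=10),
                "init_value": 3,
            },
            "subsample": {  # applies only to Bernoulli/Poisson/MVS bootstrap types
                "domain": tune.uniform(lower=0.5, upper=1.0),
                "init_value": 1.0,
            },
            "random_strength": {
                "domain": tune.loguniform(lower=0.1, upper=10),
                "init_value": 1,
            },
        })
        return space

automl = AutoML()
automl.add_learner(learner_name="wide_catboost", learner_class=WideCatBoostEstimator)
# automl.fit(X_train, y_train, task="classification", estimator_list=["wide_catboost"])
```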

@sonichi
Contributor

sonichi commented Jul 30, 2021

We have not tried those. Would you like to explore a different search space?

@stepthom
Collaborator Author

stepthom commented Aug 3, 2021

A different search space might yield better results for my current project, yes. I have noticed that the best loss for catboost is consistently worse than for xgboost and lgbm. I was wondering whether there was a particular reason that catboost's search space is smaller, but it sounds like there is not. So I will experiment with a different/larger search space, and if I learn anything interesting, I will report back here.

@stepthom stepthom closed this as completed Aug 3, 2021
@AlgoAIBoss
If you are interested in checking CatBoost hyperparameters, here are most of them.

```python
params = {
    "iterations": 100,                # Default 1000; if decreased, learning_rate should be increased (tune iterations high, learning_rate low)
    "learning_rate": 0.03,
    "depth": 2,                       # Up to 16 for most loss functions; up to 8 for ranking losses
    "loss_function": "CrossEntropy",  # or "Logloss" / a custom LoglossObjective()
    "eval_metric": "Accuracy",
    "custom_loss": ["Accuracy"],
    "verbose": False,                 # Suppresses output; same effect as "silent": True or "logging_level": "Silent"
    "od_type": "Iter",                # Early stopping
    "od_wait": 40,                    # Early stopping
    "use_best_model": True,           # True by default
    "random_seed": 42,
    "one_hot_max_size": 30,           # Default 3; categorical features with more values are encoded statistically, which is expensive
    "early_stopping_rounds": 20,      # Overfitting detector
    "bagging_temperature": 1,         # Assigns weights to samples only if "bootstrap_type": "Bayesian"
    "bootstrap_type": "Bayesian",     # or "Bernoulli"
    "nan_mode": "Min",                # "Min" (default), "Max", or "Forbidden" (does not handle missing values)
    "task_type": "GPU",
    "max_ctr_complexity": 5,          # Feature combinations; default 3; disable with 1; max is the number of categorical features
    "boosting_type": "Ordered",       # "Ordered" (default) or "Plain"
    "rsm": 0.1,                       # Speeds up training without hurting quality; use only with hundreds of features
    "border_count": 32,               # Default 128; on GPU, 32 speeds up training without hurting quality
    "leaf_estimation_method": "Newton",
    "l2_leaf_reg": 3,
    "auto_class_weights": "Balanced", # For imbalanced data
    "has_time": True,                 # Time series: respects datetime order when splitting
    "combinations_ctr": ["FloatTargetMeanValue", "FeatureFreq", "BinarizedTargetMeanValue", "Borders", "Buckets", "Borders:TargetBorderCount=4", "Counter:CtrBorderCount=40:Prior=0.5/1"],  # Feature engineering, supported on GPU
    "simple_ctr": ["FloatTargetMeanValue", "FeatureFreq", "BinarizedTargetMeanValue", "Borders", "Buckets", "Borders:TargetBorderCount=4", "Counter:CtrBorderCount=40:Prior=0.5/1"],
}
```

But I would not recommend tuning all of them; in my experience, tuning every parameter does not yield good results, so tuning just a small subset is usually better (a minimal example is sketched below). I would also recommend the official CatBoost tutorials, which give more information about the feature-generation hyperparameters that improve accuracy. This was my first contribution to the open source community. I hope you found it helpful.
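For instance, here is a minimal sketch of tuning just a few of these parameters with CatBoost's built-in grid_search on synthetic data; the chosen parameters and ranges are illustrative assumptions, not recommendations:

```python
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Tune a small, high-impact subset instead of everything in the dict above.
model = CatBoostClassifier(iterations=200, verbose=False)
grid = {
    "depth": [4, 6, 8],
    "learning_rate": [0.01, 0.03, 0.1],
    "l2_leaf_reg": [1, 3, 9],
}
result = model.grid_search(grid, X=X, y=y, cv=3, verbose=False)
print(result["params"])  # best combination found by cross-validation
```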
