Hyperparameter search space for Catboost? #144

Closed
stepthom opened this issue Jul 30, 2021 · 3 comments

Comments

@stepthom
Collaborator

The search space for Catboost is rather limited; it only includes early_stopping_rounds and learning_rate:

FLAML/flaml/model.py, lines 620 to 633 in 072e9e4:

```python
def search_space(cls, data_size, **params):
    upper = max(min(round(1500000 / data_size), 150), 11)
    return {
        'early_stopping_rounds': {
            'domain': tune.qloguniform(lower=10, upper=upper, q=1),
            'init_value': 10,
            'low_cost_init_value': 10,
        },
        'learning_rate': {
            'domain': tune.loguniform(lower=.005, upper=.2),
            'init_value': 0.1,
        },
    }
```
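For reference, here is how that `upper` bound behaves at a few dataset sizes (a minimal sketch that just reproduces the formula from the snippet above):

```python
# Reproduces the early_stopping_rounds upper bound from the FLAML snippet above.
def upper_bound(data_size):
    return max(min(round(1500000 / data_size), 150), 11)

print(upper_bound(10_000))     # 150 -- small datasets get the widest range
print(upper_bound(100_000))    # 15
print(upper_bound(1_000_000))  # 11  -- large datasets are capped at the floor of 11
```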

Is there a reason why other hyperparameters are not searched? I was thinking it might be interesting to include:

  • l2_leaf_reg
  • subsample
  • bagging_temperature
  • mvs_reg
  • random_strength
  • max_leaves
  • fold_len_multiplier
  • model_shrink_rate

https://catboost.ai/docs/concepts/python-reference_parameters-list.html
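For anyone who wants to experiment with this, here is a minimal sketch of how a wider space could be plugged in through FLAML's custom-learner mechanism (`AutoML.add_learner`). The subclass name and all parameter ranges below are illustrative assumptions, not tuned defaults:

```python
from flaml import AutoML, tune
from flaml.model import CatBoostEstimator

class WideCatBoostEstimator(CatBoostEstimator):  # hypothetical subclass
    @classmethod
    def search_space(cls, data_size, **params):
        # Start from FLAML's built-in CatBoost space and widen it.
        space = super().search_space(data_size, **params)
        space.update({
            # Ranges are illustrative guesses, not tuned defaults.
            "l2_leaf_reg": {
                "domain": tune.loguniform(lower=1, upper=10),
                "init_value": 3,
            },
            "subsample": {  # applies only to Bernoulli/Poisson/MVS bootstrap types
                "domain": tune.uniform(lower=0.5, upper=1.0),
                "init_value": 1.0,
            },
            "random_strength": {
                "domain": tune.loguniform(lower=0.1, upper=10),
                "init_value": 1,
            },
        })
        return space

automl = AutoML()
automl.add_learner(learner_name="wide_catboost", learner_class=WideCatBoostEstimator)
# automl.fit(X_train, y_train, task="classification", estimator_list=["wide_catboost"])
```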

@sonichi
Contributor

sonichi commented Jul 30, 2021

We have not tried those. Would you like to explore a different search space?

@stepthom
Collaborator Author

stepthom commented Aug 3, 2021

A different search space might yield better results for my current project, yes. I have noticed that the best loss for catboost is consistently worse than for xgboost and lgbm. I was wondering whether there was a particular reason that catboost's search space is smaller, but it sounds like there is not. So I will experiment with a different/larger search space, and if I learn anything interesting, I will report back here.

@stepthom stepthom closed this as completed Aug 3, 2021
@AlgoAIBoss
If you are interested in checking CatBoost hyperparameters, here are most of them.

```python
params = {
    "iterations": 100,                # Default 1000; if decreased, learning_rate should be increased (tune iterations high, learning_rate low)
    "learning_rate": 0.03,
    "depth": 2,                       # Up to 16 for most loss functions; up to 8 for ranking losses
    "loss_function": "CrossEntropy",  # or "Logloss" / a custom LoglossObjective()
    "eval_metric": "Accuracy",
    "custom_loss": ["Accuracy"],
    "verbose": False,                 # Suppresses output; same effect as "silent": True or "logging_level": "Silent"
    "od_type": "Iter",                # Early stopping
    "od_wait": 40,                    # Early stopping
    "use_best_model": True,           # True by default
    "random_seed": 42,
    "one_hot_max_size": 30,           # Default 3; categorical features with more values are encoded statistically, which is expensive
    "early_stopping_rounds": 20,      # Overfitting detector
    "bagging_temperature": 1,         # Assigns weights to samples only if "bootstrap_type": "Bayesian"
    "bootstrap_type": "Bayesian",     # or "Bernoulli"
    "nan_mode": "Min",                # "Min" (default), "Max", or "Forbidden" (does not handle missing values)
    "task_type": "GPU",
    "max_ctr_complexity": 5,          # Feature combinations; default 3; disable with 1; max is the number of categorical features
    "boosting_type": "Ordered",       # "Ordered" (default) or "Plain"
    "rsm": 0.1,                       # Speeds up training without hurting quality; use only with hundreds of features
    "border_count": 32,               # Default 128; on GPU, 32 speeds up training without hurting quality
    "leaf_estimation_method": "Newton",
    "l2_leaf_reg": 3,
    "auto_class_weights": "Balanced", # For imbalanced data
    "has_time": True,                 # Time series: respects datetime order when splitting
    "combinations_ctr": ["FloatTargetMeanValue", "FeatureFreq", "BinarizedTargetMeanValue", "Borders", "Buckets", "Borders:TargetBorderCount=4", "Counter:CtrBorderCount=40:Prior=0.5/1"],  # Feature engineering, supported on GPU
    "simple_ctr": ["FloatTargetMeanValue", "FeatureFreq", "BinarizedTargetMeanValue", "Borders", "Buckets", "Borders:TargetBorderCount=4", "Counter:CtrBorderCount=40:Prior=0.5/1"],
}
```

But I would not recommend tuning all of them; in my experience, tuning every parameter does not yield good results, so tuning just a small subset is usually better (a minimal example is sketched below). I would also recommend the official CatBoost tutorials, which give more information about the feature-generation hyperparameters that improve accuracy. This was my first contribution to the open source community. I hope you found it helpful.
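For instance, here is a minimal sketch of tuning just a few of these parameters with CatBoost's built-in grid_search on synthetic data; the chosen parameters and ranges are illustrative assumptions, not recommendations:

```python
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Tune a small, high-impact subset instead of everything in the dict above.
model = CatBoostClassifier(iterations=200, verbose=False)
grid = {
    "depth": [4, 6, 8],
    "learning_rate": [0.01, 0.03, 0.1],
    "l2_leaf_reg": [1, 3, 9],
}
result = model.grid_search(grid, X=X, y=y, cv=3, verbose=False)
print(result["params"])  # best combination found by cross-validation
```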
