How to specify cat_features for the catboost learner? #161

Closed
flippercy opened this issue Aug 12, 2021 · 6 comments

Comments

@flippercy

flippercy commented Aug 12, 2021

Hi:

I am using a customized CatBoost learner but am having difficulty specifying cat_features.

The code for the customized learner is:

from catboost import CatBoostClassifier
from flaml.model import BaseEstimator  # assumed: FLAML's base class for custom learners

# num_cores, randomseed and monotone_catboost are assumed to be defined earlier in the script.
class MyMonotonicCatboostClassifier(BaseEstimator):

    def __init__(self, task='binary:logistic', thread_count=num_cores, **params):
        super().__init__(task, **params)
        self.estimator_class = CatBoostClassifier

        # convert to int for integer hyperparameters
        self.params = {
            'verbose': False,
            # thread_count is a named argument, so it never appears in params
            'thread_count': thread_count,
            'learning_rate': params['learning_rate'],
            'bagging_temperature': int(params['bagging_temperature']),
            'max_depth': int(params['max_depth']),
            'reg_lambda': params['reg_lambda'],
            'min_data_in_leaf': int(params['min_data_in_leaf']),
            'subsample': params['subsample'],
            'colsample_bylevel': params['colsample_bylevel'],
            'n_estimators': int(params['n_estimators']),
            'random_seed': params.get('random_seed', randomseed),
            'monotone_constraints': params.get('monotone_constraints', monotone_catboost),
        }

If I use CatBoostEstimator from flaml.model as the class, cat_features can be specified in params; however, the final model loses its portability: it cannot be saved with save_model() and ported into R.

If I set self.estimator_class = CatBoostClassifier instead, I keep getting an error message stating that cat_features is missing.

I am running the code above in R via reticulate; I am not sure whether that is related.

Appreciate your help!

Thank you.

@flippercy flippercy reopened this Aug 13, 2021
@sonichi
Contributor

sonichi commented Aug 13, 2021

> If I use CatBoostEstimator from flaml.model as the class, cat_features can be specified in params; however, the final model loses its portability: it cannot be saved with save_model() and ported into R.

Why can't it be saved by save_model() and how does your customized learner solve that problem?

@flippercy
Author

flippercy commented Aug 16, 2021

Hi @sonichi:

Sorry, I misunderstood the portability question; there is no problem with that aspect now. However, the dilemma I am facing is:

If I build the customized catboost learner as above, I cannot get cat_features specified. I get the error message "CatBoostError: features data: pandas.DataFrame column 'V1' has dtype 'category' but is not in cat_features list", where 'V1' is the first categorical variable in the modeling data.

If I set the class of the customized learner to CatBoostEstimator from flaml.model, everything runs well. However, as mentioned in another issue, the tunable parameters for catboost in FLAML are very limited: just n_estimators and learning_rate. As a result, the final model is not optimized enough.

Any suggestions?

Thank you

@flippercy flippercy reopened this Aug 16, 2021
@sonichi
Contributor

sonichi commented Aug 16, 2021

You can set the cat_features in the same way as in flaml:

FLAML/flaml/model.py

Lines 720 to 724 in 10082b9

if isinstance(X_train, pd.DataFrame):
    cat_features = list(X_train.select_dtypes(
        include='category').columns)
else:
    cat_features = []
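
The detection logic quoted above could be wired into a direct CatBoostClassifier fit roughly as follows. This is a minimal sketch, not FLAML's implementation: `detect_cat_features` and `fit_catboost` are hypothetical helper names, and the sketch assumes the training data is either a pandas DataFrame whose categorical columns have dtype 'category' or a plain array with no categorical columns.

```python
import pandas as pd


def detect_cat_features(X_train):
    """Return the names of 'category'-dtype columns, or [] for non-DataFrame input."""
    if isinstance(X_train, pd.DataFrame):
        return list(X_train.select_dtypes(include="category").columns)
    return []


def fit_catboost(X_train, y_train, **params):
    """Fit a CatBoostClassifier, declaring the categorical columns explicitly.

    Hypothetical helper: params are passed straight to the CatBoostClassifier
    constructor, and the detected column names go to fit() as cat_features.
    """
    # Imported lazily so the detection helper can be used without catboost installed.
    from catboost import CatBoostClassifier

    model = CatBoostClassifier(**params)
    model.fit(X_train, y_train, cat_features=detect_cat_features(X_train))
    return model
```

In a customized learner, the same list would need to be passed wherever the learner calls fit on the underlying CatBoostClassifier, which is what the "not in cat_features list" error is asking for.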

@flippercy
Author

Thank you @sonichi; I will try.

Separately, is there any plan to expand the search space of catboost, as discussed in #144?

@sonichi
Contributor

sonichi commented Aug 17, 2021

@stepthom is exploring it. @AlgoAIBoss made some suggestions in that thread. I'm curious what a good search space for catboost would be. Maybe you can discuss it with them on gitter and share what you each found. If you have a recommended search space, we can run some benchmark experiments to verify its effectiveness.
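
For discussion purposes, a candidate search space could be written down as plain data first and converted to FLAML's format afterwards. The sketch below is an assumption-laden illustration: the bounds and initial values are placeholders rather than benchmarked recommendations, `CANDIDATE_SPACE`, `check_space`, and `to_flaml_space` are hypothetical names, and the conversion assumes the `{'domain': ..., 'init_value': ...}` layout that FLAML documents for a custom learner's `search_space` classmethod.

```python
# Illustrative placeholders only: (sampler kind, lower, upper, initial value).
# The keys are a subset of the hyperparameters read by the customized learner.
CANDIDATE_SPACE = {
    "learning_rate": ("loguniform", 0.005, 0.2, 0.1),
    "max_depth": ("randint", 4, 11, 6),
    "reg_lambda": ("loguniform", 0.1, 10.0, 3.0),
    "subsample": ("uniform", 0.5, 1.0, 0.8),
    "colsample_bylevel": ("uniform", 0.5, 1.0, 0.8),
}


def check_space(space=CANDIDATE_SPACE):
    """Return the names whose initial value lies outside their own bounds."""
    return [
        name
        for name, (_, lower, upper, init) in space.items()
        if not lower <= init <= upper
    ]


def to_flaml_space(space=CANDIDATE_SPACE):
    """Convert the plain description into FLAML-style search-space entries.

    Assumes flaml.tune exposes uniform, loguniform and randint samplers
    taking (lower, upper); flaml is imported lazily for that reason.
    """
    from flaml import tune

    converted = {}
    for name, (kind, lower, upper, init) in space.items():
        sampler = getattr(tune, kind)
        converted[name] = {"domain": sampler(lower, upper), "init_value": init}
    return converted
```

Keeping the bounds as plain data makes it easy to share and compare proposals before anyone runs the benchmark experiments mentioned above.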

@flippercy
Author

Got it. Thank you very much!
