n_estimators value on automl.model differs from value in logs (for CatBoost models) #1317

Open
dannycg1996 opened this issue Jul 2, 2024 · 3 comments · May be fixed by #1364
Labels
bug Something isn't working

Comments

@dannycg1996
Collaborator

dannycg1996 commented Jul 2, 2024

Hi all,

The n_estimators value on the best model (automl.model) provided by FLAML does not seem to be set correctly for CatBoostClassifiers.

Example code here:

```python
from flaml import AutoML
from sklearn import datasets

dic_data = datasets.load_iris(as_frame=True)  # Bunch holding pandas objects
iris_data = dic_data["frame"]  # pandas DataFrame: features + target
automl = AutoML()
automl_settings = {
    "max_iter": 2,
    "metric": "accuracy",
    "task": "classification",
    "log_file_name": "catboost_error.log",
    "log_type": "all",
    "estimator_list": ["catboost"],
    "eval_method": "cv",
}
x_train = iris_data[["sepal length (cm)", "sepal width (cm)", "petal length (cm)", "petal width (cm)"]].to_numpy()
y_train = iris_data["target"]
automl.fit(x_train, y_train, **automl_settings)
print(automl.model.get_params())
```

The print statement outputs the following for me:

```
{'early_stopping_rounds': 10, 'learning_rate': 0.09999999999999996, 'n_estimators': 33, 'thread_count': -1, 'verbose': False, 'random_seed': 10242048, 'task': <flaml.automl.task.generic_task.GenericTask object at 0x7f895f2b3830>, '_estimator_type': 'classifier'}
```

However, if I look at the actual log file (catboost_error.log), I can see that neither of the two estimators attempted had n_estimators = 33: they had n_estimators = 35 and n_estimators = 57. Replicating the FLAML folds myself showed that the value should be 35, meaning that the logs are correct and automl.model is incorrect.

Furthermore, if I run print(automl.model.model.get_all_params()), I get a dictionary that includes iterations=35. The CatBoost documentation lists iterations as an alias of n_estimators. Whilst I haven't managed to pin down the exact cause of this issue, I believe it is tied to this alias.
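A quick way to confirm the mismatch described above is to compare the two parameter dictionaries directly. This is a minimal sketch, assuming only that the wrapper's `get_params()` reports `n_estimators` and CatBoost's `get_all_params()` reports the same quantity under its alias `iterations`. The helper name `iteration_alias_mismatch` is hypothetical, and the sample dictionaries are trimmed copies of the values reported in this issue.

```python
from typing import Any, Mapping, Optional, Tuple


def iteration_alias_mismatch(
    wrapper_params: Mapping[str, Any],
    catboost_params: Mapping[str, Any],
) -> Optional[Tuple[int, int]]:
    """Return (n_estimators, iterations) if the two disagree, else None.

    Hypothetical helper: `wrapper_params` would come from
    automl.model.get_params() and `catboost_params` from
    automl.model.model.get_all_params(). CatBoost treats `iterations`
    as an alias of `n_estimators`.
    """
    n_estimators = wrapper_params.get("n_estimators")
    iterations = catboost_params.get("iterations")
    if n_estimators is None or iterations is None:
        return None  # one side is missing, so there is nothing to compare
    if int(n_estimators) != int(iterations):
        return int(n_estimators), int(iterations)
    return None


# Trimmed values taken from the report above.
wrapper = {"early_stopping_rounds": 10, "n_estimators": 33}
inner = {"iterations": 35}
print(iteration_alias_mismatch(wrapper, inner))  # (33, 35)
```

Against a live run, the two arguments would be `automl.model.get_params()` and `automl.model.model.get_all_params()`, the same calls used in the report.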

In terms of package versions, I'm using FLAML 2.1.2, catboost 1.2.5, scikit-learn 1.5.0 and Python 3.12.0

@Programmer-RD-AI
Contributor

Hi,
I will look into this, but please also check the #1275 discussion; it seems they have come across the same issue.
I will try to find out what the cause is :)
If anyone else can contribute or help out, please do, thanks!

@jmrichardson
Contributor

jmrichardson commented Jul 27, 2024

I am getting the same problem with lgbm:

```
Best hyperparmeter config: {'n_estimators': 1314, 'num_leaves': 6376, 'min_child_samples': 38, 'learning_rate': 0.0988351059982288, 'log_max_bin': 9, 'colsample_bytree': 0.6663805206578503, 'reg_alpha': 0.001100862503118278, 'reg_lambda': 136.83211237673618}
Best r2 on validation data: 0.2975
Training duration of best run: 6365 s
LGBMRegressor(colsample_bytree=0.6663805206578503,
              learning_rate=0.0988351059982288, max_bin=511,
              min_child_samples=38, n_estimators=1, n_jobs=-1, num_leaves=6376,
              reg_alpha=0.001100862503118278, reg_lambda=136.83211237673618,
              verbose=-1)
```
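The same kind of discrepancy can be flagged for any estimator by diffing the logged best configuration against the fitted model's parameters. This is a minimal sketch: `config_drift` is a hypothetical helper, and it assumes the caller passes something like `automl.best_config` and the fitted estimator's `get_params()`. The sample values are copied from the LightGBM output above.

```python
from typing import Any, Dict, Iterable, Mapping, Tuple


def config_drift(
    best_config: Mapping[str, Any],
    fitted_params: Mapping[str, Any],
    keys: Iterable[str] = ("n_estimators",),
) -> Dict[str, Tuple[Any, Any]]:
    """Map each key whose value differs to (logged value, fitted value).

    Keys missing from either mapping are skipped rather than reported.
    """
    drift: Dict[str, Tuple[Any, Any]] = {}
    for key in keys:
        if key in best_config and key in fitted_params:
            logged, fitted = best_config[key], fitted_params[key]
            if logged != fitted:
                drift[key] = (logged, fitted)
    return drift


# Values copied from the LightGBM output above (subset of keys).
best_config = {"n_estimators": 1314, "num_leaves": 6376, "min_child_samples": 38}
fitted = {"n_estimators": 1, "num_leaves": 6376, "min_child_samples": 38}
print(config_drift(best_config, fitted, keys=best_config))
# {'n_estimators': (1314, 1)}
```

Passing `keys=best_config` checks every searched hyperparameter; the default checks only `n_estimators`, the value at issue in this thread.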

thinkall added the bug (Something isn't working) label Aug 7, 2024
@thinkall
Collaborator

thinkall commented Oct 8, 2024

Hi @dannycg1996, @jmrichardson: for CatBoost, we always set n_estimators to 8192 and apply early stopping in the fit function. Early stopping can also be triggered in LightGBM.

From FLAML/flaml/automl/model.py, lines 1984 to 1987 in 5c0f18b:

```python
"n_estimators": {
    "domain": 8192,
    "init_value": 8192,
},
```

To get more deterministic results, we'll need to update the model settings. This is related to #1361.
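To illustrate why early stopping makes the final tree count data-dependent even though the cap is fixed at 8192, here is a toy sketch of a simplified patience rule of the kind `early_stopping_rounds` applies. It assumes the model is cut back to its best iteration after stopping; the function names are hypothetical, and the validation-loss curves are synthetic with arbitrarily chosen minima, not taken from FLAML or CatBoost.

```python
from typing import List, Sequence


def trees_kept(val_losses: Sequence[float], max_trees: int = 8192, patience: int = 10) -> int:
    """Return how many trees survive patience-based early stopping.

    val_losses[i] is the validation loss after tree i + 1. Training stops once
    `patience` consecutive trees fail to improve on the best loss so far, and
    the model is assumed to be cut back to its best iteration.
    """
    best_loss = float("inf")
    best_iter = -1
    for i, loss in enumerate(val_losses[:max_trees]):
        if loss < best_loss:
            best_loss, best_iter = loss, i
        elif i - best_iter >= patience:
            break  # no improvement for `patience` rounds
    return best_iter + 1


def toy_curve(best_at: int, length: int = 100) -> List[float]:
    """Synthetic U-shaped validation curve with its minimum at index best_at."""
    return [abs(i - best_at) / length for i in range(length)]


# Same cap (8192) for both fits; different data gives different minima,
# so the number of trees actually kept differs.
print(trees_kept(toy_curve(20)))  # 21
print(trees_kept(toy_curve(50)))  # 51
```

The cap only bounds the search; with early stopping, the kept count is decided by where the validation loss bottoms out on each particular split, so a refit on different data can yield a different count than any logged trial.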

4 participants