n_estimators value on automl.model differs from value in logs (for CatBoost models) #1317

dannycg1996 · 2024-07-02T09:51:46Z

Hi all,

The n_estimators value on the best model (automl.model) provided by FLAML does not seem to be set correctly for CatBoostClassifiers.

Example code here:

from flaml import AutoML
from sklearn import datasets

dic_data = datasets.load_iris(as_frame=True)  # Bunch holding pandas objects
iris_data = dic_data["frame"]  # pandas DataFrame: features + target
automl = AutoML()
automl_settings = {
    "max_iter": 2,
    "metric": "accuracy",
    "task": "classification",
    "log_file_name": "catboost_error.log",
    "log_type": "all",
    "estimator_list": ["catboost"],
    "eval_method": "cv",
}
# Feature matrix as a numpy array, target as a pandas Series
x_train = iris_data[["sepal length (cm)", "sepal width (cm)", "petal length (cm)", "petal width (cm)"]].to_numpy()
y_train = iris_data["target"]
automl.fit(x_train, y_train, **automl_settings)
print(automl.model.get_params())

The print statement logs the following for me:
{'early_stopping_rounds': 10, 'learning_rate': 0.09999999999999996, 'n_estimators': 33, 'thread_count': -1, 'verbose': False, 'random_seed': 10242048, 'task': <flaml.automl.task.generic_task.GenericTask object at 0x7f895f2b3830>, '_estimator_type': 'classifier'}

However, if I look into the actual catboost_error.log, I can see that neither of the two estimators attempted had n_estimators = 33. They had n_estimators = 35 and n_estimators = 57. Replicating the FLAML folds myself shows that this n_estimators value should be 35, meaning that the logs are correct and automl.model is incorrect.
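For anyone who wants to check their own log the same way, here is a minimal sketch that pulls the n_estimators value out of each trial record. It assumes the log file holds one JSON object per line with the hyperparameters under a "config" key and a trial id under "record_id"; those field names are my assumption about the log layout, so adjust them if your file looks different. The function name is made up for this example.

```python
import json


def n_estimators_per_trial(log_path):
    """Return [(record_id, n_estimators)] for each trial record in the log.

    Assumes one JSON object per line, with hyperparameters stored under a
    "config" key (an assumption about the log layout, not a documented API).
    """
    trials = []
    with open(log_path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            config = record.get("config")
            # Skip lines that are not trial records, e.g. a trailing summary.
            if not isinstance(config, dict) or "n_estimators" not in config:
                continue
            trials.append((record.get("record_id"), config["n_estimators"]))
    return trials
```

Calling n_estimators_per_trial("catboost_error.log") after the fit above should list one entry per attempted configuration, which can then be compared against automl.model.get_params()["n_estimators"].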

Furthermore, if I run print(automl.model.model.get_all_params()) I get a dictionary which includes iterations=35. The catboost documentation shows that iterations is an alias of n_estimators, and whilst I haven't managed to pin down the exact cause of this issue, I believe it's tied in somewhere here.
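A small sketch of the comparison described above, written against plain dictionaries so it runs without FLAML or CatBoost installed. The helper names are hypothetical; the alias list (iterations, n_estimators, num_boost_round, num_trees) is taken from the CatBoost parameter documentation. The two input dictionaries reuse the values reported in this issue.

```python
# Names the CatBoost docs list as aliases of `iterations`.
ITERATION_ALIASES = ("iterations", "n_estimators", "num_boost_round", "num_trees")


def tree_count(params):
    """Return the tree-count setting from a params dict, whichever alias is used."""
    for name in ITERATION_ALIASES:
        if params.get(name) is not None:
            return params[name]
    return None


def compare_tree_counts(wrapper_params, fitted_params):
    """Return (wrapper count, fitted count, whether the two agree)."""
    wrapper_n = tree_count(wrapper_params)
    fitted_n = tree_count(fitted_params)
    return wrapper_n, fitted_n, wrapper_n == fitted_n


# Values reported in this issue: 33 on the wrapper, 35 on the fitted model.
print(compare_tree_counts({"n_estimators": 33}, {"iterations": 35}))
# prints (33, 35, False)
```

With a fitted AutoML object, the two arguments would be automl.model.get_params() and automl.model.model.get_all_params().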

In terms of package versions, I'm using FLAML 2.1.2, catboost 1.2.5, scikit-learn 1.5.0 and Python 3.12.0

Programmer-RD-AI · 2024-07-03T01:59:14Z

Hi,
I will look into this in the future, but please check the #1275 discussion as well; it seems they have come across the same issue.
I will try to work out what the issue is :)
If anyone else can contribute or help out, please do. Thanks.

jmrichardson · 2024-07-27T13:07:18Z

I am getting the same problem with lgbm:

Best hyperparmeter config: {'n_estimators': 1314, 'num_leaves': 6376, 'min_child_samples': 38, 'learning_rate': 0.0988351059982288, 'log_max_bin': 9, 'colsample_bytree': 0.6663805206578503, 'reg_alpha': 0.001100862503118278, 'reg_lambda': 136.83211237673618}
Best r2 on validation data: 0.2975
Training duration of best run: 6365 s
LGBMRegressor(colsample_bytree=0.6663805206578503,
              learning_rate=0.0988351059982288, max_bin=511,
              min_child_samples=38, n_estimators=1, n_jobs=-1, num_leaves=6376,
              reg_alpha=0.001100862503118278, reg_lambda=136.83211237673618,
              verbose=-1)

thinkall · 2024-10-08T03:00:36Z

Hi @dannycg1996, @jmrichardson, for CatBoost we always set n_estimators to 8192 and apply early stopping in the fit function. Early stopping can be triggered in lgbm as well.

FLAML/flaml/automl/model.py
Lines 1984 to 1987 in 5c0f18b

"n_estimators": {
    "domain": 8192,
    "init_value": 8192,
},

To get more deterministic results, we'll need to update the model settings. This is related to #1361.
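To illustrate why a fixed cap plus early stopping can yield different tree counts from one fit to the next, here is a rough sketch of a generic patience rule. It is not FLAML's or CatBoost's actual code; the function name and the loss values are made up for the example.

```python
def best_iteration(val_losses, patience):
    """Return the 1-based count of trees kept under a simple patience rule.

    Training stops once `patience` iterations have passed without improving
    on the best validation loss; only trees up to the best iteration are kept.
    """
    best_loss = float("inf")
    best_iter = 0
    for i, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_iter = loss, i
        elif i - best_iter >= patience:
            break  # no improvement for `patience` iterations
    return best_iter


# Two made-up validation curves under the same cap stop at different points.
print(best_iteration([0.9, 0.8, 0.7, 0.75, 0.76, 0.74], patience=2))
print(best_iteration([0.9, 0.8, 0.85, 0.7], patience=2))
```

The cap is the same in both calls, but the number of trees kept depends on the validation data each fit sees, so a per-fold fit and a final refit can disagree.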

thinkall added the bug label Aug 7, 2024

dannycg1996 mentioned this issue Oct 14, 2024: [Bug]: FLAML CatBoost Metrics Aren't Reproducible dannycg1996/FLAML_fix#1 (Open)

dannycg1996 self-assigned this Oct 15, 2024

dannycg1996 linked a pull request Oct 15, 2024 that will close this issue: fix: FLAML catboost metrics arent reproducible #1364 (Open)
