
fix: FLAML CatBoost metrics aren't reproducible #1364

Open
wants to merge 2 commits into base: main

Conversation

@dannycg1996 (Collaborator) commented Oct 15, 2024

Note: Please don't merge without a discussion first

Why are these changes needed?

Currently it is impossible to reproduce the best_loss reported by FLAML using the best model that FLAML returns, when that best model is a CatBoost model.
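
For context, this is roughly how the mismatch shows up (a minimal sketch assuming a synthetic dataset; the metric, time budget and seed here are arbitrary illustrations, not the exact settings from #1317):

from flaml import AutoML
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=4000, random_state=0)

automl = AutoML()
automl.fit(X_train=X, y_train=y, task="classification", metric="log_loss",
           estimator_list=["catboost"], time_budget=30, seed=1)

print(automl.best_loss)                    # the loss FLAML reports for the winning config
print(automl.best_config)                  # includes the n_estimators value FLAML recorded
print(automl.model.estimator.tree_count_)  # trees in the CatBoost model FLAML hands back

# Refitting a CatBoost model with best_config on the same splits does not give
# back best_loss, which is the behaviour this PR is trying to fix.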

There are two important changes made in this PR:

  1. Removal of the self.params[self.ITER_HP] = self._model.tree_count_ line in model.py
  2. Removal of this if statement from model.py: if current_time + self._time_per_iter > self.deadline: return False

The line highlighted in 1) is the cause of issue #1317. As I understand it, CatBoost estimators can be thought of as performing their own AutoML/optimisation process internally: they have early stopping, and when use_best_model = True they internally optimise the objective metric. The issue with the current code is that it overwrites the n_estimators value with the actual number of trees used, but doesn't switch off this internal optimisation. Rather than handing the user back the exact model that was evaluated, we simply change the inputs to CatBoost's internal optimisation, so we get different results when we retrain and test that model on the same folds.
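
To make the mechanism in 1) concrete, here is a minimal CatBoost-only sketch; the eval_set and early_stopping_rounds values are illustrative assumptions rather than FLAML's exact settings:

from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# First fit: early stopping + use_best_model trim the 1000 requested trees
# down to first.tree_count_.
first = CatBoostClassifier(n_estimators=1000, random_seed=0, verbose=False)
first.fit(X_train, y_train, eval_set=(X_val, y_val),
          early_stopping_rounds=10, use_best_model=True)

# The removed line then records n_estimators = tree_count_. Refitting with that
# value while the internal optimisation is still switched on does not recreate
# `first`: CatBoost re-runs its eval-set optimisation over a shorter schedule
# (and may re-derive auto-tuned parameters such as the learning rate), so the
# trees, and therefore the metrics, can come out differently.
second = CatBoostClassifier(n_estimators=first.tree_count_, random_seed=0, verbose=False)
second.fit(X_train, y_train, eval_set=(X_val, y_val),
           early_stopping_rounds=10, use_best_model=True)

print(first.tree_count_, second.tree_count_)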

Change 2) is a little trickier and needs some discussion. Essentially, the callbacks change the behaviour of the models when the remaining time_budget is less than the time the model will take to train. This means that when we take the model FLAML provides and refit it on the same folds, but with no time budget involved, we get different behaviour and different results. I imagine this callback was implemented as a means of ensuring that FLAML respects the time budget, but I personally think that results which aren't reproducible are not particularly useful.
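
For reviewers, this is roughly the behaviour the removed check introduces. The DeadlineCallback below is a made-up stand-in that mirrors the removed if statement, not FLAML's actual class:

import time

from catboost import CatBoostClassifier
from sklearn.datasets import make_classification

class DeadlineCallback:
    """Hypothetical stand-in for the removed resource-limit check."""

    def __init__(self, time_budget):
        self.deadline = time.time() + time_budget
        self._last = time.time()
        self._time_per_iter = None

    def after_iteration(self, info):
        now = time.time()
        if self._time_per_iter is None:
            self._time_per_iter = now - self._last
        # CatBoost stops training once the callback returns False, i.e. as soon
        # as the next iteration would be expected to overrun the deadline.
        return now + self._time_per_iter <= self.deadline

X, y = make_classification(n_samples=2000, random_state=0)

budgeted = CatBoostClassifier(n_estimators=500, random_seed=0, verbose=False)
budgeted.fit(X, y, callbacks=[DeadlineCallback(time_budget=1)])

refit = CatBoostClassifier(n_estimators=500, random_seed=0, verbose=False)
refit.fit(X, y)

# With a tight budget the first model is cut short, so refitting the "same"
# configuration without a deadline produces a different model and different metrics.
print(budgeted.tree_count_, refit.tree_count_)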

My main issue with 2) is that the XGBoostEstimator also utilises this same check_resource_limits(), so this change could have unintended consequences. On a similar note, I've realised that XGBoostEstimator and CatBoostEstimator call check_resource_limits differently, i.e.:

class XGBoostResourceLimit(BaseResourceLimit, TrainingCallback):
    # xgboost's TrainingCallback.after_iteration: returning True stops training.
    def after_iteration(self, model, epoch, evals_log) -> bool:
        now = time.time()
        return not self.check_resource_limits(now, epoch, "xgb")


class CatBoostResourceLimit(BaseResourceLimit):
    # catboost's callback protocol: returning False stops training.
    def after_iteration(self, info) -> bool:
        now = time.time()
        return self.check_resource_limits(now, info.iteration, "cat")

Could you please explain why XGBoostResourceLimit returns not self.check_resource_limits(now, epoch, "xgb") whilst CatBoostResourceLimit returns self.check_resource_limits(now, info.iteration, "cat")?

It's the not part which I'm interested in.

Also worth mentioning is that we still assign self._time_per_iter in check_resource_limits. Can this code also be deleted, or is it still of use somewhere?

I'm fairly confident that CatBoost results are now reproducible, but please let me know if you have any issues with my code.

Related issue number

Closes #1317

Successfully merging this pull request may close these issues.

n_estimators value on automl.model differs from value in logs (for CatBoost models)