fix: FLAML catboost metrics aren't reproducible #1364
Note: Please don't merge without a discussion first
Why are these changes needed?
Currently it is impossible to reproduce the `best_loss` reported by FLAML using the best model that FLAML provides, when that best model is a CatBoost model.
There are two important changes made in this PR:

1. The line `self.params[self.ITER_HP] = self._model.tree_count_` in `model.py`.
2. The line `if current_time + self._time_per_iter > self.deadline: return False` in `model.py`.
The line highlighted in 1) is the cause of issue #1317. As I understand it, CatBoost estimators can be thought of as performing their own AutoML/optimisation process internally: they have early stopping, and when `use_best_model=True` they internally optimise the objective metric. The issue with the current code is that it overwrites the `n_estimators` value with the actual number of trees used, but doesn't switch off this internal AutoML functionality. Rather than providing the user with the correct model, we simply change the internal AutoML process, so we get different results when we retrain and test that model on the same folds.
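To illustrate the failure mode, here is a minimal sketch on synthetic data (illustrative only, not FLAML's actual code; the data and parameter values are made up):

```python
# Illustrative only: capping n_estimators at the first fit's tree_count_
# while leaving the internal search (early stopping + use_best_model) on
# does not guarantee the same model on a refit.
import numpy as np
from catboost import CatBoostRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = X[:, 0] + rng.normal(scale=0.5, size=500)
X_train, X_val, y_train, y_val = X[:400], X[400:], y[:400], y[400:]

m1 = CatBoostRegressor(n_estimators=1000, random_seed=0, verbose=False)
m1.fit(X_train, y_train, eval_set=(X_val, y_val),
       early_stopping_rounds=10, use_best_model=True)
best_trees = m1.tree_count_  # the "best" iteration found internally

# Refit with n_estimators overwritten but the internal search still enabled:
m2 = CatBoostRegressor(n_estimators=best_trees, random_seed=0, verbose=False)
m2.fit(X_train, y_train, eval_set=(X_val, y_val),
       early_stopping_rounds=10, use_best_model=True)

# m2.tree_count_ may differ from best_trees, so the two models
# (and their evaluation scores) need not match.
print(best_trees, m2.tree_count_)
```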
My main issue with 2) is that the `XGBoostEstimator` also utilises this same `check_resource_limits()`, so this change could have unintended consequences. On a similar note, I've realised that `XGBoostEstimator` and `CatBoostEstimator` call `check_resource_limits` differently: could you please explain why `XGBoostResourceLimit` returns `not self.check_resource_limits(now, epoch, "xgb")` whilst `CatBoostResourceLimit` returns `self.check_resource_limits(now, info.iteration, "cat")`? It's the `not` part which I'm interested in.
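My current reading is that the inversion comes from the two libraries' callback conventions: xgboost's `TrainingCallback.after_iteration` returns True to *stop* training, while a CatBoost callback's `after_iteration` returns False to stop. A sketch under that assumption (class names and the toy budget check below are illustrative, not FLAML's code):

```python
# Sketch: xgboost stops when after_iteration returns True;
# CatBoost stops when after_iteration returns False.
import time

def within_budget(now, deadline):
    """Toy stand-in for check_resource_limits: True while inside the budget."""
    return now < deadline

class XGBLimitSketch:  # would subclass xgboost.callback.TrainingCallback
    def __init__(self, deadline):
        self.deadline = deadline

    def after_iteration(self, model, epoch, evals_log):
        # xgboost convention: True => stop training, hence the `not`.
        return not within_budget(time.time(), self.deadline)

class CatBoostLimitSketch:
    def __init__(self, deadline):
        self.deadline = deadline

    def after_iteration(self, info):
        # CatBoost convention: True => continue training, so no `not`.
        return within_budget(time.time(), self.deadline)
```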
Also worth mentioning: we still assign `self._time_per_iter` in `check_resource_limits`. Can this code also be deleted, or is it still of use somewhere?

I'm fairly confident that CatBoost results are now reproducible, but please let me know if you have any issues with my code.
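For completeness, here is the behaviour I'd expect after this change, sketched on synthetic data (illustrative only): once the internal search is switched off, refitting with a fixed tree count and seed should be deterministic, so the reported loss can be reproduced.

```python
# Illustrative only: with no eval_set / early stopping / use_best_model,
# refitting CatBoost with a fixed n_estimators and seed gives the same model.
import numpy as np
from catboost import CatBoostRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = X[:, 0] + rng.normal(scale=0.5, size=500)

params = {"n_estimators": 120, "random_seed": 0, "verbose": False}
m1 = CatBoostRegressor(**params).fit(X, y)
m2 = CatBoostRegressor(**params).fit(X, y)

assert m1.tree_count_ == m2.tree_count_ == 120
assert np.allclose(m1.predict(X), m2.predict(X))
```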
Related issue number
Closes #1317
Checks