test: Adding tests to verify model reproducibility #1362
Merged
Why are these changes needed?
See associated issue #1361. There are a few issues around model reproducibility, and these tests will help identify where they are. I have commented out the models which fail in the new tests, and these show where the reproducibility issues lie - largely around CatBoost, LGBM and Logistic Regression models.
I have added four separate tests in total, all of which aim to verify that we can reproduce the validation loss FLAML gives us, using the model which FLAML provides us with. The 4 tests deal with the following cases:
Tests 1 and 3, which deal with the FLAMLised model, are fairly self-explanatory (I hope). For the other tests (2 and 4) I had to define my own method in `conftest.py`, `evaluate_cv_folds_with_underlying_model`, which aims to mimic the FLAML cross-validation process. I couldn't directly call `generic_task.evaluate_model_CV()` here, as that method doesn't work with non-FLAMLised estimators. The line `estimator.cleanup()` raises an error, for example.

Thanks for any feedback - please let me know of any changes which might be needed.
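For readers unfamiliar with the idea behind a helper like `evaluate_cv_folds_with_underlying_model`, here is a minimal standard-library sketch of the general technique: refit a fresh copy of a plain estimator on each training split and average the validation loss. It is not the code from this PR. `MeanRegressor`, `kfold_indices`, `mse` and `evaluate_cv_folds` are hypothetical names chosen for illustration, and FLAML's actual splitter, metric and estimator handling will differ.

```python
import copy
import random
from statistics import mean


class MeanRegressor:
    """Hypothetical stand-in for a plain (non-FLAML) estimator with fit/predict."""

    def fit(self, X, y):
        self.mean_ = mean(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def kfold_indices(n_samples, n_splits, seed):
    """Shuffle indices with a fixed seed and deal them into n_splits folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_splits] for i in range(n_splits)]


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def evaluate_cv_folds(estimator, X, y, n_splits=5, seed=0):
    """Refit a fresh copy of `estimator` per fold; return the mean validation loss."""
    losses = []
    for val_idx in kfold_indices(len(X), n_splits, seed):
        held_out = set(val_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = copy.deepcopy(estimator)  # avoid carrying fitted state between folds
        model.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = model.predict([X[i] for i in val_idx])
        losses.append(mse([y[i] for i in val_idx], preds))
    return mean(losses)


# Toy data: the same seed and the same splits should give the same loss twice.
X = [[float(i)] for i in range(20)]
y = [2.0 * i + 1.0 for i in range(20)]
loss_a = evaluate_cv_folds(MeanRegressor(), X, y)
loss_b = evaluate_cv_folds(MeanRegressor(), X, y)
```

A reproducibility check in this style compares two independent runs (or a run against a stored loss) for equality. With real estimators, every source of randomness, including the estimator's own seed, has to be fixed for the comparison to be meaningful.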
Related issue number
Closes #1361
Checks