test: Adding tests to verify model reproducibility #1362

dannycg1996 · 2024-10-09T20:04:33Z

Why are these changes needed?

See associated issue #1361. There are a few issues around model reproducibility, and these tests will help identify where they are. I have commented out the models which fail the new tests, and these show where the reproducibility issues lie: largely around the CatBoost, LGBM and Logistic Regression models.

I have added four separate tests in total, all of which aim to verify that we can reproduce the validation loss FLAML gives us, using the model which FLAML provides us with. The 4 tests deal with the following cases:

1. Reproduce the FLAML result, for the classification case, using the FLAMLised model.
2. Reproduce the FLAML result, for the classification case, using the underlying SKLearn/xgboost/catboost model.
3. Reproduce the FLAML result, for the regression case, using the FLAMLised model.
4. Reproduce the FLAML result, for the regression case, using the underlying SKLearn/xgboost/catboost model.

Tests 1 and 3, which deal with the FLAMLised model, are fairly self-explanatory (I hope). For the other tests (2 and 4), I had to define my own method in conftest.py, evaluate_cv_folds_with_underlying_model, which aims to mimic the FLAML cross-validation process. I couldn't call generic_task.evaluate_model_CV() directly here, as that method doesn't work with non-FLAMLised estimators: the line estimator.cleanup() raises an error, for example.
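To illustrate the idea behind such a helper, here is a minimal sketch of a cross-validation loop that refits a fresh copy of a plain estimator on each fold and averages the validation loss. It is not the code added in this PR: the names evaluate_cv_folds, kfold_indices, MeanRegressor and mse are hypothetical, the folds are contiguous and unshuffled, and the mean-of-fold-losses aggregation is an assumption rather than a statement of how FLAML combines folds. It only assumes the estimator exposes fit and predict.

```python
import copy
from statistics import mean


def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, val_idx) pairs for contiguous, unshuffled folds."""
    fold_sizes = [
        n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
        for i in range(n_splits)
    ]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, val_idx
        start += size


def evaluate_cv_folds(estimator, X, y, n_splits, loss_fn):
    """Refit a fresh copy of `estimator` per fold and average the validation loss."""
    fold_losses = []
    for train_idx, val_idx in kfold_indices(len(X), n_splits):
        # Copy the estimator so fitted state never leaks between folds.
        model = copy.deepcopy(estimator)
        model.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = model.predict([X[i] for i in val_idx])
        fold_losses.append(loss_fn([y[i] for i in val_idx], preds))
    return mean(fold_losses)


class MeanRegressor:
    """Toy deterministic model standing in for a scikit-learn style estimator."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def mse(y_true, y_pred):
    """Mean squared error over one fold."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


X = [[float(i)] for i in range(10)]
y = [2.0 * i for i in range(10)]

# Two independent runs over the same data and folds should give the same loss.
first = evaluate_cv_folds(MeanRegressor(), X, y, n_splits=5, loss_fn=mse)
second = evaluate_cv_folds(MeanRegressor(), X, y, n_splits=5, loss_fn=mse)
```

A reproducibility test in this style would compare the loss recomputed by such a loop against the loss reported by the search, which only works if the fold splits, the metric and any random seeds match those used during the search.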

Thanks for any feedback. Please let me know of any changes which might be needed.

Related issue number

Closes #1361

Checks

I've used pre-commit to lint the changes in this PR (note the same is integrated in our CI checks).
I've included any doc changes needed for https://microsoft.github.io/FLAML/. See https://microsoft.github.io/FLAML/docs/Contribute#documentation to build and test documentation locally.
I've added tests (if relevant) corresponding to the changes introduced in this PR.
I've made sure all auto checks have passed.

dannycg1996 · 2024-10-09T20:06:21Z

@microsoft-github-policy-service agree company="Evotec"

thinkall

Thank you so much, @dannycg1996 ! LGTM.

Daniel Grindrod added 3 commits October 9, 2024 16:48

feat: Implemented test to check if we can reproduce results for classification tasks

6f64379

feat: Implemented test to verify we can reproduce results for regression tasks

13f79fb

feat: Implemented test to verify we can reproduce results using underlying models for classification and regression tasks

9cfa427

dannycg1996 requested a review from thinkall October 9, 2024 20:04

refactor: fix linter

9b69494

thinkall approved these changes Oct 11, 2024

thinkall merged commit a2a5e1a into microsoft:main Oct 12, 2024
16 checks passed
