
test: Adding tests to verify model reproducibility #1362

Merged: 4 commits into microsoft:main on Oct 12, 2024

Conversation


@dannycg1996 (Collaborator) commented on Oct 9, 2024

Why are these changes needed?

See associated issue #1361. There are a few issues around model reproducibility, and these tests will help identify where those issues are. I have commented out the models which fail the new tests; these show where the reproducibility issues lie - largely around the CatBoost, LGBM and Logistic Regression models.

I have added four separate tests in total, all of which aim to verify that we can reproduce the validation loss FLAML reports, using the model FLAML provides us with. The four tests deal with the following cases (a rough sketch of the idea follows the list):

  1. Reproduce the FLAML result for the classification case, using the FLAMLised model.
  2. Reproduce the FLAML result for the classification case, using the underlying scikit-learn/XGBoost/CatBoost model.
  3. Reproduce the FLAML result for the regression case, using the FLAMLised model.
  4. Reproduce the FLAML result for the regression case, using the underlying scikit-learn/XGBoost/CatBoost model.
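
For illustration, here is one way to express the property these tests check, as a minimal hypothetical sketch (not the actual test code - it simply pins the best configuration and re-runs the search, using FLAML's documented fit arguments and attributes):

```python
# Hypothetical sketch: if results are reproducible, re-running the search with
# the best configuration pinned should reproduce the reported validation loss.
from flaml import AutoML
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

automl = AutoML()
automl.fit(
    X_train=X, y_train=y, task="classification", metric="log_loss",
    eval_method="cv", n_splits=3, time_budget=5,
)

rerun = AutoML()
rerun.fit(
    X_train=X, y_train=y, task="classification", metric="log_loss",
    eval_method="cv", n_splits=3, max_iter=1,
    estimator_list=[automl.best_estimator],
    starting_points=automl.best_config_per_estimator,
)

# The two reported losses should match - the property the new tests assert.
assert abs(automl.best_loss - rerun.best_loss) < 1e-10
```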

Tests 1 and 3, which deal with the FLAMLised model, are fairly self-explanatory (I hope). For the other two tests (2 and 4), I had to define my own method in conftest.py, evaluate_cv_folds_with_underlying_model, which aims to mimic the FLAML cross-validation process. I couldn't directly call generic_task.evaluate_model_CV() here, as that method doesn't work with non-FLAMLised estimators - the line estimator.cleanup() raises an error, for example.
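
For a rough idea of what that helper does, here is a minimal hypothetical sketch (assuming a plain scikit-learn-style estimator, and glossing over details the real helper has to handle, such as FLAML's exact fold splitting, sample weights, and groups):

```python
# Hypothetical sketch of mimicking FLAML's CV loop with the bare underlying
# (scikit-learn/XGBoost/CatBoost) estimator, outside of FLAML's machinery.
import numpy as np
from sklearn.base import clone
from sklearn.metrics import log_loss
from sklearn.model_selection import StratifiedKFold


def evaluate_cv_folds_with_underlying_model(X, y, model, n_splits=3):
    losses = []
    for train_idx, val_idx in StratifiedKFold(n_splits=n_splits).split(X, y):
        est = clone(model)  # fresh copy with identical hyperparameters
        est.fit(X[train_idx], y[train_idx])
        proba = est.predict_proba(X[val_idx])
        losses.append(log_loss(y[val_idx], proba))
    return float(np.mean(losses))  # mean validation loss across folds
```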

Thanks for any feedback - please let me know of any changes that might be needed.

Related issue number

Closes #1361


@dannycg1996 (Collaborator, Author) commented:

@microsoft-github-policy-service agree company="Evotec"

@thinkall (Collaborator) left a comment:

Thank you so much, @dannycg1996! LGTM.

@thinkall merged commit a2a5e1a into microsoft:main on Oct 12, 2024
16 checks passed
Development

Successfully merging this pull request may close these issues.

[Issue]: Lack of Testing Around Reproducibility of Results of Classification and Regression Models