[enhancement] check that all sklearnex estimators are centrally tested #2037
Conversation
All in comparison to the last main CI run: this adds roughly 1% more testing, and the change in runtime is within the run-to-run variance.
sklearnex/tests/test_common.py
Outdated
print(estimators)
for name, obj in estimators:
    # do nothing if defined in preview
    if "preview" not in obj.__module__:
Just for my understanding: why is preview skipped?
Preview has not been centrally tested up to this point. It would also conflict with PATCHED_MODELS, since we would then need to bookkeep two versions of the same estimator throughout testing. Individual preview tests are discovered by pytest, but not discovered in a meaningful way by sklearnex.
We can enable preview via an environment variable; in that case, shouldn't they be centrally tested as patched models?
True; luckily we don't do that in any of the CI systems so far. I guess it raises questions about what defines a preview estimator. I assumed it was because of code quality and/or performance.
Co-authored-by: Samir Nasibli <samir.nasibli@intel.com>
/intelci: run
/intelci: run
/intelci: run
Overall looks good to me! Thank you @icfaust
I would suggest highlighting these changes in the squashed PR commit message, or moving them into a separate PR.
sklearnex/tests/test_common.py
Outdated
estimators = all_estimators()
print(estimators)
for name, obj in estimators:
    # do nothing if defined in preview
    if "preview" not in obj.__module__:
Why not use the estimator as a test parameter and pytest.skip the preview ones?
Good question. I would have to move the all_estimators monkeypatch into a fixture and then do an indirect parametrization. Objects like BaseSVM could also show up in the list. It would make sure that multiple failures all show up, not just the first. If you want me to do it, let me know. @Alexsandruss
@icfaust, yes, it makes sense to do it.
Unfortunately, using fixtures at collection time is specifically not supported: pytest-dev/pytest#7140 (comment). In order to do this, I would need to monkeypatch manually instead of using the pytest monkeypatch fixture, which has implications for test isolation. What I will do instead is change the logic to collect all missing estimators and display them all in a single failing assert.
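A rough sketch of that "collect everything, fail once" shape (illustrative only, not the PR's exact code; `PATCHED_MODELS`, `SPECIAL_INSTANCES`, and the `estimators` list are assumed to be the registries and discovery output discussed above, and the membership check is simplified):

```python
def test_all_estimators_are_centrally_tested(estimators):
    # estimators: (name, class) pairs from the monkeypatched all_estimators()
    missing = []
    for name, obj in estimators:
        if "preview" in obj.__module__:
            continue  # preview estimators are deliberately skipped
        if name not in PATCHED_MODELS and name not in SPECIAL_INSTANCES:
            missing.append(name)
    # a single failing assert that reports every missing estimator at once
    assert not missing, f"estimators not centrally tested: {missing}"
```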
/intelci: run
/intelci: run
Description
This adds `BasicStatistics` and `IncrementalBasicStatistics` to the `SPECIAL_INSTANCES`, since they cannot be added easily to the patch_map. This adds a test to `tests/test_common.py` which will check that all estimators which inherit from sklearn's `BaseEstimator` without a leading underscore are in either `PATCHED_MODELS` or `SPECIAL_INSTANCES`, such that they are centrally tested via `sklearnex/tests`.
This works by monkeypatching sklearn's `all_estimators`, a function sklearn uses internally to discover all estimators in sklearn. It is patched to yield all sklearn-style estimators in sklearnex without using a patch map. This required modifying all sklearn-imported estimators to follow Python private-variable conventions (a leading underscore), which is the bulk of the changes. This is a reasonable change, since we actually would like the sklearn estimators to be private in the various sklearnex modules.
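As a rough sketch of what such a discovery function could look like (a hypothetical helper under stated assumptions, not the PR's actual implementation):

```python
# Hypothetical sketch: enumerate sklearnex's sklearn-style estimators,
# analogous to sklearn.utils.all_estimators.
import inspect
import pkgutil
from importlib import import_module

from sklearn.base import BaseEstimator


def _sklearnex_all_estimators():
    import sklearnex

    found = {}
    for _, modname, _ in pkgutil.walk_packages(
        sklearnex.__path__, prefix="sklearnex."
    ):
        try:
            module = import_module(modname)
        except ImportError:
            continue  # optional dependencies may be missing
        for name, obj in inspect.getmembers(module, inspect.isclass):
            # public BaseEstimator subclasses only (no leading underscore)
            if issubclass(obj, BaseEstimator) and not name.startswith("_"):
                found[name] = obj
    return sorted(found.items())
```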
A change was necessary in `BasicStatistics` in order for it to be added to `SPECIAL_INSTANCES`: it currently cannot be `clone()`d, so the `options` attribute was renamed to `result_options` to match the kwarg on `__init__`. (Fixed in a separate PR, #2038.)
This fixes an issue with `IncrementalBasicStatistics` where `validate_data` is called too often during a single fit call. It follows the conventions of `IncrementalPCA` and `IncrementalEmpiricalCovariance` by adding a `check_input` boolean kwarg; a rough sketch of the pattern is below.
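A minimal sketch of the `check_input` convention, simplified from the pattern those incremental estimators follow (the `_validate` helper is a hypothetical stand-in for the real data validation; this is not the PR's actual code):

```python
from sklearn.utils import gen_batches


class _IncrementalSketch:
    batch_size_ = 1000

    def _validate(self, X):
        # hypothetical stand-in for validate_data / self._validate_data
        return X

    def partial_fit(self, X, check_input=True):
        if check_input:
            X = self._validate(X)  # validate only when called directly
        # ... update the running statistics with this batch ...
        return self

    def fit(self, X):
        X = self._validate(X)  # validate once for the whole fit ...
        for batch in gen_batches(X.shape[0], self.batch_size_):
            # ... so the per-batch partial_fit calls skip re-validation
            self.partial_fit(X[batch], check_input=False)
        return self
```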
This PR has no performance impact, as it only adds tests and renames variables. `BasicStatistics` is shown by stability testing to not be deterministic; this will be noted in the documentation.

Checklist to comply with before moving PR from draft:
PR completeness and readability
Testing
Performance