
[enhancement] check that all sklearnex estimators are centrally tested #2037

Merged: 45 commits merged into uxlfoundation:main on Oct 10, 2024

Conversation

@icfaust (Contributor) commented on Sep 9, 2024

Description

This adds BasicStatistics and IncrementalBasicStatistics to SPECIAL_INSTANCES, since they cannot easily be added to the patch_map. It also adds a test to tests/test_common.py which checks that every estimator inheriting from sklearn's BaseEstimator, whose name has no leading underscore, is in either PATCHED_MODELS or SPECIAL_INSTANCES, so that all such estimators are centrally tested via sklearnex/tests.

This works by monkeypatching sklearn's all_estimators, a function sklearn uses internally to discover all of its estimators. The patched version yields all sklearn-style estimators in sklearnex without using a patch map. This required renaming all sklearn-imported estimators to follow Python's private-variable convention (a leading underscore), which accounts for the bulk of the changes. This is a reasonable change, since we actually want the sklearn estimators to be private within the various sklearnex modules.
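
To make the discovery concrete, here is a rough sketch of both pieces: the private import alias and a helper that yields sklearnex's own estimators. The alias spelling, helper name, and monkeypatch target below are illustrative assumptions, not the exact code in this PR.

```python
# Inside a sklearnex module, the sklearn estimator import becomes private, e.g.:
#   from sklearn.linear_model import LogisticRegression as _sklearn_LogisticRegression
# (the exact alias naming is an assumption here)

import inspect
import pkgutil
from importlib import import_module

from sklearn.base import BaseEstimator

import sklearnex


def sklearnex_all_estimators():
    """Return (name, class) pairs for public BaseEstimator subclasses in sklearnex."""
    found = {}
    for _, module_name, _ in pkgutil.walk_packages(
        sklearnex.__path__, prefix="sklearnex."
    ):
        try:
            module = import_module(module_name)
        except ImportError:
            continue
        for name, obj in inspect.getmembers(module, inspect.isclass):
            # skip private names (the sklearn-imported estimators) and
            # classes merely re-exported from other packages
            if name.startswith("_") or not obj.__module__.startswith("sklearnex"):
                continue
            if issubclass(obj, BaseEstimator):
                found[name] = obj
    return sorted(found.items())


# In the test suite, sklearn's discovery function can then be redirected, e.g.:
#   monkeypatch.setattr("sklearn.utils.discovery.all_estimators", sklearnex_all_estimators)
# (the exact import path of all_estimators differs across sklearn versions)
```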

A change was necessary in BasicStatistics before it could be added to SPECIAL_INSTANCES, as it currently cannot be clone()d: the options attribute had to be renamed to result_options to match the corresponding kwarg of __init__ (fixed in a separate PR, #2038).
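
For background on why the rename matters: sklearn's clone() goes through BaseEstimator.get_params(), which looks up instance attributes named exactly like the __init__ parameters. A minimal illustration of the failure mode (toy classes, not the actual BasicStatistics code):

```python
from sklearn.base import BaseEstimator, clone


class Broken(BaseEstimator):
    def __init__(self, result_options="all"):
        # stored under a different name than the __init__ parameter, so
        # get_params() cannot find `result_options` and clone() fails
        self.options = result_options


class Fixed(BaseEstimator):
    def __init__(self, result_options="all"):
        self.result_options = result_options


clone(Fixed(result_options="mean"))  # works
# clone(Broken())  # fails: get_params() looks for self.result_options
```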

This also fixes an issue with IncrementalBasicStatistics where validate_data was called too often for a single fit call. It now follows the conventions of IncrementalPCA and IncrementalEmpiricalCovariance by adding a check_input boolean kwarg.
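
A toy sketch of that convention, mirroring how IncrementalPCA uses check_input (this is not the literal IncrementalBasicStatistics implementation):

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.utils import check_array, gen_batches


class IncrementalStatsSketch(BaseEstimator):
    """Toy incremental estimator illustrating the check_input convention."""

    def __init__(self, batch_size=1000):
        self.batch_size = batch_size

    def partial_fit(self, X, y=None, check_input=True):
        if check_input:
            # validate only when partial_fit is called directly by the user
            X = check_array(X)
        self.sum_ = getattr(self, "sum_", 0) + X.sum(axis=0)
        return self

    def fit(self, X, y=None):
        # validate once per fit call ...
        X = check_array(X)
        for batch in gen_batches(X.shape[0], self.batch_size):
            # ... and skip re-validation for each internal batch
            self.partial_fit(X[batch], check_input=False)
        return self


# usage: the data is validated exactly once during fit
IncrementalStatsSketch(batch_size=2).fit(np.arange(10.0).reshape(5, 2))
```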

This PR has no performance impact, as it only adds tests and renames variables.

Stability testing shows that BasicStatistics is not deterministic; this will be added to the documentation.


Checklist to comply with before moving PR from draft:

PR completeness and readability

  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes, or created a separate PR with the update and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have added respective label(s) to the PR if I have permission to do so.
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • The unit tests pass successfully.
  • I have run it locally and tested the changes extensively.

Performance

  • I have measured performance for affected algorithms using scikit-learn_bench and provided at least summary table with measured data, if performance change is expected.
  • I have provided justification why performance has changed or why changes are not expected.

@icfaust icfaust marked this pull request as ready for review September 10, 2024 04:49
@icfaust icfaust removed the request for review from ahuber21 September 10, 2024 04:49
@icfaust (Contributor, PR author) commented on Sep 10, 2024

> @icfaust could you please also include some statistics about the PR change

All numbers are in comparison to the last main CI run:

| runner | number of tests | sklearnex runtime |
| --- | --- | --- |
| github py3.9sk1.1 lnx | 9086 (+101) | 10 min 15 s (+9 s) |
| github py3.9sk1.1 win | 11763 (+123) | 20 min 13 s (-10 s) |
| github py3.10sk1.2 lnx | 6309 (+71) | 7 min 31 s (+8 s) |
| github py3.10sk1.2 win | 6309 (+71) | 13 min 4 s (-54 s) |
| github py3.11sk1.3 lnx | 9154 (+101) | 9 min 55 s (+26 s) |
| github py3.11sk1.3 win | 11853 (+123) | 19 min 41 s (+16 s) |

This adds roughly 1% more testing; the changes in runtime are within run-to-run variance.

sklearnex/tests/test_common.py (review thread, outdated)
print(estimators)
for name, obj in estimators:
    # do nothing if defined in preview
    if "preview" not in obj.__module__:
Contributor:

Just for my understanding, why is preview skipped?

Contributor (PR author):

Preview has not been centrally tested up to this point. It would also conflict with PATCHED_MODELS, as we would then need to bookkeep two versions of the same estimator throughout testing. Individual preview tests are discovered by pytest, but not discovered in a meaningful way by sklearnex.

Contributor:

We can enable preview via an environment variable; in that case, shouldn't the preview estimators be centrally tested as patched models too?

Contributor (PR author):

True; luckily, so far we don't do that in any of the CI systems. I guess it raises questions about what defines a preview estimator; I assumed it was a matter of code quality and/or performance.

Co-authored-by: Samir Nasibli <samir.nasibli@intel.com>
@icfaust (Contributor, PR author) commented on Sep 10, 2024

/intelci: run

@icfaust (Contributor, PR author) commented on Sep 18, 2024

/intelci: run

@icfaust (Contributor, PR author) commented on Sep 20, 2024

/intelci: run

@samir-nasibli (Contributor) left a comment:

Overall looks good to me! Thank you @icfaust

Contributor:

I would suggest highlighting these changes in the squashed PR commit message, or moving them into a separate PR.

Comment on lines 87 to 91
estimators = all_estimators()
print(estimators)
for name, obj in estimators:
    # do nothing if defined in preview
    if "preview" not in obj.__module__:
Contributor:

Why not use the estimator as a test parameter and pytest.skip the preview ones?

Contributor (PR author):

Good question. I would have to move the all_estimators monkeypatch into a fixture and then do an indirect parametrization. Objects like BaseSVM could also show up in the list. It would make sure that multiple failures all show up, not just the first. If you want me to do it, let me know. @Alexsandruss

Contributor:

@icfaust, yes, it makes sense to do it.

Contributor (PR author):

Unfortunately, using fixtures at collection time is specifically not supported: pytest-dev/pytest#7140 (comment). In order to do this, I would need to monkeypatch manually instead of using the pytest monkeypatch fixture, which has implications for test isolation. What I will do instead is change the logic to collect all missing estimators and display them all in a single failing assert.
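
Roughly, that final check could look like the sketch below. PATCHED_MODELS and SPECIAL_INSTANCES are the real registries, while the helper name, the commented import path, and the assumption that both registries are keyed by estimator name are illustrative.

```python
# from sklearnex.tests.utils import PATCHED_MODELS, SPECIAL_INSTANCES  # path assumed


def test_all_estimators_are_centrally_tested():
    missing = []
    # sklearnex_all_estimators() stands in for the monkeypatched all_estimators()
    for name, obj in sklearnex_all_estimators():
        if "preview" in obj.__module__:
            continue  # preview estimators are not centrally tested (yet)
        if name not in PATCHED_MODELS and name not in SPECIAL_INSTANCES:
            missing.append(name)
    # a single assert, so every missing estimator is reported in one failure
    assert not missing, f"estimators not covered by central testing: {missing}"
```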

@icfaust (Contributor, PR author) commented on Sep 25, 2024

/intelci: run

@icfaust (Contributor, PR author) commented on Oct 10, 2024

/intelci: run

@icfaust icfaust merged commit 8883b39 into uxlfoundation:main Oct 10, 2024
25 checks passed