RF: fix tests #725
Conversation
daal4py/sklearn/ensemble/_forest.py
Outdated
                          dtype=[np.float64, np.float32])
        if not hasattr(self, 'daal_model_') or \
                sp.issparse(X) or self.n_outputs_ != 1 or \
                not daal_check_version((2021, 'P', 200)):
daal4py/sklearn/ensemble/_forest.py
Outdated
            (f'X has {X.shape[1]} features, '
             f'but RandomForestClassifier is expecting '
             f'{self.n_features_in_} features as input'))
        X = check_array(X, accept_sparse=['csr', 'csc', 'coo'],
Move this after the check for whether daal is used; otherwise it will do unnecessary validation (see the sketch below).
Ok
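A minimal sketch of the suggested ordering, assuming a hypothetical _daal_predict helper (the real daal4py method is more involved): the cheap applicability check runs first, and check_array is only paid for on the daal branch.

# Sketch only: the validation is moved after the applicability check;
# _daal_predict is a hypothetical stand-in for the oneDAL prediction path.
import numpy as np
import scipy.sparse as sp
from sklearn.ensemble import RandomForestClassifier as _SklRandomForestClassifier
from sklearn.utils import check_array

from daal4py.sklearn._utils import daal_check_version


class RandomForestClassifier(_SklRandomForestClassifier):
    def predict(self, X):
        # Cheap applicability check first: fall back to stock scikit-learn
        # when there is no daal model, X is sparse, the estimator has
        # multiple outputs, or the installed oneDAL is too old.
        if not hasattr(self, 'daal_model_') or \
                sp.issparse(X) or self.n_outputs_ != 1 or \
                not daal_check_version((2021, 'P', 200)):
            return super().predict(X)

        # Validation now runs only on the daal path.
        X = check_array(X, accept_sparse=['csr', 'csc', 'coo'],
                        dtype=[np.float64, np.float32])
        return self._daal_predict(X)  # hypothetical oneDAL prediction call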
@@ -36,4 +36,4 @@ def test_sklearnex_import_rf_regression():
                            random_state=0, shuffle=False)
     rf = RandomForestRegressor(max_depth=2, random_state=0).fit(X, y)
     assert 'daal4py' in rf.__module__
-    assert_allclose([-6.66], rf.predict([[0, 0, 0, 0]]), atol=1e-2)
+    assert_allclose([-6.97], rf.predict([[0, 0, 0, 0]]), atol=1e-2)
What is the value in stock scikit-learn?
In stock scikit-learn it's near -8.32; the new implementation is close to that.
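A hypothetical way to reproduce that stock number is to fit unpatched scikit-learn on the same data; only random_state=0, shuffle=False are confirmed by the diff above, the other make_regression arguments are assumptions.

# Assumed reproduction of the stock value mentioned in the reply above.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# n_features/n_informative are guesses, not taken from this PR.
X, y = make_regression(n_features=4, n_informative=2,
                       random_state=0, shuffle=False)
rf = RandomForestRegressor(max_depth=2, random_state=0).fit(X, y)
print(rf.predict([[0, 0, 0, 0]]))  # ~ -8.32 with stock scikit-learn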
@@ -31,9 +31,9 @@
         import RandomForestRegressor as DaalRandomForestRegressor
 from daal4py.sklearn._utils import daal_check_version

-ACCURACY_RATIO = 0.85
+ACCURACY_RATIO = 0.95 if daal_check_version((2021, 'P', 400)) else 0.85
Do I understand correctly that the accuracy got worse?
No, accuracy and log loss in the new implementation are better.
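For context, a sketch of how the version-gated ratio from the diff above is typically consumed; check_accuracy is an illustrative helper, not the actual test code.

# Illustrative only: the diff raises the bar on newer oneDAL builds.
from daal4py.sklearn._utils import daal_check_version

ACCURACY_RATIO = 0.95 if daal_check_version((2021, 'P', 400)) else 0.85


def check_accuracy(daal_score, stock_score):
    # The daal4py estimator must reach at least ACCURACY_RATIO of the
    # score the stock scikit-learn estimator gets on the same data.
    assert daal_score >= ACCURACY_RATIO * stock_score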
I also checked the MSE in the regression example (the prediction is saved to decision_forest_regression_batch.csv): it used to be ~17.65, and in the new implementation it's ~17.26, so the error has decreased.
CI might be bound to this PR; we don't have to run CI twice.
Force-pushed from 0f707b3 to c2fde1e
deselected_tests.yaml
Outdated
# predict_proba
- ensemble/tests/test_forest.py::test_probability[RandomForestClassifier]
- ensemble/tests/test_forest.py::test_parallel_train
- ensemble/tests/test_stacking.py::test_stacking_classifier_iris
- ensemble/tests/test_stacking.py::test_stacking_classifier_drop_column_binary_classification
- ensemble/tests/test_stacking.py::test_stacking_classifier_drop_estimator
- ensemble/tests/test_stacking.py::test_stacking_classifier_drop_binary_prob
- ensemble/tests/test_voting.py::test_weights_iris
- ensemble/tests/test_voting.py::test_predict_on_toy_problem
- ensemble/tests/test_voting.py::test_predict_proba_on_toy_problem
- ensemble/tests/test_voting.py::test_gridsearch
- ensemble/tests/test_voting.py::test_voting_classifier_set_params
- ensemble/tests/test_voting.py::test_set_estimator_drop
- ensemble/tests/test_voting.py::test_estimator_weights_format
- ensemble/tests/test_voting.py::test_transform
- inspection/tests/test_permutation_importance.py::test_permutation_importance_equivalence_sequential_parallel
- tests/test_calibration.py
- tests/test_common.py::test_estimators
- tests/test_multioutput.py::test_multi_output_classification
Do these tests fail just because of predict_proba usage?
Is it clear what is broken in oneDAL?
I'm afraid we broke a large number of tests that were passing previously.
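One hypothetical way to answer that locally is to run a single deselected test against the patched estimators, assuming sklearnex's patch_sklearn is available and scikit-learn's bundled tests are installed:

# Hypothetical local reproduction of one deselected predict_proba failure.
import pytest
from sklearnex import patch_sklearn

patch_sklearn()  # route RandomForest* through daal4py/oneDAL

# Select the parametrized test named in the deselection list above.
pytest.main(["--pyargs", "sklearn.ensemble.tests.test_forest",
             "-k", "test_probability and RandomForestClassifier"])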
@Mergifyio rebase