
RF: fix tests #725

Merged
merged 6 commits into uxlfoundation:master from dev/agorshk-forest_fix_tests on Jul 19, 2021

Conversation

agorshk (Contributor) commented Jul 6, 2021

No description provided.

dtype=[np.float64, np.float32])
if not hasattr(self, 'daal_model_') or \
sp.issparse(X) or self.n_outputs_ != 1 or \
not daal_check_version((2021, 'P', 200)):
Contributor:

Suggested change
not daal_check_version((2021, 'P', 200)):

(f'X has {X.shape[1]} features, '
f'but RandomForestClassifier is expecting '
f'{self.n_features_in_} features as input'))
X = check_array(X, accept_sparse=['csr', 'csc', 'coo'],
Contributor:

Move this after the check for whether daal is used; otherwise unnecessary checks will run.

Contributor Author (agorshk):

Ok
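
A minimal sketch of the reordering suggested here (the function name and the `stock_predict` callback are hypothetical; the real daal4py method body differs):

```python
import numpy as np
import scipy.sparse as sp
from sklearn.utils import check_array
from daal4py.sklearn._utils import daal_check_version

def _predict_with_late_validation(estimator, X, stock_predict):
    # Do the cheap fallback decision first; the stock scikit-learn
    # path validates X itself, so validating up front duplicates work.
    if (not hasattr(estimator, 'daal_model_') or sp.issparse(X)
            or estimator.n_outputs_ != 1
            or not daal_check_version((2021, 'P', 200))):
        return stock_predict(X)

    # Only the daal4py fast path reaches this point: validate X now.
    X = check_array(X, accept_sparse=['csr', 'csc', 'coo'],
                    dtype=[np.float64, np.float32])
    # ... run the daal4py prediction on the validated X ...
```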

@@ -36,4 +36,4 @@ def test_sklearnex_import_rf_regression():
random_state=0, shuffle=False)
rf = RandomForestRegressor(max_depth=2, random_state=0).fit(X, y)
assert 'daal4py' in rf.__module__
-    assert_allclose([-6.66], rf.predict([[0, 0, 0, 0]]), atol=1e-2)
+    assert_allclose([-6.97], rf.predict([[0, 0, 0, 0]]), atol=1e-2)
Contributor:

What is the value in stock scikit-learn?

Contributor Author (agorshk):

In stock sklearn it's near -8.32; the new implementation is close to that.
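
The stock figure can be reproduced with unpatched scikit-learn; the `make_regression` parameters below are an assumption matching scikit-learn's RandomForestRegressor docstring example, whose `random_state=0, shuffle=False` tail is visible in the test above:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_features=4, n_informative=2,
                       random_state=0, shuffle=False)
regr = RandomForestRegressor(max_depth=2, random_state=0).fit(X, y)
print(regr.predict([[0, 0, 0, 0]]))  # stock sklearn: ~[-8.33]
```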

@@ -31,9 +31,9 @@
import RandomForestRegressor as DaalRandomForestRegressor
from daal4py.sklearn._utils import daal_check_version

-ACCURACY_RATIO = 0.85
+ACCURACY_RATIO = 0.95 if daal_check_version((2021, 'P', 400)) else 0.85
Contributor:

Do I understand correctly that the accuracy got worse?

Contributor Author (agorshk):

No, accuracy and log loss are better in the new implementation.
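
The diff does not show how ACCURACY_RATIO is applied, so the following is only a sketch of the usual pattern in these tests: the daal4py estimator must reach at least ACCURACY_RATIO times the stock estimator's score.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

ACCURACY_RATIO = 0.95  # threshold used for daal >= (2021, 'P', 400)

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stock = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
stock_acc = accuracy_score(y_te, stock.predict(X_te))
# daal_acc would come from the daal4py estimator in the real test:
# assert daal_acc >= ACCURACY_RATIO * stock_acc
```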

Contributor Author (agorshk):

I also checked the MSE in the regression example (predictions are saved to decision_forest_regression_batch.csv): it used to be ~17.65; in the new implementation it's ~17.26, so the error decreased.
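
A minimal sketch of that MSE check, assuming the CSV stores one prediction per row (the file layout is an assumption) and the ground-truth targets are available separately:

```python
import numpy as np

def mse_from_csv(pred_csv, y_true):
    """Mean squared error of predictions stored one per row in a CSV."""
    y_pred = np.loadtxt(pred_csv, delimiter=',')
    return float(np.mean((np.asarray(y_true) - y_pred) ** 2))

# e.g. mse_from_csv('decision_forest_regression_batch.csv', y_test)
# reported values: ~17.65 before, ~17.26 with the new implementation
```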

@agorshk agorshk requested a review from PetrovKP July 7, 2021 07:02
agorshk (Contributor Author) commented Jul 7, 2021

CI might be bound to this PR; we don't have to run CI twice.

@uxlfoundation uxlfoundation deleted a comment from OnlyDeniko Jul 7, 2021
@PetrovKP PetrovKP added the sklearn-patch sklearn patching label Jul 8, 2021
@agorshk agorshk force-pushed the dev/agorshk-forest_fix_tests branch from 0f707b3 to c2fde1e on July 15, 2021 06:22
@agorshk agorshk requested a review from michael-smirnov July 15, 2021 06:26
Comment on lines 242 to 260
# predict_proba
- ensemble/tests/test_forest.py::test_probability[RandomForestClassifier]
- ensemble/tests/test_forest.py::test_parallel_train
- ensemble/tests/test_stacking.py::test_stacking_classifier_iris
- ensemble/tests/test_stacking.py::test_stacking_classifier_drop_column_binary_classification
- ensemble/tests/test_stacking.py::test_stacking_classifier_drop_estimator
- ensemble/tests/test_stacking.py::test_stacking_classifier_drop_binary_prob
- ensemble/tests/test_voting.py::test_weights_iris
- ensemble/tests/test_voting.py::test_predict_on_toy_problem
- ensemble/tests/test_voting.py::test_predict_proba_on_toy_problem
- ensemble/tests/test_voting.py::test_gridsearch
- ensemble/tests/test_voting.py::test_voting_classifier_set_params
- ensemble/tests/test_voting.py::test_set_estimator_drop
- ensemble/tests/test_voting.py::test_estimator_weights_format
- ensemble/tests/test_voting.py::test_transform
- inspection/tests/test_permutation_importance.py::test_permutation_importance_equivalence_sequential_parallel
- tests/test_calibration.py
- tests/test_common.py::test_estimators
- tests/test_multioutput.py::test_multi_output_classification
Contributor:

Do these tests fail just because of predict_proba usage?
Is it clear what is broken in oneDAL?
I'm afraid that we broke a large number of tests that were passing previously.
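
For context, entries in deselected_tests.yaml like the hunk above are typically turned into pytest deselect options when the scikit-learn suite is run; the runner below is an assumption, not the repo's actual script (pytest's --deselect accepts node-id prefixes, so whole files work too):

```python
import pytest

# Two entries copied from the hunk above; hypothetical runner.
deselected = [
    'ensemble/tests/test_forest.py::test_probability[RandomForestClassifier]',
    'tests/test_calibration.py',
]
args = ['--pyargs', 'sklearn']
for entry in deselected:
    args += ['--deselect', entry]
# pytest.main(args)  # runs sklearn's tests with these items skipped
```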

agorshk (Contributor Author) commented Jul 16, 2021

@Mergifyio rebase

@agorshk agorshk changed the title from "RF: fix tests, enabling predict_proba" to "RF: fix tests" on Jul 19, 2021
@agorshk agorshk merged commit a17b5a7 into uxlfoundation:master Jul 19, 2021
Labels: sklearn-patch (sklearn patching)
3 participants