Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RF: support filtering of const features #1725

Merged

Conversation

agorshk
Copy link
Contributor

@agorshk agorshk commented Jun 15, 2021

Accuracy/MSE and performance were measured and compared on some datasets. Accuracy score was compared on autogluon dataset.
1
2

@agorshk agorshk force-pushed the dev/agorshk-forest_feature_filtering branch 2 times, most recently from dadb000 to 252980c Compare June 16, 2021 12:00
@agorshk
Copy link
Contributor Author

agorshk commented Jun 17, 2021

/intelci: run

@agorshk
Copy link
Contributor Author

agorshk commented Jun 21, 2021

/intelci: restart

@agorshk
Copy link
Contributor Author

agorshk commented Jun 21, 2021

/intelci: run

@agorshk agorshk marked this pull request as ready for review June 21, 2021 20:58
@PetrovKP PetrovKP added the bug label Jun 23, 2021
Copy link
Contributor

@PetrovKP PetrovKP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you have failed for daal4py examples for random forest in private and public CI

@@ -492,7 +492,8 @@ protected:
_minSamplesSplit(2),
_minWeightLeaf(0.),
_minImpurityDecrease(-daal::services::internal::EpsilonVal<algorithmFPType>::get() * x->getNumberOfRows()),
_maxLeafNodes(0)
_maxLeafNodes(0),
_useConstFeatures(false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need this variable? you change it nowhere

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to save this variable, it might be dissuaded to use previous approach in some cases. This variable doesn't changed in any place in this PR.

_aConstFeatureIdx.reset(maxFeatures * 2); // first maxFeatures elements are used for saving indices of constant features,
// the other part are used for saving levels of this features
DAAL_CHECK_MALLOC(_aConstFeatureIdx.get());
PRAGMA_IVDEP
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use memset for _aConstFeatureIdx

services::service_memset_seq<IndexType, cpu>(_aConstFeatureIdx, 0, maxFeatures * 2);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, thanks, it was fixed

if (_hostApp.isCancelled(s, n)) return nullptr;

if (!_par.memorySavingMode && !_useConstFeatures)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand correctly, this code will work for every tree? Why can't we count at the beginning - here, for example?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This trick of checking "non-const" features effectively works only for "depth-first" build trees strategy. In this case it was put here.

@agorshk agorshk force-pushed the dev/agorshk-forest_feature_filtering branch from 252980c to 23f8c80 Compare July 5, 2021 07:47
@agorshk agorshk requested a review from PetrovKP July 5, 2021 19:59
@agorshk
Copy link
Contributor Author

agorshk commented Jul 16, 2021

/intelci: restart
set daal4py_repo=https://github.com/agorshk/daal4py.git
set daal4py_branch=dev/agorshk-forest_fix_tests

@agorshk
Copy link
Contributor Author

agorshk commented Jul 16, 2021

@Mergifyio rebase

@mergify
Copy link
Contributor

mergify bot commented Jul 16, 2021

Command rebase: success

Branch has been successfully rebased

@napetrov napetrov force-pushed the dev/agorshk-forest_feature_filtering branch from 23f8c80 to d6c0661 Compare July 16, 2021 10:29
@agorshk
Copy link
Contributor Author

agorshk commented Jul 16, 2021

/intelci: run
set daal4py_repo=https://github.com/agorshk/daal4py.git
set daal4py_branch=dev/agorshk-forest_fix_tests

@agorshk
Copy link
Contributor Author

agorshk commented Jul 18, 2021

@Mergifyio rebase

@agorshk
Copy link
Contributor Author

agorshk commented Jul 18, 2021

/intelci: restart
set daal4py_repo=https://github.com/agorshk/daal4py.git
set daal4py_branch=dev/agorshk-forest_fix_tests

@mergify
Copy link
Contributor

mergify bot commented Jul 18, 2021

Command rebase: success

Branch already up to date

@agorshk agorshk merged commit 4a8c22e into uxlfoundation:master Jul 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants