Make sure X and y are sorted equally in select_features #715
This looks like quite a large bug to me.
In principle, `X` and `y` could have the same index entries (which is what we check), but we never check whether the order is the same! So passing in an `X` with index `[0, 1, 2]` and a `y` with index `[2, 1, 0]` will not raise any errors.

For classification tasks this is not a problem: all the significance tests except `target_real_feature_real_test` do not care about the order of the index, because `x` and `y` are re-ordered internally in all the other tests. But for `target_real_feature_real_test` the index is stripped before any re-ordering can happen, which leads to the bug described in #713.

This PR fixes this. I am a bit puzzled why this was never a problem before :-/
I therefore added the exact test from the issue, so that we notice any regression.
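
To illustrate the failure mode, here is a minimal sketch using only pandas and NumPy. It is not the code from this PR or from the issue; the toy data and variable names are made up for illustration, and a plain correlation stands in for the real significance test. It shows that a check on index entries passes while pairing the values by position gives a very different result from pairing them by label.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: by label, y = 10 * feature, i.e. perfectly correlated.
X = pd.DataFrame({"feature": [1.0, 2.0, 3.0, 4.0]}, index=[0, 1, 2, 3])
y = pd.Series([30.0, 10.0, 40.0, 20.0], index=[2, 0, 3, 1])  # same labels, shuffled order

# A check on index entries alone passes, because it ignores the order.
same_entries = set(X.index) == set(y.index)

# Stripping the index pairs the values by position, which mixes up the rows.
corr_positional = np.corrcoef(X["feature"].to_numpy(), y.to_numpy())[0, 1]

# Sorting both by index first pairs the values by label again.
X_sorted = X.sort_index()
y_sorted = y.sort_index()
corr_aligned = np.corrcoef(X_sorted["feature"].to_numpy(), y_sorted.to_numpy())[0, 1]

print(same_entries, round(corr_positional, 6), round(corr_aligned, 6))
```

With this data the positional correlation is essentially zero while the label-aligned one is 1, so a real-valued test run on the stripped arrays would judge the feature irrelevant.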
@MaxBenChrist as a general remark: we should add more tests on `select_features` :-)