[MRG] updates docstrings and user guide for ENN, RENN and AllKNN #850
Conversation
@@ -1,4 +1,4 @@
-"""Class to perform under-sampling based on the edited nearest neighbour
+"""Classes to perform under-sampling based on the edited nearest neighbour
The docs and API use both forms of the word neighbo(u)r, with and without the u. scikit-learn uses neighbor (without u) in its class names; imbalanced-learn uses neighbour (with u) in class names, but neighbor (without u) in parameters and attributes. The docstrings had a bit of both.
I unified the docstrings to neighbors, without the u.
At some point it might be worth considering unifying the class names as well.
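To make the mismatch concrete, here is a small snippet against the current public API (the class name spelled with the u, the parameter without):

```python
from imblearn.under_sampling import EditedNearestNeighbours

# British spelling in the class name, American spelling in the parameter:
enn = EditedNearestNeighbours(n_neighbors=3)  # n_neighbours=3 would raise a TypeError
```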
will stop when i) the maximum number of iterations is reached, or ii) no
more observations are being removed, or iii) one of the majority classes
becomes a minority class, or iv) one of the majority classes disappears
from the target after undersampling.
I don't think there is a way to understand how the algorithm stops unless we read the source code, so I added this bit.
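For readers of this thread, a minimal usage sketch of how the criteria interact with `max_iter` (assuming the current `fit_resample` API; the dataset here is just an illustration):

```python
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.under_sampling import RepeatedEditedNearestNeighbours

X, y = make_classification(n_samples=1000, weights=[0.1, 0.9], random_state=0)
print(Counter(y))  # imbalanced target

# max_iter bounds criterion i); criteria ii)-iv) are checked after each
# ENN pass and may stop the repetitions earlier.
renn = RepeatedEditedNearestNeighbours(n_neighbors=3, max_iter=100)
X_res, y_res = renn.fit_resample(X, y)
print(Counter(y_res))  # majority class reduced, minority untouched
```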
# (that is, if no further observations are removed)
# 2. If the number of samples in any of the other (majority) classes becomes
# smaller than the number of samples in the minority class
# 3. If one of the classes disappears
I had trouble understanding the comments and the logic, so I rephrased them a bit. A sketch of how I now read the three conditions is below.
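This is a hedged sketch, not the library's actual code; `should_stop`, `counts_before`, `counts_after`, and `minority_class` are hypothetical names:

```python
def should_stop(counts_before, counts_after, minority_class):
    """Sketch of the three stop conditions listed above (hypothetical helper)."""
    min_count = counts_before[minority_class]
    # 1. No further observations were removed in this pass
    if sum(counts_after.values()) == sum(counts_before.values()):
        return True
    # 2. A majority class now has fewer samples than the minority class
    if any(n < min_count for cls, n in counts_after.items() if cls != minority_class):
        return True
    # 3. One of the classes disappeared from the target
    if len(counts_after) < len(counts_before):
        return True
    return False


before = {"maj": 100, "min": 20}
after = {"maj": 15, "min": 20}
print(should_stop(before, after, "min"))  # True: criterion 2 fires
```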
The repetitions will stop when i) the maximum number of iterations
is reached, or ii) one of the majority classes becomes a minority
class, or iii) one of the majority classes disappears from the target
after undersampling.
This helps the user understand when the repetitions stop without having to read the source code.
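As a companion to this docstring change, a usage sketch; per the docs, AllKNN grows the internal neighbourhood at each repetition up to `n_neighbors`, which is why the parameter behaves differently here than in ENN/RENN:

```python
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.under_sampling import AllKNN

X, y = make_classification(n_samples=1000, weights=[0.1, 0.9], random_state=0)

# Unlike ENN/RENN, AllKNN increases the neighbourhood size at each pass
# (1, 2, ..., n_neighbors), stopping early on criteria ii) or iii).
allknn = AllKNN(n_neighbors=4)
X_res, y_res = allknn.fit_resample(X, y)
print(Counter(y), Counter(y_res))
```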
Codecov Report
@@ Coverage Diff @@
## master #850 +/- ##
=======================================
Coverage 98.69% 98.69%
=======================================
Files 93 93
Lines 6063 6063
Branches 508 508
=======================================
Hits 5984 5984
Misses 78 78
Partials 1 1
Continue to review full report at Codecov.
@glemaitre I think this is ready for review.
@glemaitre would you like me to reopen and update this PR?
I could find some time to review. However, it would be easier for me to have one PR per sampler. Could you split the PR, @solegalli?
Reference Issue
#851
What does this implement/fix? Explain your changes.
Expands docstrings and user guide for ENN, RENN and AllKNN.
Any other comments?
I had some trouble understanding the implementation of the algorithms. In short, I thought that to explore the 3 closest neighbours a 3-KNN should be trained. As it turns out, it needs to be a 4-KNN, because when the training set is queried against itself each sample is returned as its own nearest neighbour. I also had trouble understanding when RENN and AllKNN stop, and why. And finally, the parameter n_neighbors in ENN and RENN does not elicit the same behaviour as in AllKNN, so I think it helps to clarify this.
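To make the 4-KNN point concrete, a minimal demonstration with scikit-learn's NearestNeighbors (illustrative only; the toy data is made up):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0]])

# Querying the training set against itself returns each point as its own
# first neighbour, so retrieving the 3 closest *other* points requires
# fitting with n_neighbors=4 and discarding the first column.
nn = NearestNeighbors(n_neighbors=4).fit(X)
_, idx = nn.kneighbors(X)
print(idx[:, 0])   # each point's nearest neighbour is itself: [0 1 2 3 4]
print(idx[:, 1:])  # the 3 actual nearest neighbours of each point
```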