-
Notifications
You must be signed in to change notification settings - Fork 541
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Review] Precision recall curve using cupy #2519
[Review] Precision recall curve using cupy #2519
Conversation
sync with upstream
merge with upstream
Please update the changelog in order to start CI tests. View the gpuCI docs here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few comments on a first look
ids = cp.argsort(-y_score) | ||
sorted_score = y_score[ids] | ||
|
||
ones = y_true[ids].astype('float32') # for calculating true positives |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why float32?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just wanted to convert it to float
since later on it is used in the RawKernel which expects float
data type. float64
should also be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think float then is probably fine. Technically we could use
cuml/python/cuml/common/kernel_utils.py
Line 55 in 489a7d8
def cuda_kernel_factory(nvrtc_kernel_str, dtypes, kernel_name=None): |
float
shouldn't pose any problems, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just tried several variations and it is really surprising:
# trial 1
ones = y_true[ids]
# trial 2
ones = y_true[ids].astype('float')
#trial 3
ones = y_true[ids].astype('float64')
All of them have assertion error due to mismatched results against sklearn.
Only ones = y_true[ids].astype('float32')
works. Please note that in each trial I not only changed this line but every line with astype('float32')
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dantegd Please let me know if this is acceptable. And if there are other changes to be made. Thank you!
Co-authored-by: Dante Gama Dessavre <dante.gamadessavre@gmail.com>
Co-authored-by: Dante Gama Dessavre <dante.gamadessavre@gmail.com>
rerun tests |
@daxiongshu could you solve the conflicts? The PR is very close, I think it looks good so far! |
@dantegd merge conflicts fixed. Thank you! |
In this PR, I will refactor
_ranking.py
so that basic functions can be reused among existing and upcoming metrics.