-
-
Notifications
You must be signed in to change notification settings - Fork 8.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
- create a gpu metrics (internal) registry #5387
Conversation
- if this approach is palatable, this can be rolled to other areas as well - the objective is to separate the cpu and gpu implementations such that they evolve indepedently. to that end, this approach will: - preserve the same metrics configuration (from the end user perspective) - internally delegate the responsibility to the gpu metrics builder when there is a valid device present - decouple the gpu metrics builder from the cpu ones to prevent misuse - move away from including the cuda file from within the cc file and segregate the code via ifdef's nest, i'll create a pr for the metric implementation on the gpu
Do you have a concrete use case in mind? |
Codecov Report
@@ Coverage Diff @@
## master #5387 +/- ##
==========================================
+ Coverage 84.07% 84.19% +0.12%
==========================================
Files 11 11
Lines 2411 2411
==========================================
+ Hits 2027 2030 +3
+ Misses 384 381 -3
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other situations like this we have been deferring configurations of GPU/CPU algorithms to the learner and I'm wondering if we should do the same here. The learner holds most information about the learning task and configuration and might be better suited to dispatching the correct algorithm.
Another disadvantage of this approach is having to define duplicated code in each CPU metric for dispatching GPU algorithms.
So as an alternative I would propose separating the objectives into distinct GPU/CPU implementations as you have done, but selecting them inside the learner. @trivialfis WDYT?
[sc] the reason i did this here is for the specific component - metric/objective etc. to choose independently if it needs to be accelerated on the gpu, as opposed to having the learner choose it for them. not all metrics are accelerated on the gpu and it was best left for that metric to decide independently.
[sc] further, even within a given metric, we only move the computation on the gpu only when a certain condition is met. this also has to be decided independently on a metric-by-metric (or by a component) basis by that component. punting all this to the learner could increase coupling significantly. |
Possibly another way to do this without the registry would be to create metrics.h metrics.cc metrics.cu, put the |
[sc] some of the metrics use the template method pattern that drives the logic in one place, and deferring part of functionality to the sub classes or other meta types (to prevent virtual on device). this can get hairy with this approach. the gpu registry approach keeps it fairly isolated and can easily delegate responsibility to the gpu classes as long as they are registered with the same name as their cpu counterparts (which is its only requirement). what is the main objection to this approach (other than that it is non traditional)? |
Please don't. The Learner is already very complicated as it's. Just out of curious, why do we need a class to carry out computing metric? Looking through the code there's no class states. Looking at #5326 , the whole class is just a function. |
[sc] the metric class has state and there is ranking config (parsed out of the metric name). these can evolve, and if we made them as free functions, we would have to pass them all to every metric function that is supported on the gpu. hence, they do have some state. further, upon looking closely, i find that this pattern isn't without precedent. it looks like page formats, tree io etc. all leverage the registry mechanics to cleanly delegate responsibility. |
…om within a process
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems ok to me. We will want to make sure binary classification is supported for AUC in the next PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving the approach. Now we need to be sure there's no problem during changing between CPU and GPU. Please be aware of that, this can happen after serialization(like pickling) and unserialization on another worker.
valid device present
via ifdef's
next, i'll create a pr for the ranking metric implementation on the gpu
@RAMitchell @trivialfis - please review