- create a gpu metrics (internal) registry #5387

sriramch · 2020-03-03T19:03:23Z

if this approach is palatable, this can be rolled to other areas as well
the objective is to separate the cpu and gpu implementations (where they digress significantly) such that they evolve independently. to that end, this approach will:
- preserve the same metrics configuration (from the end user perspective)
- internally delegate the responsibility to the gpu metrics builder when there is a
  valid device present
- decouple the gpu metrics builder from the cpu ones to prevent misuse (thus not impacting other areas of the code). the locality of gpu reference is strictly within the metric builders registered with the metric registry
- move away from including the cuda file from within the cc file and segregate the code
  via ifdef's

next, i'll create a pr for the ranking metric implementation on the gpu

- if this approach is palatable, this can be rolled to other areas as well - the objective is to separate the cpu and gpu implementations such that they evolve indepedently. to that end, this approach will: - preserve the same metrics configuration (from the end user perspective) - internally delegate the responsibility to the gpu metrics builder when there is a valid device present - decouple the gpu metrics builder from the cpu ones to prevent misuse - move away from including the cuda file from within the cc file and segregate the code via ifdef's nest, i'll create a pr for the metric implementation on the gpu

hcho3 · 2020-03-03T20:50:19Z

the objective is to separate the cpu and gpu implementations (where they digress significantly) such that they evolve independently

Do you have a concrete use case in mind?
@sriramch

mli · 2020-03-03T21:37:41Z

Codecov Report

Merging #5387 into master will increase coverage by 0.12%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #5387      +/-   ##
==========================================
+ Coverage   84.07%   84.19%   +0.12%     
==========================================
  Files          11       11              
  Lines        2411     2411              
==========================================
+ Hits         2027     2030       +3     
+ Misses        384      381       -3

Impacted Files	Coverage Δ
python-package/xgboost/tracker.py	`95.18% <0.00%> (+1.20%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5dc8e89...f522124. Read the comment docs.

RAMitchell

In other situations like this we have been deferring configurations of GPU/CPU algorithms to the learner and I'm wondering if we should do the same here. The learner holds most information about the learning task and configuration and might be better suited to dispatching the correct algorithm.

Another disadvantage of this approach is having to define duplicated code in each CPU metric for dispatching GPU algorithms.

So as an alternative I would propose separating the objectives into distinct GPU/CPU implementations as you have done, but selecting them inside the learner. @trivialfis WDYT?

sriramch · 2020-03-03T23:49:18Z

the objective is to separate the cpu and gpu implementations (where they digress significantly) such that they evolve independently

Do you have a concrete use case in mind?
@sriramch

@hcho3 the metrics and the objective ranking implementations are the ones i'd in mind for this.

sriramch · 2020-03-03T23:52:06Z

In other situations like this we have been deferring configurations of GPU/CPU algorithms to the learner and I'm wondering if we should do the same here. The learner holds most information about the learning task and configuration and might be better suited to dispatching the correct algorithm.

[sc] the reason i did this here is for the specific component - metric/objective etc. to choose independently if it needs to be accelerated on the gpu, as opposed to having the learner choose it for them. not all metrics are accelerated on the gpu and it was best left for that metric to decide independently.

Another disadvantage of this approach is having to define duplicated code in each CPU metric for dispatching GPU algorithms.

[sc] further, even within a given metric, we only move the computation on the gpu only when a certain condition is met. this also has to be decided independently on a metric-by-metric (or by a component) basis by that component. punting all this to the learner could increase coupling significantly.

RAMitchell · 2020-03-04T00:30:02Z

Possibly another way to do this without the registry would be to create metrics.h metrics.cc metrics.cu, put the struct EvalAuc : public Metric declaration in the header and define methods for GPU/CPU computation, with implementations in respective cc/cu files. I guess this only works when the class has no GPU only member variables.

sriramch · 2020-03-04T05:33:24Z

Possibly another way to do this without the registry would be to create metrics.h metrics.cc metrics.cu, put the struct EvalAuc : public Metric declaration in the header and define methods for GPU/CPU computation, with implementations in respective cc/cu files. I guess this only works when the class has no GPU only member variables.

[sc] some of the metrics use the template method pattern that drives the logic in one place, and deferring part of functionality to the sub classes or other meta types (to prevent virtual on device). this can get hairy with this approach.

the gpu registry approach keeps it fairly isolated and can easily delegate responsibility to the gpu classes as long as they are registered with the same name as their cpu counterparts (which is its only requirement). what is the main objection to this approach (other than that it is non traditional)?

trivialfis · 2020-03-04T06:54:04Z

In other situations like this we have been deferring configurations of GPU/CPU algorithms to the learner

Please don't. The Learner is already very complicated as it's.

Just out of curious, why do we need a class to carry out computing metric? Looking through the code there's no class states. Looking at #5326 , the whole class is just a function.

…metric_reg

sriramch · 2020-03-04T16:07:58Z

Just out of curious, why do we need a class to carry out computing metric? Looking through the code there's no class states. Looking at #5326 , the whole class is just a function.

[sc] the metric class has state and there is ranking config (parsed out of the metric name). these can evolve, and if we made them as free functions, we would have to pass them all to every metric function that is supported on the gpu. hence, they do have some state.

further, upon looking closely, i find that this pattern isn't without precedent. it looks like page formats, tree io etc. all leverage the registry mechanics to cleanly delegate responsibility.

…om within a process

RAMitchell

Seems ok to me. We will want to make sure binary classification is supported for AUC in the next PR.

trivialfis

Approving the approach. Now we need to be sure there's no problem during changing between CPU and GPU. Please be aware of that, this can happen after serialization(like pickling) and unserialization on another worker.

- this is the last part of dmlc#5326 that has been split - this also includes dmlc#5387

sriramch added 2 commits March 3, 2020 18:41

- clang tidy fix

f522124

RAMitchell reviewed Mar 3, 2020

View reviewed changes

Merge branch 'master' of https://github.com/dmlc/xgboost into rm_gpu_…

ec51efc

…metric_reg

- account for scenarios when training config changes from gpu->cpu fr…

bc0af2e

…om within a process

RAMitchell approved these changes Mar 5, 2020

View reviewed changes

trivialfis approved these changes Mar 6, 2020

View reviewed changes

sriramch added a commit to sriramch/xgboost that referenced this pull request Mar 6, 2020

- ranking metric accelaration on the gpu

604c63b

- this is the last part of dmlc#5326 that has been split - this also includes dmlc#5387

RAMitchell merged commit 1ba6706 into dmlc:master Mar 7, 2020

lock bot locked as resolved and limited conversation to collaborators Jun 24, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

- create a gpu metrics (internal) registry #5387

- create a gpu metrics (internal) registry #5387

sriramch commented Mar 3, 2020 •

edited

Loading

hcho3 commented Mar 3, 2020

mli commented Mar 3, 2020

RAMitchell left a comment

sriramch commented Mar 3, 2020

sriramch commented Mar 3, 2020 •

edited

Loading

RAMitchell commented Mar 4, 2020

sriramch commented Mar 4, 2020

trivialfis commented Mar 4, 2020

sriramch commented Mar 4, 2020

RAMitchell left a comment

trivialfis left a comment •

edited

Loading

- create a gpu metrics (internal) registry #5387

- create a gpu metrics (internal) registry #5387

Conversation

sriramch commented Mar 3, 2020 • edited Loading

hcho3 commented Mar 3, 2020

mli commented Mar 3, 2020

Codecov Report

RAMitchell left a comment

Choose a reason for hiding this comment

sriramch commented Mar 3, 2020

sriramch commented Mar 3, 2020 • edited Loading

RAMitchell commented Mar 4, 2020

sriramch commented Mar 4, 2020

trivialfis commented Mar 4, 2020

sriramch commented Mar 4, 2020

RAMitchell left a comment

Choose a reason for hiding this comment

trivialfis left a comment • edited Loading

Choose a reason for hiding this comment

sriramch commented Mar 3, 2020 •

edited

Loading

sriramch commented Mar 3, 2020 •

edited

Loading

trivialfis left a comment •

edited

Loading