
Re-implement ROC-AUC. #6747

Merged (2 commits) Mar 20, 2021
Conversation

@trivialfis (Member) commented Mar 11, 2021

  • Binary
  • MultiClass
  • LTR
  • Add documents.

This PR resolves a few issues:

  • Define a value when the dataset is invalid, which can happen when the dataset is empty or contains only positive or only negative labels.
  • Define ROC-AUC for multi-class classification.
  • Define weighted average value for distributed setting.
  • A correct implementation for the learning-to-rank task. The previous implementation was just binary classification with averaging across groups, which doesn't measure how well the model orders items within a group.
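The first two bullets can be sketched in Python (illustrative only, not the PR's C++ implementation; the 0.5 fallback for degenerate data and the prevalence-weighted one-vs-rest average are assumptions chosen for illustration, following sklearn's `average='weighted'` convention, and may differ from what XGBoost actually reports):

```python
def binary_roc_auc(labels, scores, degenerate=0.5):
    # Mann-Whitney rank-sum formulation of binary ROC-AUC, with average
    # ranks for tied scores.  `degenerate` is an illustrative fallback
    # for empty or single-class data.
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        return degenerate
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):          # average 1-based rank of tie group
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def multiclass_roc_auc(labels, score_rows, degenerate=0.5):
    # One-vs-rest AUC per class, averaged with class-prevalence weights.
    n = len(labels)
    return sum(
        labels.count(c) / n
        * binary_roc_auc([1 if y == c else 0 for y in labels],
                         [row[c] for row in score_rows], degenerate)
        for c in sorted(set(labels)))
```

With perfect predictions `multiclass_roc_auc([0, 1, 2], [[1, 0, 0], [0, 1, 0], [0, 0, 1]])` is 1.0, and `binary_roc_auc([1, 1], [0.3, 0.7])` falls back to the degenerate value.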

Review threads (all resolved) on: doc/parameter.rst, src/common/common.h, src/common/device_helpers.cuh, src/metric/auc.cc, src/metric/auc.cu, src/metric/auc.h, tests/cpp/metric/test_auc.cc
@hcho3 (Collaborator) commented Mar 11, 2021

Does this PR also reimplement the learning to rank task? Can you point us to relevant literature for the new implementation?

@trivialfis (Member, Author) commented Mar 11, 2021

> Does this PR also reimplement the learning to rank task?

Yes. For the AUC metric. Some code was pulled from #6465 .

> Can you point us to relevant literature for the new implementation?

I will do that later; it's a bit complicated.

@hcho3 (Collaborator) commented Mar 11, 2021

Got it. It would be nice to see how the new implementation is more in line with the latest literature on learning to rank (LTR). We could even write a short vignette or technical report to document our new LTR implementation. This would be useful for others who inherit this codebase in the future.

@trivialfis (Member, Author) commented:

I have adopted some heuristics here. Will explain in detail once I can get all the tests to pass.

@hcho3 (Collaborator) commented Mar 11, 2021

> I have adopted some heuristics here.

Interesting. Once the implementation is complete and you can explain the heuristics, I'd like to volunteer time to write up a short technical report. That way we can record all the pertinent details and keep the LTR objective and metrics maintainable in the future.

@codecov-io commented Mar 13, 2021

Codecov Report

Merging #6747 (67d26b1) into master (4ee8340) will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master    #6747   +/-   ##
=======================================
  Coverage   81.78%   81.78%           
=======================================
  Files          13       13           
  Lines        3848     3848           
=======================================
  Hits         3147     3147           
  Misses        701      701           


@trivialfis trivialfis mentioned this pull request Mar 13, 2021
@trivialfis changed the title from "[WIP] Re-implement ROC-AUC." to "Re-implement ROC-AUC." Mar 13, 2021
@trivialfis (Member, Author) commented:

A smaller PR has been extracted into #6749.

@RAMitchell (Member) left a comment:

Awesome to get multiclass and ranking support. Some minor comments. I'm assuming your math is correct.

It might be a good idea to add hypothesis tests comparing the results against sklearn.

What is performance looking like?

Review threads on: doc/parameter.rst, src/common/ranking_utils.cuh, src/metric/auc.cu
@trivialfis (Member, Author) commented:

> It might be a good idea to add hypothesis tests comparing the results against sklearn.

I have some test cases comparing against sklearn for binary and multi-class classification. sklearn doesn't support AUC for ranking.

I will try to make the data more random with hypothesis.
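A dependency-free sketch of the property-testing idea (the actual test suite uses `hypothesis` and compares against sklearn; `auc_ref` and `check_auc_properties` below are illustrative names, not functions from the repo):

```python
import math
import random

def auc_ref(labels, scores):
    # O(n^2) reference binary AUC: P(score_pos > score_neg) + 0.5 * P(tie).
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    return sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg) / (len(pos) * len(neg))

def check_auc_properties(auc, trials=200, seed=0):
    # Randomized checks of two AUC invariants on random data.
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(4, 64)
        labels = [rng.randint(0, 1) for _ in range(n)]
        if sum(labels) in (0, n):          # degenerate: tested separately
            continue
        scores = [rng.random() for _ in range(n)]
        a = auc(labels, scores)
        # AUC depends only on the score ordering, so any strictly
        # increasing transform must leave it unchanged.
        assert abs(a - auc(labels, [math.exp(3 * s) for s in scores])) < 1e-9
        # Negating the scores reverses the ranking: AUC flips to 1 - AUC.
        assert abs(a + auc(labels, [-s for s in scores]) - 1.0) < 1e-9
    return True
```

Running `check_auc_properties(auc_ref)` exercises the two invariants on many random datasets, which is roughly what a `hypothesis`-based test would do with generated inputs.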

@trivialfis (Member, Author) commented:

@RAMitchell

I added parametrized tests with different sample sizes. scikit-learn throws an error when the dataset is invalid, so the degenerate case is tested separately.

I will run some basic benchmarks later.

@trivialfis
Copy link
Member Author

Ran a small benchmark on HIGGS with Booster.eval. There's no performance change for binary classification:

  • before
    CPU 2.074251174926758
    GPU 0.022435665130615234

  • after
    CPU 2.070319652557373
    GPU 0.022629976272583008

@RAMitchell (Member) left a comment:

Nice work.

@trivialfis trivialfis requested a review from hcho3 March 17, 2021 19:06
@hcho3 hcho3 added the Blocking label Mar 19, 2021
@trivialfis trivialfis mentioned this pull request Mar 19, 2021
@hcho3 (Collaborator) left a comment:

LGTM. Some comments:

Review threads on: doc/parameter.rst, src/common/ranking_utils.cuh, src/metric/auc.cc
Comment on lines +94 to +99
auto s_results = common::Span<float>(results);
auto local_area = s_results.subspan(0, n_classes);
auto tp = s_results.subspan(n_classes, n_classes);
auto auc = s_results.subspan(2 * n_classes, n_classes);
A collaborator commented:

I am very happy that we have access to the Span abstraction. Imagine having to sub-slice an array with raw pointers 😨

@trivialfis (Member, Author) commented Mar 20, 2021:

It would be even better if we had a matrix/tensor library backend. ;-) If I were to start a new ML project, that might be the first thing I'd do... (assuming it's not based on logic)

Comment on lines +101 to +104
#pragma omp parallel for
for (omp_ulong c = 0; c < n_classes; ++c) {
A collaborator commented:

Reminder for the future: we may want to change this line if we adopt a parallel scan in the CPU binary AUC. Something like a 2D task block might be necessary.

@trivialfis (Member, Author) commented:

I don't know the best way to do this on the CPU. The GPU version is parallelized by the scan operator.

@trivialfis (Member, Author) commented:

Maybe we can look at TensorFlow and see how they calculate AUC.

*/
float GroupRankingAUC(common::Span<float const> predts,
common::Span<float const> labels, float w) {
// on ranking, we just count all pairs.
A collaborator commented:

This will impose O(n^2) cost for the AUC computation, where n is the number of data points in the evaluation set. Have you considered a sampling approach, where a sample of pairs is randomly drawn? The LTR objective currently uses this approach and offers a parameter called num_pairsample to control the sampling.
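The pairwise counting, and the sampling alternative suggested here, can be sketched in Python. This is illustrative code only, not the PR's C++ implementation, and `sampled_ranking_auc` is a hypothetical helper, not an existing XGBoost API:

```python
import random

def group_ranking_auc(labels, preds):
    # O(n^2) pairwise AUC for one query group: among pairs whose relevance
    # labels differ, the fraction the model orders correctly (prediction
    # ties count 0.5).
    correct = total = 0.0
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] == labels[j]:
                continue                    # no preferred order: skip pair
            total += 1.0
            hi, lo = (i, j) if labels[i] > labels[j] else (j, i)
            correct += (1.0 if preds[hi] > preds[lo]
                        else 0.5 if preds[hi] == preds[lo] else 0.0)
    return correct / total if total else 0.5   # fallback is illustrative

def sampled_ranking_auc(labels, preds, n_samples=1000, seed=0):
    # Monte-Carlo variant of the same quantity: O(n_samples) randomly
    # drawn pairs instead of all O(n^2) of them.
    rng = random.Random(seed)
    n = len(labels)
    correct = total = 0.0
    for _ in range(n_samples):
        i, j = rng.randrange(n), rng.randrange(n)
        if labels[i] == labels[j]:
            continue
        total += 1.0
        hi, lo = (i, j) if labels[i] > labels[j] else (j, i)
        correct += (1.0 if preds[hi] > preds[lo]
                    else 0.5 if preds[hi] == preds[lo] else 0.0)
    return correct / total if total else 0.5
```

On a perfectly ordered group such as `labels=[2, 1, 0]`, `preds=[0.9, 0.5, 0.1]`, both functions return 1.0; the sampled version trades exactness for linear cost in the number of sampled pairs.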

@trivialfis (Member, Author) commented:

Actually, I want to remove the random sampling in the objective and replace it with a top-k parameter.

I implemented AUC for ranking mostly for compatibility reasons. I don't think it's a good metric for ranking: averaging between groups destroys the AUC's properties.


@trivialfis Hi, I've been looking through XGBoost versions and stumbled upon this discussion. Could you please elaborate on how averaging between groups destroys AUC properties? Or could you give a link to read about it? Thanks in advance.

@trivialfis (Member, Author) commented Jul 11, 2023:

The AUC is "area under the curve", which is a normalized ratio with a geometric interpretation. One cannot average ratios directly.
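A two-group toy calculation (with made-up numbers, purely to illustrate the arithmetic) shows why a plain mean of per-group ratios differs from the pooled ratio:

```python
# Per group: (correctly ordered pairs, total comparable pairs).
groups = [(3.0, 4.0),      # small group, per-group AUC 0.75
          (45.0, 100.0)]   # large group, per-group AUC 0.45

# Unweighted mean of the per-group ratios.
mean_of_ratios = sum(a / b for a, b in groups) / len(groups)   # 0.60

# Pooled ratio over all pairs: 48 / 104, about 0.46.
pooled_ratio = sum(a for a, _ in groups) / sum(b for _, b in groups)
```

The unweighted mean over-weights the small group, so the averaged number no longer has the pooled "fraction of correctly ordered pairs" interpretation that gives AUC its meaning.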


Thanks!

@trivialfis (Member, Author) commented:
@hcho3 All style comments are addressed.

@trivialfis trivialfis merged commit bcc0277 into dmlc:master Mar 20, 2021