Re-implement ROC-AUC. #6747
Conversation
Does this PR also reimplement the learning-to-rank task? Can you point us to relevant literature for the new implementation?
Yes, for the AUC metric. Some code was pulled from #6465.
I will do that later; it's a little bit complicated.
Got it. It would be nice to see how the new implementation is more in line with the latest literature on Learning To Rank (LTR). We could even write a short vignette or technical report to document the new LTR implementation. This will be useful for others who inherit this codebase in the future.
I have adopted some heuristics here. I will explain them in detail once I can get all the tests to pass.
Interesting. Once the implementation is complete and you can explain the heuristics, I'd like to volunteer time to write up a short technical report. That way we can hopefully remember all the pertinent details and still be able to maintain the LTR objective and metrics in the future.
Codecov Report

@@           Coverage Diff            @@
##           master    #6747    +/-  ##
=======================================
  Coverage    81.78%   81.78%
=======================================
  Files           13       13
  Lines         3848     3848
=======================================
  Hits          3147     3147
  Misses         701      701
A smaller PR was extracted into #6749.
Awesome to get multiclass and ranking support. Some minor comments. I'm assuming your math is correct.
It might be a good idea to add hypothesis tests comparing the results against sklearn.
What is performance looking like?
I have some test cases comparing with sklearn on binary and multi-class classification. Ranking is not supported by sklearn. I will try to make the data more random with hypothesis.
I added parametrized tests with different sample sizes. scikit-learn throws an error when the dataset is invalid, so the degenerate case is tested separately. I will run some basic benchmarks later.
Ran a small benchmark on HIGGS.
Nice work.
LGTM. Some comments:
auto s_results = common::Span<float>(results);
// `results` packs three consecutive blocks of n_classes floats each:
// the local curve area, the true-positive count, and the AUC per class.
auto local_area = s_results.subspan(0, n_classes);
auto tp = s_results.subspan(n_classes, n_classes);
auto auc = s_results.subspan(2 * n_classes, n_classes);
I am very happy that we have access to the `Span` abstraction. Imagine having to sub-slice an array with raw pointers 😨
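For readers coming to this later, here is a minimal sketch of what the raw-pointer equivalent of the sub-slicing above might look like. The function name and the reset loop are invented for illustration, and it assumes `results` holds `3 * n_classes` floats:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical raw-pointer version of the Span sub-slicing: every block's
// length must be carried by hand, and nothing checks that an offset stays
// in bounds. Assumes results->size() == 3 * n_classes.
void SliceByHand(std::vector<float>* results, std::size_t n_classes) {
  float* local_area = results->data();                  // n_classes floats
  float* tp         = results->data() + n_classes;      // n_classes floats
  float* auc        = results->data() + 2 * n_classes;  // n_classes floats
  for (std::size_t c = 0; c < n_classes; ++c) {
    local_area[c] = tp[c] = auc[c] = 0.0f;  // e.g. zero the three buffers
  }
}
```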
It would be even better if we had a matrix/tensor library backend. ;-) If I were to start a new ML project, that might be the first thing I'd do... (assuming it's not based on logic)
#pragma omp parallel for
for (omp_ulong c = 0; c < n_classes; ++c) {
Reminder for the future: we may want to change this line if we adopt a parallel scan in the CPU binary AUC. Something like a 2D task block would be necessary?
I don't know what the best way is for the CPU. The GPU version is parallelized by the scan operator.
Maybe we can look at TensorFlow and see how they calculate AUC.
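For context on the scan formulation mentioned above: binary ROC-AUC can be computed in one pass over examples sorted by prediction, where the running true/false-positive counts are exactly prefix sums, which is what makes a parallel scan applicable. Below is a minimal sequential sketch, not the PR's code; the function name and the 0.5 fallback for a single-class dataset are assumptions for illustration:

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Illustrative sketch: binary ROC-AUC via a single pass over examples
// sorted by descending prediction. Each distinct-prediction prefix is one
// (FP, TP) point on the ROC curve, so the area accumulates by the
// trapezoid rule; the running counts are the prefix sums a scan computes.
double BinaryAUC(std::vector<float> const& predts,
                 std::vector<float> const& labels) {
  std::vector<std::size_t> idx(predts.size());
  std::iota(idx.begin(), idx.end(), std::size_t{0});
  std::sort(idx.begin(), idx.end(), [&](std::size_t a, std::size_t b) {
    return predts[a] > predts[b];  // descending by prediction
  });

  double tp = 0, fp = 0, prev_tp = 0, prev_fp = 0, area = 0;
  for (std::size_t i = 0; i < idx.size(); ++i) {
    if (labels[idx[i]] > 0.5f) { ++tp; } else { ++fp; }
    // Close a trapezoid only when the prediction changes, so tied
    // predictions contribute a single diagonal segment.
    if (i + 1 == idx.size() || predts[idx[i + 1]] != predts[idx[i]]) {
      area += (fp - prev_fp) * (tp + prev_tp) / 2.0;
      prev_tp = tp; prev_fp = fp;
    }
  }
  if (tp == 0 || fp == 0) { return 0.5; }  // assumed degenerate convention
  return area / (tp * fp);  // normalize both axes to [0, 1]
}
```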
 */
float GroupRankingAUC(common::Span<float const> predts,
                      common::Span<float const> labels, float w) {
  // On ranking, we just count all pairs.
This will impose `O(n^2)` cost for the AUC computation, where `n` is the number of data points in the evaluation set. Have you considered using a sampling approach, where a sample of pairs is randomly collected? The LTR objective currently uses this approach and offers a parameter called `num_pairsample` to control the sampling.
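To make the cost concrete, counting all pairs amounts to something like the sketch below. This is illustrative only, not the PR's code: the function name is invented, instance weights (the `w` argument above) are omitted, and the 0.5 return for a group with no comparable pairs is an assumed convention:

```cpp
#include <cstddef>
#include <vector>

// Illustrative sketch: AUC of one query group as the fraction of
// comparable pairs ordered correctly by the predictions, counting tied
// predictions as half. The nested loops are the O(n^2) cost in question.
double PairwiseGroupAUC(std::vector<float> const& predts,
                        std::vector<float> const& labels) {
  double concordant = 0.0, comparable = 0.0;
  for (std::size_t i = 0; i < predts.size(); ++i) {
    for (std::size_t j = i + 1; j < predts.size(); ++j) {
      if (labels[i] == labels[j]) { continue; }  // equal labels: not comparable
      comparable += 1.0;
      if (predts[i] == predts[j]) {
        concordant += 0.5;  // tied prediction counts half
      } else if ((predts[i] > predts[j]) == (labels[i] > labels[j])) {
        concordant += 1.0;  // predictions order the pair correctly
      }
    }
  }
  return comparable > 0.0 ? concordant / comparable : 0.5;  // assumed fallback
}
```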
Actually, I want to remove the random sampling in the objective and replace it with a top-k parameter.
I implemented AUC for ranking mostly for compatibility reasons. I don't think it's a good metric for ranking: averaging between groups destroys all the AUC properties.
@trivialfis Hi, I've been looking through xgboost versions and stumbled upon this discussion. Could you please elaborate on this idea that averaging between groups destroys the AUC properties? Or could you give a link to read about it? Thanks in advance.
The AUC is "area under the curve", which is a normalized ratio with a geometric interpretation. One cannot average ratios directly.
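A quick numeric illustration (added here, not from the original thread): the mean of two ratios is generally not the ratio of the pooled quantities.

$$\frac{1}{2}\left(\frac{1}{2} + \frac{9}{10}\right) = 0.7 \qquad \text{but} \qquad \frac{1 + 9}{2 + 10} = \frac{10}{12} \approx 0.83$$

Averaging per-group AUCs weights every group equally, no matter how many comparable pairs each group contains, so the averaged number loses the usual interpretation of AUC as the probability that a randomly chosen positive is ranked above a randomly chosen negative.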
Thx!
* Binary
* MultiClass
* LTR
* Add documents.

This PR resolves a few issues:

- Define a value for when the dataset is invalid, which can happen if the dataset is empty or contains only positive or only negative samples.
- Define ROC-AUC for multi-class classification.
- Define a weighted average value for the distributed setting (see the sketch after this list).
- Provide a correct implementation for the learning-to-rank task. The previous implementation was just binary classification with averaging across groups, which doesn't measure ordered learning to rank.
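A minimal sketch of the distributed weighted average mentioned above. This is an assumption about the mechanism, not the PR's code: it supposes each worker contributes its local AUC weighted by its local data weight, and the function name and 0.5 fallback are invented for illustration:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: combine per-worker AUCs into one weighted average.
// local_auc[i] and local_weight[i] are the AUC and total data weight
// gathered from worker i (e.g. via an allreduce).
double DistributedAUC(std::vector<double> const& local_auc,
                      std::vector<double> const& local_weight) {
  double weighted_sum = 0.0;
  double total_weight = 0.0;
  for (std::size_t i = 0; i < local_auc.size(); ++i) {
    weighted_sum += local_auc[i] * local_weight[i];
    total_weight += local_weight[i];
  }
  // Assumed convention: fall back to 0.5 when no worker has valid data.
  return total_weight > 0.0 ? weighted_sum / total_weight : 0.5;
}
```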
@hcho3 All style comments are addressed.