Re-implement PR-AUC. (#7297)
* Support binary and multi-class classification, and ranking.
* Add documentation.
* Handle missing data.
trivialfis authored Oct 26, 2021
1 parent a6bcd54 commit d434942
Showing 12 changed files with 1,035 additions and 655 deletions.
doc/parameter.rst: 8 changes (6 additions, 2 deletions)
@@ -393,9 +393,13 @@ Specify the learning task and the corresponding learning objective. The objectiv
 - When used with multi-class classification, objective should be ``multi:softprob`` instead of ``multi:softmax``, as the latter doesn't output probability. Also the AUC is calculated by 1-vs-rest with reference class weighted by class prevalence.
 - When used with LTR task, the AUC is computed by comparing pairs of documents to count correctly sorted pairs. This corresponds to pairwise learning to rank. The implementation has some issues with average AUC around groups and distributed workers not being well-defined.
 - On a single machine the AUC calculation is exact. In a distributed environment the AUC is a weighted average over the AUC of training rows on each node - therefore, distributed AUC is an approximation sensitive to the distribution of data across workers. Use another metric in distributed environments if precision and reproducibility are important.
-- If input dataset contains only negative or positive samples the output is `NaN`.
+- When input dataset contains only negative or positive samples, the output is `NaN`. The behavior is implementation defined, for instance, ``scikit-learn`` returns :math:`0.5` instead.
 
+- ``aucpr``: `Area under the PR curve <https://en.wikipedia.org/wiki/Precision_and_recall>`_.
+Available for classification and learning-to-rank tasks.
+
+After XGBoost 1.6, both of the requirements and restrictions for using ``aucpr`` in classification problem are similar to ``auc``. For ranking task, only binary relevance label :math:`y \in [0, 1]` is supported. Different from ``map (mean average precision)``, ``aucpr`` calculates the *interpolated* area under precision recall curve using continuous interpolation.
+
-- ``aucpr``: `Area under the PR curve <https://en.wikipedia.org/wiki/Precision_and_recall>`_. Available for binary classification and learning-to-rank tasks.
 - ``ndcg``: `Normalized Discounted Cumulative Gain <http://en.wikipedia.org/wiki/NDCG>`_
 - ``map``: `Mean Average Precision <http://en.wikipedia.org/wiki/Mean_average_precision#Mean_average_precision>`_
 - ``ndcg@n``, ``map@n``: 'n' can be assigned as an integer to cut off the top positions in the lists for evaluation.
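The ``aucpr`` entry says the area under the precision-recall curve is interpolated continuously rather than computed as average precision. A minimal C++ sketch of one way to do this for binary labels follows: between two consecutive distinct prediction thresholds, false positives are assumed to grow linearly with true positives, so precision as a function of recall r takes the form r / (a r + b) and each segment's area has a closed form. The names `DeltaPRAUC` and `BinaryPRAUC` are hypothetical, labels are assumed to be unweighted 0/1 values, and returning NaN for single-class input mirrors the ``auc`` note rather than anything the diff states for ``aucpr``. This is not XGBoost's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

// Area of one PR-curve segment between consecutive thresholds. Assumes FP
// grows linearly with TP inside the segment (slope h), so with recall r and
// P positives, precision is p(r) = r / (a * r + b), where a = 1 + h and
// b = (fp_prev - h * tp_prev) / P. The integral of p(r) has a closed form.
inline double DeltaPRAUC(double fp_prev, double fp, double tp_prev, double tp,
                         double total_pos) {
  if (tp == tp_prev) {
    return 0.0;  // recall does not change, so the segment has no width
  }
  const double h = (fp - fp_prev) / (tp - tp_prev);
  const double a = 1.0 + h;
  const double b = (fp_prev - h * tp_prev) / total_pos;
  const double r_prev = tp_prev / total_pos;
  const double r = tp / total_pos;
  if (b != 0.0) {
    return (r - r_prev - b / a * (std::log(a * r + b) - std::log(a * r_prev + b))) / a;
  }
  return (r - r_prev) / a;  // precision is constant (1 / a) along the segment
}

// Hypothetical binary PR-AUC for unweighted 0/1 labels.
inline double BinaryPRAUC(const std::vector<float>& preds,
                          const std::vector<float>& labels) {
  assert(preds.size() == labels.size());
  const double total_pos = std::accumulate(labels.begin(), labels.end(), 0.0);
  const double total_neg = static_cast<double>(labels.size()) - total_pos;
  if (total_pos <= 0.0 || total_neg <= 0.0) {
    return std::numeric_limits<double>::quiet_NaN();  // single-class input
  }
  // Visit samples from the highest prediction to the lowest.
  std::vector<std::size_t> order(preds.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(), [&](std::size_t l, std::size_t r) {
    return preds[l] > preds[r];
  });
  double tp = 0.0, fp = 0.0, tp_prev = 0.0, fp_prev = 0.0, area = 0.0;
  for (std::size_t i = 0; i < order.size(); ++i) {
    const double label = labels[order[i]];
    tp += label;
    fp += 1.0 - label;
    // Close a segment only after every sample tied at this prediction is counted.
    const bool group_end =
        i + 1 == order.size() || preds[order[i + 1]] != preds[order[i]];
    if (group_end) {
      area += DeltaPRAUC(fp_prev, fp, tp_prev, tp, total_pos);
      tp_prev = tp;
      fp_prev = fp;
    }
  }
  return area;
}
```

Because tied predictions are accumulated before a segment is closed, a tie contributes one interpolated segment instead of depending on the arbitrary order of the tied rows.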
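The ``auc`` notes describe the multi-class case as one-vs-rest, with each class's AUC weighted by its prevalence. A minimal C++ sketch of that averaging follows, assuming `probs` holds ``multi:softprob`` output in row-major order (one row per sample, one column per class) and that labels are unweighted integer class ids. The names `BinaryROCAUC` and `MultiClassROCAUC`, skipping absent classes, and the NaN return for single-class input are assumptions of this sketch, not XGBoost's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

// Binary ROC-AUC: sweep thresholds from high to low and sum trapezoids in
// (FP, TP) count space; tied scores form a single trapezoid.
inline double BinaryROCAUC(const std::vector<float>& scores,
                           const std::vector<int>& is_pos) {
  assert(scores.size() == is_pos.size());
  std::vector<std::size_t> order(scores.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(), [&](std::size_t l, std::size_t r) {
    return scores[l] > scores[r];
  });
  double tp = 0.0, fp = 0.0, tp_prev = 0.0, fp_prev = 0.0, area = 0.0;
  for (std::size_t i = 0; i < order.size(); ++i) {
    tp += is_pos[order[i]];
    fp += 1 - is_pos[order[i]];
    const bool group_end =
        i + 1 == order.size() || scores[order[i + 1]] != scores[order[i]];
    if (group_end) {
      area += (fp - fp_prev) * (tp + tp_prev) / 2.0;
      tp_prev = tp;
      fp_prev = fp;
    }
  }
  if (tp == 0.0 || fp == 0.0) {
    return std::numeric_limits<double>::quiet_NaN();  // single-class input
  }
  return area / (tp * fp);  // normalise by (#positives * #negatives)
}

// One-vs-rest multi-class AUC, each class weighted by its prevalence.
// probs[i * n_classes + k] is the score of class k for row i.
inline double MultiClassROCAUC(const std::vector<float>& probs,
                               const std::vector<int>& labels,
                               std::size_t n_classes) {
  assert(probs.size() == labels.size() * n_classes);
  const std::size_t n = labels.size();
  double result = 0.0;
  for (std::size_t k = 0; k < n_classes; ++k) {
    std::vector<float> scores(n);
    std::vector<int> is_pos(n);
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
      scores[i] = probs[i * n_classes + k];
      is_pos[i] = labels[i] == static_cast<int>(k) ? 1 : 0;
      count += static_cast<std::size_t>(is_pos[i]);
    }
    if (count == 0) {
      continue;  // zero prevalence: the class contributes nothing
    }
    const double prevalence = static_cast<double>(count) / static_cast<double>(n);
    result += prevalence * BinaryROCAUC(scores, is_pos);
  }
  return result;
}
```

Skipping an absent class leaves the weighted sum unchanged, since its weight is zero; a class that covers every row still produces NaN, in line with the single-class note for ``auc``.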
src/common/common.h: 14 changes (14 additions, 0 deletions)
@@ -19,6 +19,7 @@
 #include <string>
 #include <sstream>
 #include <numeric>
+#include <utility>
 
 #if defined(__CUDACC__)
 #include <thrust/system/cuda/error.h>
@@ -86,6 +87,19 @@ XGBOOST_DEVICE T1 DivRoundUp(const T1 a, const T2 b) {
   return static_cast<T1>(std::ceil(static_cast<double>(a) / b));
 }
 
+namespace detail {
+template <class T, std::size_t N, std::size_t... Idx>
+constexpr auto UnpackArr(std::array<T, N> &&arr, std::index_sequence<Idx...>) {
+  return std::make_tuple(std::forward<std::array<T, N>>(arr)[Idx]...);
+}
+} // namespace detail
+
+template <class T, std::size_t N>
+constexpr auto UnpackArr(std::array<T, N> &&arr) {
+  return detail::UnpackArr(std::forward<std::array<T, N>>(arr),
+                           std::make_index_sequence<N>{});
+}
+
 /*
  * Range iterator
  */
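The new `UnpackArr` helper turns an rvalue `std::array` into a `std::tuple`, one element per array slot, by expanding a compile-time index pack. That makes the array usable with `std::tie` when a function returns several values packed in an array. The sketch below repeats the two templates from the diff so it compiles on its own (C++14 or later, for `std::index_sequence` and deduced return types) and adds a hypothetical `MinMax` function purely for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>
#include <vector>

// Copied from the diff so this example is self-contained.
namespace detail {
template <class T, std::size_t N, std::size_t... Idx>
constexpr auto UnpackArr(std::array<T, N> &&arr, std::index_sequence<Idx...>) {
  // Expand the index pack into one tuple element per array slot.
  return std::make_tuple(std::forward<std::array<T, N>>(arr)[Idx]...);
}
}  // namespace detail

template <class T, std::size_t N>
constexpr auto UnpackArr(std::array<T, N> &&arr) {
  return detail::UnpackArr(std::forward<std::array<T, N>>(arr),
                           std::make_index_sequence<N>{});
}

// Hypothetical function, used only to illustrate UnpackArr: returns {min, max}.
inline std::array<float, 2> MinMax(const std::vector<float>& values) {
  assert(!values.empty());
  auto mm = std::minmax_element(values.begin(), values.end());
  return {*mm.first, *mm.second};
}
```

At a call site, `std::tie(lo, hi) = UnpackArr(MinMax(values));` assigns both results to existing variables. With C++17 structured bindings a `std::array` can be decomposed directly, so the helper mainly helps code that relies on `std::tie`.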