
Multilabel ranking metrics #614

Closed
tqbl opened this issue Nov 10, 2021 · 3 comments · Fixed by #787
Assignees: SkafteNicki
Labels: enhancement (New feature or request), help wanted (Extra attention is needed), New metric
Milestone: v0.8

Comments


tqbl commented Nov 10, 2021

🚀 Feature

The sklearn library provides a number of multilabel ranking metrics. It would be nice if torchmetrics implemented some of these metrics too. The three I have in mind are coverage_error, label_ranking_average_precision_score, and label_ranking_loss.

Implementation

Edit 2021-11-12: I noticed the output was wrong when sample_weights was used. This has now been fixed. A mistake in label_ranking_loss has also been fixed. I tested the code on a multi-label dataset and found that it matches sklearn up to rounding errors, which go away if float64 is used.


I'm not that familiar with the torchmetrics codebase, but I can provide some implementations that someone on the team could refine and integrate into the library. I used the sklearn source code as a reference.

import torch


def coverage_error(y_pred, y_true, sample_weights=None):
    # Number of top-ranked labels needed to cover all true labels, per sample
    offset = torch.zeros_like(y_pred)
    # Lift non-relevant scores above all relevant ones; any offset larger than
    # the score range works (1.1 assumes scores lie in [0, 1])
    offset[y_true == 0] = 1.1
    y_pred_mod = y_pred + offset
    y_pred_min = y_pred_mod.min(dim=1)[0]
    coverage = (y_pred >= y_pred_min[:, None]).sum(dim=1).to(torch.float32)

    if sample_weights is not None:
        coverage *= sample_weights
        return coverage.sum() / sample_weights.sum()

    return coverage.mean()


def label_ranking_average_precision(y_pred, y_true, sample_weights=None):
    # Average, over samples, of the mean precision at the rank of each true label
    # Invert so that the highest score receives rank 1
    y_pred = -y_pred

    score = torch.tensor(0.0, device=y_pred.device)
    n_preds, n_labels = y_pred.shape
    for i in range(n_preds):
        relevant = y_true[i] == 1
        # Rank of each relevant label among the relevant labels only...
        L = rank_data(y_pred[i][relevant])
        if 0 < len(L) < n_labels:
            # ...and among all labels
            rank = rank_data(y_pred[i])[relevant]
            score_i = (L / rank).mean()
        else:
            score_i = 1.0

        if sample_weights is not None:
            score_i *= sample_weights[i]

        score += score_i

    if sample_weights is None:
        score /= n_preds
    else:
        score /= sample_weights.sum()
    return score


def label_ranking_loss(y_pred, y_true, sample_weights=None):
    # Fraction of (true, false) label pairs that are ordered incorrectly, per sample
    n_labels = y_pred.shape[1]
    relevant = y_true == 1
    n_relevant = relevant.sum(dim=1)

    # Ignore instances where number of true labels is 0 or n_labels
    mask = (n_relevant > 0) & (n_relevant < n_labels)
    y_pred = y_pred[mask]
    relevant = relevant[mask]
    n_relevant = n_relevant[mask]

    if len(y_pred) == 0:
        return torch.tensor(0.0, device=y_pred.device)

    # Rank of each label, from 0 (lowest score) to n_labels - 1 (highest score)
    inverse = y_pred.argsort(dim=1).argsort(dim=1)
    # For each relevant label, the number of labels scored at or above it
    per_label_loss = ((n_labels - inverse) * relevant).to(torch.float32)
    correction = 0.5 * n_relevant * (n_relevant + 1)  # Sum of 1..n_relevant
    denom = n_relevant * (n_labels - n_relevant)
    loss = (per_label_loss.sum(dim=1) - correction) / denom

    if sample_weights is not None:
        loss *= sample_weights[mask]
        return loss.sum() / sample_weights.sum()

    return loss.mean()


def rank_data(x):
    # Rank values so that ties share the highest rank, i.e. each element gets
    # the count of elements less than or equal to it (scipy's 'max' method)
    unique, inverse, counts = torch.unique(
        x, sorted=True, return_inverse=True, return_counts=True)
    ranks = counts.cumsum(dim=0)
    return ranks[inverse]
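
A rough sanity check against the sklearn reference implementations could look like the following. This is only an illustrative sketch: it assumes scikit-learn is installed, reuses the functions above, and uses made-up random data. Note that sklearn takes (y_true, y_score) while the functions above take (y_pred, y_true).

import torch
from sklearn.metrics import coverage_error as sk_coverage
from sklearn.metrics import label_ranking_average_precision_score as sk_lrap
from sklearn.metrics import label_ranking_loss as sk_lrl

torch.manual_seed(0)
y_pred = torch.rand(32, 10, dtype=torch.float64)  # scores in [0, 1]
y_true = (torch.rand(32, 10) < 0.3).long()
y_true[:, -1] = 0                      # keep at least one negative per sample
y_true[y_true.sum(dim=1) == 0, 0] = 1  # keep at least one positive per sample
np_pred, np_true = y_pred.numpy(), y_true.numpy()

pairs = [
    (coverage_error(y_pred, y_true), sk_coverage(np_true, np_pred)),
    (label_ranking_average_precision(y_pred, y_true), sk_lrap(np_true, np_pred)),
    (label_ranking_loss(y_pred, y_true), sk_lrl(np_true, np_pred)),
]
for ours, reference in pairs:
    assert abs(ours.item() - reference) < 1e-5, (ours.item(), reference)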

tqbl added the enhancement label Nov 10, 2021

github-actions commented

Hi! Thanks for your contribution, great first issue!

SkafteNicki (Member) commented

Hi @tqbl,
Thanks for proposing these metrics. I can probably convert them to proper torchmetrics format sometime in the future. Would it be possible for you to describe what the expected inputs look like? Are they exactly the same as in sklearn?
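
As a rough idea of what such a conversion could look like, here is a minimal sketch wrapping the functional label_ranking_loss from above in the class-based Metric interface (add_state/update/compute). This is only an illustration under those assumptions, not the implementation that ended up in #787.

import torch
from torchmetrics import Metric


class LabelRankingLoss(Metric):
    # Accumulates the functional label_ranking_loss defined above across batches

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.add_state("loss_sum", default=torch.tensor(0.0), dist_reduce_fx="sum")
        self.add_state("n_valid", default=torch.tensor(0.0), dist_reduce_fx="sum")

    def update(self, y_pred, y_true):
        n_relevant = (y_true == 1).sum(dim=1)
        # Samples with no true labels or only true labels do not contribute
        valid = ((n_relevant > 0) & (n_relevant < y_pred.shape[1])).sum()
        if valid > 0:
            # label_ranking_loss returns the mean over the valid samples
            self.loss_sum += label_ranking_loss(y_pred, y_true) * valid
            self.n_valid += valid

    def compute(self):
        return self.loss_sum / self.n_valid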

tqbl (Author) commented Nov 15, 2021

Thanks @SkafteNicki. The inputs are expected to be the same as in sklearn, except they should be PyTorch tensors rather than NumPy arrays. The input shapes and the data itself should be the same.

  1. y_pred should be an (N, K) tensor of prediction scores (not necessarily probabilities).
  2. y_true should be an (N, K) tensor of ground-truth labels in binary indicator format.
  3. sample_weights should be an (N,) tensor of sample weights (optional).

N is the number of instances and K is the number of classes.
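
As a toy illustration of this format (made-up numbers, reusing label_ranking_loss from the snippet above):

import torch

# N = 2 instances, K = 3 classes
y_pred = torch.tensor([[0.9, 0.2, 0.6],
                       [0.1, 0.8, 0.3]])
y_true = torch.tensor([[1, 0, 1],
                       [0, 1, 0]])
sample_weights = torch.tensor([1.0, 2.0])

print(label_ranking_loss(y_pred, y_true, sample_weights))
# tensor(0.) -- in both rows every true label outranks every false label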

SkafteNicki self-assigned this Nov 16, 2021
Borda added the help wanted label Jan 6, 2022
Borda added this to the v0.8 milestone May 5, 2022