
Multilabel ranking metrics #614

Closed
tqbl opened this issue Nov 10, 2021 · 3 comments · Fixed by #787
Assignees: SkafteNicki
Labels: enhancement (New feature or request), help wanted (Extra attention is needed), New metric
Milestone: v0.8

Comments


tqbl commented Nov 10, 2021

🚀 Feature

The sklearn library provides a number of multilabel ranking metrics. It would be nice if torchmetrics implemented some of these metrics too. The three I have in mind are coverage_error, label_ranking_average_precision_score, and label_ranking_loss.

Implementation

Edit 2021-11-12: I noticed the output was wrong when sample_weights was used. This has now been fixed. A mistake in label_ranking_loss has also been fixed. I tested the code on a multi-label dataset and found that it matches sklearn up to rounding errors, which go away if float64 is used.


I'm not that familiar with the torchmetrics codebase, but I can provide some implementations that someone on the team could refine and integrate into the library. I used the sklearn source code as a reference.

import torch


def coverage_error(y_pred, y_true, sample_weights=None):
    # Number of top-ranked labels needed to cover all true labels, per sample
    offset = torch.zeros_like(y_pred)
    # Lift non-relevant scores above all relevant ones; any offset larger than
    # the score range works (1.1 assumes scores lie in [0, 1])
    offset[y_true == 0] = 1.1
    y_pred_mod = y_pred + offset
    y_pred_min = y_pred_mod.min(dim=1)[0]
    coverage = (y_pred >= y_pred_min[:, None]).sum(dim=1).to(torch.float32)

    if sample_weights is not None:
        coverage *= sample_weights
        return coverage.sum() / sample_weights.sum()

    return coverage.mean()


def label_ranking_average_precision(y_pred, y_true, sample_weights=None):
    # Average, over samples, of the mean precision at the rank of each true label
    # Invert so that the highest score receives rank 1
    y_pred = -y_pred

    score = torch.tensor(0.0, device=y_pred.device)
    n_preds, n_labels = y_pred.shape
    for i in range(n_preds):
        relevant = y_true[i] == 1
        # Rank of each relevant label among the relevant labels only...
        L = rank_data(y_pred[i][relevant])
        if 0 < len(L) < n_labels:
            # ...and among all labels
            rank = rank_data(y_pred[i])[relevant]
            score_i = (L / rank).mean()
        else:
            score_i = 1.0

        if sample_weights is not None:
            score_i *= sample_weights[i]

        score += score_i

    if sample_weights is None:
        score /= n_preds
    else:
        score /= sample_weights.sum()
    return score


def label_ranking_loss(y_pred, y_true, sample_weights=None):
    # Fraction of (true, false) label pairs that are ordered incorrectly, per sample
    n_labels = y_pred.shape[1]
    relevant = y_true == 1
    n_relevant = relevant.sum(dim=1)

    # Ignore instances where number of true labels is 0 or n_labels
    mask = (n_relevant > 0) & (n_relevant < n_labels)
    y_pred = y_pred[mask]
    relevant = relevant[mask]
    n_relevant = n_relevant[mask]

    if len(y_pred) == 0:
        return torch.tensor(0.0, device=y_pred.device)

    # Rank of each label, from 0 (lowest score) to n_labels - 1 (highest score)
    inverse = y_pred.argsort(dim=1).argsort(dim=1)
    # For each relevant label, the number of labels scored at or above it
    per_label_loss = ((n_labels - inverse) * relevant).to(torch.float32)
    correction = 0.5 * n_relevant * (n_relevant + 1)  # Sum of 1..n_relevant
    denom = n_relevant * (n_labels - n_relevant)
    loss = (per_label_loss.sum(dim=1) - correction) / denom

    if sample_weights is not None:
        loss *= sample_weights[mask]
        return loss.sum() / sample_weights.sum()

    return loss.mean()


def rank_data(x):
    # Rank values so that ties share the highest rank, i.e. each element gets
    # the count of elements less than or equal to it (scipy's 'max' method)
    unique, inverse, counts = torch.unique(
        x, sorted=True, return_inverse=True, return_counts=True)
    ranks = counts.cumsum(dim=0)
    return ranks[inverse]
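
A rough sanity check against the sklearn reference implementations could look like the following. This is only an illustrative sketch: it assumes scikit-learn is installed, reuses the functions above, and uses made-up random data. Note that sklearn takes (y_true, y_score) while the functions above take (y_pred, y_true).

import torch
from sklearn.metrics import coverage_error as sk_coverage
from sklearn.metrics import label_ranking_average_precision_score as sk_lrap
from sklearn.metrics import label_ranking_loss as sk_lrl

torch.manual_seed(0)
y_pred = torch.rand(32, 10, dtype=torch.float64)  # scores in [0, 1]
y_true = (torch.rand(32, 10) < 0.3).long()
y_true[:, -1] = 0                      # keep at least one negative per sample
y_true[y_true.sum(dim=1) == 0, 0] = 1  # keep at least one positive per sample
np_pred, np_true = y_pred.numpy(), y_true.numpy()

pairs = [
    (coverage_error(y_pred, y_true), sk_coverage(np_true, np_pred)),
    (label_ranking_average_precision(y_pred, y_true), sk_lrap(np_true, np_pred)),
    (label_ranking_loss(y_pred, y_true), sk_lrl(np_true, np_pred)),
]
for ours, reference in pairs:
    assert abs(ours.item() - reference) < 1e-5, (ours.item(), reference)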

tqbl added the enhancement label Nov 10, 2021

github-actions commented

Hi! Thanks for your contribution, great first issue!

SkafteNicki (Member) commented

Hi @tqbl,
Thanks for proposing these metrics. I can probably convert them to proper torchmetrics format sometime in the future. Would it be possible for you to describe what the expected inputs look like? Are they exactly the same as in sklearn?
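
As a rough idea of what such a conversion could look like, here is a minimal sketch wrapping the functional label_ranking_loss from above in the class-based Metric interface (add_state/update/compute). This is only an illustration under those assumptions, not the implementation that ended up in #787.

import torch
from torchmetrics import Metric


class LabelRankingLoss(Metric):
    # Accumulates the functional label_ranking_loss defined above across batches

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.add_state("loss_sum", default=torch.tensor(0.0), dist_reduce_fx="sum")
        self.add_state("n_valid", default=torch.tensor(0.0), dist_reduce_fx="sum")

    def update(self, y_pred, y_true):
        n_relevant = (y_true == 1).sum(dim=1)
        # Samples with no true labels or only true labels do not contribute
        valid = ((n_relevant > 0) & (n_relevant < y_pred.shape[1])).sum()
        if valid > 0:
            # label_ranking_loss returns the mean over the valid samples
            self.loss_sum += label_ranking_loss(y_pred, y_true) * valid
            self.n_valid += valid

    def compute(self):
        return self.loss_sum / self.n_valid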

tqbl (Author) commented Nov 15, 2021

Thanks @SkafteNicki. The inputs are expected to be the same as in sklearn, except they should be PyTorch tensors rather than NumPy arrays. The input shapes and the data itself should be the same.

  1. y_pred should be an (N, K) tensor of prediction scores (not necessarily probabilities).
  2. y_true should be an (N, K) tensor of ground-truth labels in binary indicator format.
  3. sample_weights should be an (N,) tensor of sample weights (optional).

N is the number of instances and K is the number of classes.
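
As a toy illustration of this format (made-up numbers, reusing label_ranking_loss from the snippet above):

import torch

# N = 2 instances, K = 3 classes
y_pred = torch.tensor([[0.9, 0.2, 0.6],
                       [0.1, 0.8, 0.3]])
y_true = torch.tensor([[1, 0, 1],
                       [0, 1, 0]])
sample_weights = torch.tensor([1.0, 2.0])

print(label_ranking_loss(y_pred, y_true, sample_weights))
# tensor(0.) -- in both rows every true label outranks every false label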

SkafteNicki self-assigned this Nov 16, 2021
Borda added the help wanted label Jan 6, 2022
Borda added this to the v0.8 milestone May 5, 2022