
Releases: Lightning-AI/torchmetrics

Minor patch release


[0.8.2] - 2022-05-06

Fixed

  • Fixed multi-device aggregation in PearsonCorrCoef (#998)
  • Fixed MAP metric when using a custom list of thresholds (#995)
  • Fixed compatibility between compute groups in MetricCollection and prefix/postfix arg (#1007)
  • Fixed compatibility with future PyTorch 1.12 in safe_matmul (#1011, #1014)

Contributors

@ben-davidson-6, @Borda, @SkafteNicki, @tanmoyio

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Minor patch release


[0.8.1] - 2022-04-27

Changed

  • Reimplemented the signal_distortion_ratio metric, removing the hard requirement on fast-bss-eval (#964)

Fixed

  • Fixed "Sort currently does not support bool dtype on CUDA" error in MAP for empty preds (#983)
  • Fixed BinnedPrecisionRecallCurve when thresholds argument is not provided (#968)
  • Fixed CalibrationError to work on logit input (#985)

Contributors

@DuYicong515, @krshrimali, @quancs, @SkafteNicki

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Faster collection and more metrics!


We are excited to announce that TorchMetrics v0.8 is now available. The release includes several new metrics in the classification and image domains and some performance improvements for those working with metric collections.

Metric collections just got faster

Common wisdom dictates that you should never evaluate the performance of your models using only a single metric but instead a collection of metrics. For example, in classification it is common to simultaneously evaluate accuracy, precision, recall, and the F1 score. TorchMetrics has long provided the MetricCollection object for chaining such metrics together, giving an easy interface to calculate them all at once. However, the metrics in such a collection often share some of the underlying computations, which until now were repeated for every metric in the collection. TorchMetrics v0.8 introduces the concept of compute_groups in MetricCollection: by default, metrics that share some of the same computations are auto-detected and grouped together.

Thus, if you are using MetricCollection in your code, upgrading to TorchMetrics v0.8 should automatically make your code run faster without any code changes.
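
As a minimal sketch (the metrics, shapes, and class count here are illustrative, not from the release notes), a collection of overlapping classification metrics in v0.8 could look like this:

```python
import torch
from torchmetrics import Accuracy, MetricCollection, Precision, Recall

# compute_groups=True is the default in v0.8; metrics that share the same
# underlying state are detected automatically and only updated once.
collection = MetricCollection(
    [Accuracy(), Precision(), Recall()],
    compute_groups=True,
)

preds = torch.randn(10, 3).softmax(dim=-1)   # hypothetical 3-class probabilities
target = torch.randint(0, 3, (10,))

collection.update(preds, target)
print(collection.compute())  # {'Accuracy': ..., 'Precision': ..., 'Recall': ...}
```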

Many exciting new metrics

TorchMetrics v0.8 includes several new metrics within the classification and image domain, both for the functional and modular API. We refer to the documentation for the full description of all metrics if you want to learn more about them.

  • SpectralAngleMapper or SAM was added to the image package. This metric can calculate the spectral similarity between given reference spectra and estimated spectra.
  • CoverageError was added to the classification package. This metric can be used when you are working with multi-label data. The metric works similarly to its sklearn counterpart and computes how far you need to go through the ranked scores so that all true labels are covered.
  • LabelRankingAveragePrecision and LabelRankingLoss were added to the classification package. Both metrics are used in multi-label ranking problems, where the goal is to give a better rank to the labels associated with each sample. Each metric gives a measure of how well your model is doing this.
  • ErrorRelativeGlobalDimensionlessSynthesis or ERGAS was added to the image package. This metric can be used to calculate the accuracy of pan-sharpened images, considering the normalized average error of each band of the resulting image.
  • UniversalImageQualityIndex was added to the image package. This metric assesses the difference between two images by considering three factors: loss of correlation, luminance distortion, and contrast distortion.
  • ClasswiseWrapper was added to the wrapper package. This wrapper can be used in combination with metrics that return multiple values (such as classification metrics with the average=None argument). The wrapper unwraps the result into a dict with a label for each value (a minimal sketch follows this list).
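
A minimal sketch of the wrapper (the label names and class count are hypothetical):

```python
import torch
from torchmetrics import Accuracy, ClasswiseWrapper

# Wrap a per-class metric (average=None) so compute() returns a labelled dict
# instead of a bare tensor of per-class values.
metric = ClasswiseWrapper(
    Accuracy(num_classes=3, average=None),
    labels=["horse", "fish", "dog"],
)

preds = torch.randn(20, 3).softmax(dim=-1)
target = torch.randint(0, 3, (20,))
print(metric(preds, target))  # {'accuracy_horse': ..., 'accuracy_fish': ..., 'accuracy_dog': ...}
```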

[0.8.0] - 2022-04-14

Added

  • Added WeightedMeanAbsolutePercentageError to regression package (#948)
  • Added new classification metrics:
    • CoverageError (#787)
    • LabelRankingAveragePrecision and LabelRankingLoss (#787)
  • Added new image metric:
    • SpectralAngleMapper (#885)
    • ErrorRelativeGlobalDimensionlessSynthesis (#894)
    • UniversalImageQualityIndex (#824)
    • SpectralDistortionIndex (#873)
  • Added support for MetricCollection in MetricTracker (#718)
  • Added support for 3D image and uniform kernel in StructuralSimilarityIndexMeasure (#818)
  • Added smart update of MetricCollection (#709)
  • Added ClasswiseWrapper for better logging of classification metrics with multiple output values (#832)
  • Added **kwargs argument for passing additional arguments to base class (#833)
  • Added negative ignore_index for the Accuracy metric (#362)
  • Added adaptive_k for the RetrievalPrecision metric (#910)
  • Added reset_real_features argument to image quality assessment metrics (#722)
  • Added new keyword argument compute_on_cpu to all metrics (#867)

Changed

  • Made num_classes in jaccard_index a required argument (#853, #914)
  • Added normalizer, tokenizer to ROUGE metric (#838)
  • Improved shape checking of permutation_invariant_training (#864)
  • Allowed reduction None (#891)
  • MetricTracker.best_metric will now give a warning when computed on a metric that does not have a well-defined best value (#913)

Deprecated

  • Deprecated argument compute_on_step (#792)
  • Deprecated passing dist_sync_on_step, process_group, dist_sync_fn as direct arguments (#833)

Removed

  • Removed support for versions of Lightning lower than v1.5 (#788)
  • Removed deprecated functions, and warnings in Text (#773)
    • WER and functional.wer
  • Removed deprecated functions and warnings in Image (#796)
    • SSIM and functional.ssim
    • PSNR and functional.psnr
  • Removed deprecated functions, and warnings in classification and regression (#806)
    • FBeta and functional.fbeta
    • F1 and functional.f1
    • Hinge and functional.hinge
    • IoU and functional.iou
    • MatthewsCorrcoef
    • PearsonCorrcoef
    • SpearmanCorrcoef
  • Removed deprecated functions, and warnings in detection and pairwise (#804)
    • MAP and functional.pairwise.manhatten
  • Removed deprecated functions, and warnings in Audio (#805)
    • PESQ and functional.audio.pesq
    • PIT and functional.audio.pit
    • SDR and functional.audio.sdr and functional.audio.si_sdr
    • SNR and functional.audio.snr and functional.audio.si_snr
    • STOI and functional.audio.stoi

Fixed

  • Fixed device mismatch for MAP metric in specific cases (#950)
  • Improved testing speed (#820)
  • Fixed compatibility of ClasswiseWrapper with the prefix argument of MetricCollection (#843)
  • Fixed BestScore on GPU (#912)
  • Fixed Lsum computation for ROUGEScore (#944)

Contributors

@ankitaS11, @ashutoshml, @Borda, @hookSSi, @justusschock, @lucadiliello, @quancs, @rusty1s, @SkafteNicki, @stancld, @vumichien, @weningerleon, @yassersouri

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Minor patch release


[0.7.3] - 2022-03-22

Fixed

  • Fixed unsafe log operation in TweedieDeviace for power=1 (#847)
  • Fixed bug in MAP metric related to either no ground truth or no predictions (#884)
  • Fixed ConfusionMatrix, AUROC and AveragePrecision on GPU when running in deterministic mode (#900)
  • Fixed NaN or Inf results returned by signal_distortion_ratio (#899)
  • Fixed memory leak when using update method with tensor where requires_grad=True (#902)

Contributors

@mtailanian, @quancs, @SkafteNicki

If we forgot someone due to not matching commit email with GitHub account, let us know :]

JOSS paper


[0.7.2] - 2022-02-10

Fixed

  • Minor patches in JOSS paper.

Improve mAP performance


[0.7.1] - 2022-02-03

Changed

  • Used torch.bucketize in calibration error when torch>1.8 for faster computations (#769)
  • Improved mAP performance (#742)

Fixed

  • Fixed check for available modules (#772)
  • Fixed Matthews correlation coefficient when the denominator is 0 (#781)

Contributors

@Borda, @ramonemiliani93, @SkafteNicki, @twsl

If we forgot someone due to not matching commit email with GitHub account, let us know :]

New NLP metrics and improved API


We are excited to announce that TorchMetrics v0.7 is now publicly available. This release is pretty significant. It includes several new metrics (mainly for NLP), naming and import changes, general improvements to the API, and some other great features. TorchMetrics thus now has over 60 metrics, and the package is more user-friendly than ever.

NLP metrics - Text package

The text package has been part of TorchMetrics since v0.5. With the growing capability of language generation models, there is also a real need for reliable evaluation metrics. With several added metrics and a unified API, TorchMetrics makes the usage of various metrics even easier! TorchMetrics v0.7 newly includes a couple of machine translation metrics such as chrF, chrF++, Translation Edit Rate, and Extended Edit Distance. Furthermore, it also supports other metrics - Match Error Rate, Word Information Lost, Word Information Preserved, and the SQuAD evaluation metrics. Last but not least, we have also made it possible to evaluate the ROUGE score using multiple references.

Argument unification

Importantly, all text metrics assume the preds, target input order with these explicit keyword arguments. If different naming was used before v0.7, it is deprecated in v0.7 and will be removed completely in v0.8.
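
A minimal sketch of the unified calling convention (the example strings are made up):

```python
from torchmetrics.functional import char_error_rate, word_error_rate

preds = ["hello world", "welcome to torchmetrics"]
target = ["hello world", "welcome to the metrics"]

# From v0.7 every text metric expects predictions first and references second,
# under the explicit names `preds` and `target`.
print(word_error_rate(preds=preds, target=target))
print(char_error_rate(preds=preds, target=target))
```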

Import and naming changes

TorchMetrics v0.7 brings both major and minor changes to how metrics should be imported. The import changes take effect directly in v0.7, meaning that you will most likely need to change the import statements for some specific metrics. All naming changes follow our standard deprecation process, meaning that in v0.7 any renamed metric will still work but will raise a deprecation warning asking you to use the new name. From v0.8, the old metric names will no longer be available.
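
For example (assuming a hypothetical 3-class setup), the F1 and Matthews correlation renames listed in the changelog below look like this:

```python
# Deprecated in v0.7 and removed in v0.8:
#   from torchmetrics import F1, MatthewsCorrcoef
# New names from v0.7 onwards:
from torchmetrics import F1Score, MatthewsCorrCoef

f1 = F1Score(num_classes=3)
mcc = MatthewsCorrCoef(num_classes=3)
```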

[0.7.0] - 2022-01-17

Added

  • Added NLP metrics:
    • MatchErrorRate (#619)
    • WordInfoLost and WordInfoPreserved (#630)
    • SQuAD (#623)
    • CHRFScore (#641)
    • TranslationEditRate (#646)
    • ExtendedEditDistance (#668)
  • Added MultiScaleSSIM into image metrics (#679)
  • Added Signal to Distortion Ratio (SDR) to audio package (#565)
  • Added MinMaxMetric to wrappers (#556)
  • Added ignore_index to retrieval metrics (#676)
  • Added support for multi references in ROUGEScore (#680)
  • Added a default VSCode devcontainer configuration (#621)

Changed

  • Scalar metrics will now consistently have additional dimensions squeezed (#622)
  • Removed metrics with third-party dependencies from the global import (#463)
  • Untokenized input for BLEUScore stays consistent with all other text metrics (#640)
  • Reordered arguments for TER, BLEUScore, SacreBLEUScore, CHRFScore; the expected input order is now predictions first and target second (#696)
  • Changed dtype of metric state from torch.float to torch.long in ConfusionMatrix to accommodate larger values (#715)
  • Unified the naming of the preds, target input arguments across all text metrics (#723, #727)
    • bert, bleu, chrf, sacre_bleu, wip, wil, cer, ter, wer, mer, rouge, squad

Deprecated

  • Renamed IoU -> Jaccard Index (#662)
  • Renamed text WER metric: (#714)
    • functional.wer -> functional.word_error_rate
    • WER -> WordErrorRate
  • Renamed correlation coefficient classes: (#710)
    • MatthewsCorrcoef -> MatthewsCorrCoef
    • PearsonCorrcoef -> PearsonCorrCoef
    • SpearmanCorrcoef -> SpearmanCorrCoef
  • Renamed audio STOI metric: (#753, #758)
    • audio.STOI to audio.ShortTimeObjectiveIntelligibility
    • functional.audio.stoi to functional.audio.short_time_objective_intelligibility
  • Renamed audio PESQ metrics: (#751)
    • functional.audio.pesq -> functional.audio.perceptual_evaluation_speech_quality
    • audio.PESQ -> audio.PerceptualEvaluationSpeechQuality
  • Renamed audio SDR metrics: (#711)
    • functional.sdr -> functional.signal_distortion_ratio
    • functional.si_sdr -> functional.scale_invariant_signal_distortion_ratio
    • SDR -> SignalDistortionRatio
    • SI_SDR -> ScaleInvariantSignalDistortionRatio
  • Renamed audio SNR metrics: (#712)
    • functional.snr -> functional.signal_noise_ratio
    • functional.si_snr -> functional.scale_invariant_signal_noise_ratio
    • SNR -> SignalNoiseRatio
    • SI_SNR -> ScaleInvariantSignalNoiseRatio
  • Renamed F-score metrics: (#731, #740)
    • functional.f1 -> functional.f1_score
    • F1 -> F1Score
    • functional.fbeta -> functional.fbeta_score
    • FBeta -> FBetaScore
  • Renamed Hinge metric: (#734)
    • functional.hinge -> functional.hinge_loss
    • Hinge -> HingeLoss
  • Renamed image PSNR metrics (#732)
    • functional.psnr -> functional.peak_signal_noise_ratio
    • PSNR -> PeakSignalNoiseRatio
  • Renamed audio PIT metric: (#737)
    • functional.pit -> functional.permutation_invariant_training
    • PIT -> PermutationInvariantTraining
  • Renamed image SSIM metric: (#747)
    • functional.ssim -> functional.structural_similarity_index_measure
    • SSIM -> StructuralSimilarityIndexMeasure
  • Renamed detection MAP to MeanAveragePrecision metric (#754)
  • Renamed Fidelity & LPIPS image metric: (#752)
    • image.FID -> image.FrechetInceptionDistance
    • image.KID -> image.KernelInceptionDistance
    • image.LPIPS -> image.LearnedPerceptualImagePatchSimilarity

Removed

  • Removed embedding_similarity metric (#638)
  • Removed argument concatenate_texts from wer metric (#638)
  • Removed arguments newline_sep and decimal_places from rouge metric (#638)

Fixed

  • Fixed MetricCollection kwargs filtering when no kwargs are present in update signature (#707)

Contributors

@ashutoshml, @Borda, @cuent, @Fariborzzz, @getgaurav2, @janhenriklambrechts, @justusschock, @karthikrangasai, @lucadiliello, @mahinlma, @mathemusician, @mona0809, @mrleu, @puhuk, @quancs, @SkafteNicki, @stancld, @twsl

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Fixing mAP on GPU


[0.6.2] - 2021-12-15

Fixed

  • Fixed "torch.sort currently does not support bool dtype on CUDA" error (#665)
  • Fixed mAP to properly check if ground truths are empty (#684)
  • Fixed initialization of tensors to be on the correct device for MAP metric (#673)

Contributors

@OlofHarrysson, @tkupek, @twsl

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Own mAP implementation


[0.6.1] - 2021-12-06

Changed

  • Migrated MAP metric from pycocotools to PyTorch (#632)
  • Use torch.topk instead of torch.argsort in retrieval precision for speedup (#627)

Fixed

  • Fixed empty predictions in MAP metric (#594, #610, #624)
  • Fixed edge case of AUROC with average=weighted on GPU (#606)
  • Fixed forward in compositional metrics (#645)

Contributors

@Callidior, @SkafteNicki, @tkupek, @twsl, @zuoxingdong

If we forgot someone due to not matching commit email with GitHub account, let us know :]

More metrics than ever


[0.6.0] - 2021-10-28

We are excited to announce that TorchMetrics v0.6 is now publicly available. TorchMetrics v0.6 does not focus on specific domains but adds a ton of new metrics to several domains, thus increasing the number of metrics in the repository to over 60! Not only has v0.6 added metrics within already covered domains, but we also add support for two new ones: pairwise metrics and detection.

https://devblog.pytorchlightning.ai/torchmetrics-v0-6-more-metrics-than-ever-e98c3983621e

Pairwise Metrics

TorchMetrics v0.6 offers a new set of metrics in its functional backend for calculating pairwise distances. Given a tensor X with shape [N,d] (N observations, each in d dimensions), a pairwise metric calculates an [N,N] matrix of all possible combinations between the rows of X.
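
A minimal sketch with made-up data (N = 4 observations in d = 3 dimensions):

```python
import torch
from torchmetrics.functional import pairwise_cosine_similarity, pairwise_euclidean_distance

x = torch.randn(4, 3)

# Each call returns a [4, 4] matrix with the metric evaluated between every pair of rows of x.
print(pairwise_cosine_similarity(x))
print(pairwise_euclidean_distance(x))
```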

Detection

TorchMetrics v0.6 now includes a detection package that provides the MAP metric. The implementation essentially wraps pycocotools, ensuring that we get the correct values, with the added benefit of being able to scale to multiple devices (like any other metric in TorchMetrics).
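
A minimal sketch, assuming the v0.6 import path torchmetrics.detection.map and a single hypothetical image with one predicted and one ground-truth box in (xmin, ymin, xmax, ymax) format:

```python
import torch
from torchmetrics.detection.map import MAP  # renamed to MeanAveragePrecision in v0.7

preds = [
    {
        "boxes": torch.tensor([[258.0, 41.0, 606.0, 285.0]]),
        "scores": torch.tensor([0.54]),
        "labels": torch.tensor([0]),
    }
]
target = [
    {
        "boxes": torch.tensor([[214.0, 41.0, 562.0, 285.0]]),
        "labels": torch.tensor([0]),
    }
]

metric = MAP()
metric.update(preds, target)
print(metric.compute())  # dict with map, map_50, map_75, ...
```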

New additions

  • In the audio package, we have two new metrics: Perceptual Evaluation of Speech Quality (PESQ) and Short Term Objective Intelligibility (STOI). Both metrics can be used to assess speech quality.

  • In the retrieval package, we also have two new metrics: R-precision and hit rate. R-precision corresponds to recall at the R-th position of the query. The hit rate is the fraction of queries for which at least one relevant document is among the retrieved results.

  • The text package also receives an update in the form of two new metrics: Sacre BLEU score and character error rate. Sacre BLEU score provides a more systematic way of comparing BLEU scores across tasks. The character error rate is similar to the word error rate but instead determines whether a given algorithm has correctly predicted a sentence based on a character-by-character comparison.

  • The regression package got a single new metric in the form of the Tweedie deviance score. Deviance scores are generally a better measure of fit than measures such as squared error when trying to model data coming from highly skewed distributions.

  • Finally, we have added five new metrics for simple aggregation: SumMetric, MeanMetric, MinMetric, MaxMetric, CatMetric. All five metrics take a single input (either native Python floats or torch.Tensor) and keep track of the sum, average, min, etc. These new aggregation metrics are especially useful in combination with self.log from Lightning if you want to log something other than the average of the metric you are tracking (a minimal sketch follows below).
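
A minimal sketch of the aggregation metrics (the loss values here are made up):

```python
import torch
from torchmetrics import MaxMetric, MeanMetric

mean_loss, max_loss = MeanMetric(), MaxMetric()

# Both plain Python floats and tensors can be fed to the aggregation metrics.
for loss in [0.50, 0.25, torch.tensor(0.75)]:
    mean_loss.update(loss)
    max_loss.update(loss)

print(mean_loss.compute())  # tensor(0.5000)
print(max_loss.compute())   # tensor(0.7500)
```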

Detail changes

Added

  • Added audio metrics:
    • Perceptual Evaluation of Speech Quality (PESQ) (#353)
    • Short Term Objective Intelligibility (STOI) (#353)
  • Added Information retrieval metrics:
    • RetrievalRPrecision (#577)
    • RetrievalHitRate (#576)
  • Added NLP metrics:
    • SacreBLEUScore (#546)
    • CharErrorRate (#575)
  • Added other metrics:
    • Tweedie Deviance Score (#499)
    • Learned Perceptual Image Patch Similarity (LPIPS) (#431)
  • Added MAP (mean average precision) metric to new detection package (#467)
  • Added support for float targets in nDCG metric (#437)
  • Added average argument to AveragePrecision metric for reducing multi-label and multi-class problems (#477)
  • Added MultioutputWrapper (#510)
  • Added metric sweeping:
    • higher_is_better as constant attribute (#544)
    • higher_is_better to rest of codebase (#584)
  • Added simple aggregation metrics: SumMetric, MeanMetric, CatMetric, MinMetric, MaxMetric (#506)
  • Added pairwise submodule with metrics (#553)
    • pairwise_cosine_similarity
    • pairwise_euclidean_distance
    • pairwise_linear_similarity
    • pairwise_manhatten_distance

Changed

  • AveragePrecision will now by default output the macro average for multilabel and multiclass problems (#477)
  • half, double, float will no longer change the dtype of the metric states. Use metric.set_dtype instead (#493)
  • Renamed AverageMeter to MeanMetric (#506)
  • Changed is_differentiable from property to a constant attribute (#551)
  • ROC and AUROC will no longer throw an error when either the positive or negative class is missing. Instead, they return a score of 0 and give a warning

Deprecated

  • Deprecated torchmetrics.functional.self_supervised.embedding_similarity in favour of the new pairwise submodule

Removed

  • Removed dtype property (#493)

Fixed

  • Fixed bug in F1 with average='macro' and ignore_index!=None (#495)
  • Fixed bug in pit by using the first returned result to initialize device and type (#533)
  • Fixed SSIM metric using too much memory (#539)
  • Fixed bug where device property was not properly updated when the metric was a child of a module (#542)

Contributors

@an1lam, @Borda, @karthikrangasai, @lucadiliello, @mahinlma, @Obus, @quancs, @SkafteNicki, @stancld, @tkupek

If we forgot someone due to not matching commit email with GitHub account, let us know :]