Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

Unreleased

v0.2.5 - 2022-04-22

Changed

BERTScore now loads the model weights in the constructor instead of each time the scoring method is called.
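As a rough illustration of why this change matters, the sketch below contrasts loading a model once at construction time with reloading it on every scoring call. All names here (`EagerScorer`, `LazyScorer`, `fake_load_model`, `loads`) are hypothetical and are not part of this library or of BERTScore; the "model" is a toy token-overlap function that stands in for the expensive weight loading.

```python
# Counter used only to show how often the (hypothetical) loader runs.
loads = {"count": 0}


def fake_load_model():
    """Hypothetical stand-in for an expensive weight-loading step."""
    loads["count"] += 1

    def model(summary, reference):
        # Toy score: fraction of reference tokens that also appear in the summary.
        summary_tokens = set(summary.split())
        reference_tokens = set(reference.split())
        return len(summary_tokens & reference_tokens) / max(len(reference_tokens), 1)

    return model


class EagerScorer:
    """Loads the model once, in the constructor (the new behavior)."""

    def __init__(self, load_model):
        self.model = load_model()

    def score(self, summary, reference):
        return self.model(summary, reference)


class LazyScorer:
    """Reloads the model on every scoring call (the old behavior)."""

    def __init__(self, load_model):
        self.load_model = load_model

    def score(self, summary, reference):
        model = self.load_model()
        return model(summary, reference)
```

With this sketch, three calls to `EagerScorer.score` trigger a single load, while three calls to `LazyScorer.score` trigger three.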

v0.2.4 - 2022-04-05

Added

Added saving the Fabbri data with the original reference summaries and documents
Added some annotations from MultiLing 2019
Added Dockerized versions of ROUGE and QAEval
Added annotations from Kryscinski et al. (2019).
Added a Dockerized version of BERTScore
Added a Dockerized version of BLEURT
Added a Dockerized version of MoverScore
Added a Dockerized version of BARTScore
Added a Dockerized version of Lite3Pyramid

Changed

Moved the actual QAEval metric implementation into the qaeval library. The new minimum version is qaeval==0.1.0.

Removed

Removed DecomposedRouge, which can now be found here.

v0.2.3 - 2021-07-06

Added

Added SentBLEU under the name sent-bleu.
Added saving all of the summaries from the 16 models in Fabbri et al. (2020).
Added saving all of the summaries from the data in Bhandari et al. (2020).

Changed

Updated Blanc to use blanc==0.2.1.
Setting up fabbri2020 now automatically downloads the tar files

Removed

Removed idf support for BERTScore so we can remove the dependence on our fork of the original repo.

v0.2.2 - 2021-06-16

Added

Added a verbose option to QAEval

Changed

Changed QAEval to use the updated qaeval interface with version 0.0.8. The QA results will now include the answer offsets.

v0.2.1 - 2021-05-06

Added

Added the New York Times dataset. See here.
Added better tutorials for using the library.

Changed

Added an exception with an error message if PyrEval is used with a single reference summary.

Fixed

Pinned the overrides package to version 3.1.0 to fix a bug in the Params class that was triggered by overrides version 6.0.0.

v0.2.0 - 2021-03-26

Added

Added the annotations collected by Bhandari et al. (2020).
Added BLANC
Added the annotations collected by the BLANC paper.
Added a wrapper around the implementation of APES.
Added the Multi-News dataset.
Added the WCEP dataset
Added confidence interval calculation and running hypothesis tests for the correlation coefficients

Changed

Changed the backend for the correlation calculation to use matrices instead of the MetricsDicts

Fixed

Fixed a bug in which QAEval would crash if LERC was not used.

v0.1.5 - 2021-01-02

Fixed

The LERC output from the individual QA pairs is now included in the QAEval output.

v0.1.4 - 2021-01-02

Added

Added scoring QAEval predictions with LERC

Fixed

The BLEURT setup script now creates the .sacrerouge/metrics directory if it doesn't exist.

v0.1.3 - 2020-11-25

Added

Added ability to skip calculating specific correlation levels (summary, system, and global)
Added optionally generating plots of the system-level and global metric values
Added passing a List[Metrics] to the correlation calculation instead of just a file or list of files
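For readers unfamiliar with the three correlation levels named above, here is a minimal sketch under the commonly used definitions: summary-level averages the per-input correlations across systems, system-level correlates the per-system mean scores, and global correlates all scores at once. The function names and the nested-list layout (`X[i][j]` is the score of system `i` on input `j`) are assumptions made for illustration and do not reflect this library's actual interface. Pearson correlation is used here only as an example coefficient.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation; assumes both inputs have non-zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def summary_level(X, Y):
    """Average, over inputs, of the correlation across systems for that input."""
    num_inputs = len(X[0])
    per_input = [
        pearson([row[j] for row in X], [row[j] for row in Y])
        for j in range(num_inputs)
    ]
    return sum(per_input) / num_inputs


def system_level(X, Y):
    """Correlation between the per-system mean scores."""
    def mean(row):
        return sum(row) / len(row)

    return pearson([mean(row) for row in X], [mean(row) for row in Y])


def global_level(X, Y):
    """Correlation over every (system, input) score pair."""
    return pearson(
        [value for row in X for value in row],
        [value for row in Y for value in row],
    )
```

Skipping a level then simply means not calling the corresponding function, which can save time when only one granularity is of interest.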

Changed

Updated the spacy package version to 2.3.3 and the model version to 2.3.1. DecomposedRouge's unit tests and experiments were updated accordingly.
Changed all positional command arguments to non-positional ones to improve the readability of the commands.