All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
v0.2.5 - 2022-04-22
- BERTScore now loads the model weights in the constructor instead of each time the scoring method is called.
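
The load-once pattern described in this entry can be sketched as follows. This is only an illustration under stated assumptions, not the library's actual BERTScore code: `CachedModelScorer` and `fake_load_model` are hypothetical names, and the "model" is a trivial stand-in for real weights.

```python
class CachedModelScorer:
    """Hypothetical scorer that loads its model once, at construction time."""

    def __init__(self, load_model):
        # The expensive load happens exactly once, here.
        self.model = load_model()

    def score(self, summary, reference):
        # Every call reuses the model stored on the instance.
        return self.model(summary, reference)


load_calls = []  # records each time the (fake) weights are loaded


def fake_load_model():
    """Stand-in for loading real model weights (an assumption for this sketch)."""
    load_calls.append(1)
    return lambda summary, reference: float(summary == reference)


scorer = CachedModelScorer(fake_load_model)
scores = [scorer.score("a", "a"), scorer.score("a", "b")]
```

Scoring twice still triggers only one load, which is the cost this change avoids when the scoring method is called repeatedly.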
v0.2.4 - 2022-04-05
- Added saving the Fabbri data with the original reference summaries and documents
- Added some annotations from MultiLing 2019
- Added Dockerized versions of ROUGE and QAEval
- Added annotations from Kryscinski et al. (2019).
- Added a Dockerized version of BERTScore
- Added a Dockerized version of BLEURT
- Added a Dockerized version of MoverScore
- Added a Dockerized version of BARTScore
- Added a Dockerized version of Lite3Pyramid
- Moved the actual QAEval metric implementation into the qaeval library. The new minimum version is qaeval==0.1.0.
- Removed DecomposedRouge, which can now be found here.
v0.2.3 - 2021-07-06
- Added SentBLEU under the name sent-bleu.
- Added saving all of the summaries from the 16 models in Fabbri et al. (2020).
- Added saving all of the summaries from the data in Bhandari et al. (2020).
- Updated Blanc to use blanc==0.2.1.
- Setting up fabbri2020 now automatically downloads the tar files
- Removed idf support for BERTScore so we can remove the dependence on our fork of the original repo.
v0.2.2 - 2021-06-16
- Added a verbose option to QAEval
- Changed QAEval to use the updated qaeval interface with version 0.0.8. The QA results will now include the answer offsets.
v0.2.1 - 2021-05-06
- Added the New York Times dataset. See here.
- Added better tutorials for using the library.
- Added an exception with an error message if PyrEval is used with a single reference summary.
- Pinned the overrides package to version 3.1.0 to fix a bug in the Params class that occurs with overrides version 6.0.0
v0.2.0 - 2021-03-26
- Added the annotations collected by Bhandari et al. (2020).
- Added BLANC
- Added the annotations collected by the BLANC paper.
- Added a wrapper around the implementation of APES.
- Added the Multi-News dataset.
- Added the WCEP dataset
- Added confidence interval calculation and running hypothesis tests for the correlation coefficients
- Changed the backend for the correlation calculation to use matrices instead of the MetricsDicts
- Fixed a bug in which QAEval would crash if you don't use LERC
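
As a rough illustration of what a confidence interval for a correlation coefficient involves, here is a minimal percentile-bootstrap sketch for Pearson's r. It is one common approach and not necessarily the method this release implements; the function names `pearson` and `bootstrap_ci` and the sample data are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if a variance is 0)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def bootstrap_ci(x, y, num_samples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for Pearson's r, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(num_samples):
        idx = [rng.randrange(n) for _ in range(n)]
        r = pearson([x[i] for i in idx], [y[i] for i in idx])
        if not math.isnan(r):  # skip degenerate resamples with zero variance
            stats.append(r)
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lower, upper


# Hypothetical metric scores and human judgments for eight summaries.
metric_scores = [1, 2, 3, 4, 5, 6, 7, 8]
human_scores = [1.1, 1.9, 3.2, 3.8, 5.1, 6.3, 6.9, 8.2]
lower, upper = bootstrap_ci(metric_scores, human_scores)
```

The interval endpoints depend on the random seed and the number of resamples, so they should be read as estimates rather than exact bounds.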
v0.1.5 - 2021-01-02
- Including the LERC output from the individual QA pairs in QAEval
v0.1.4 - 2021-01-02
- Added scoring QAEval predictions with LERC
- Creating the .sacrerouge/metrics directory in the BLEURT setup script if it doesn't exist.
v0.1.3 - 2020-11-25
- Added ability to skip calculating specific correlation levels (summary, system, and global)
- Added optionally generating plots of the system-level and global metric values
- Added passing a List[Metrics] to the correlation calculation instead of just a file or list of files
- Updated the spacy package version to 2.3.3 and the model version to 2.3.1. DecomposedRouge's unit tests and experiments were subsequently updated.
- Changed all positional arguments to commands to non-positional for improved readability of the commands.