Commit
fix: correct reference length calculation (#195)
Summary:
This PR fixes how the brevity penalty (specifically, the effective reference corpus length) is calculated in BLEU. Previously, `len_reference` was calculated as `min([len(ref) for ref in references_tokenized])`. This is incorrect: according to the paper, we need the "best match length", i.e. the reference length closest to the candidate length, not the minimum reference length. For more information, see [Wikipedia - brevity penalty](https://en.wikipedia.org/wiki/BLEU#Brevity_penalty) and the [NLTK implementation](https://www.nltk.org/_modules/nltk/translate/bleu_score.html#closest_ref_length).

Pull Request resolved: #195

Test Plan:
I added another unit test to `test_bleu.py` and compared the results against those of the `nltk.translate.bleu_score.corpus_bleu` function to make sure the implementation is correct.

Reviewed By: galrotem

Differential Revision: D56846091

Pulled By: JKSenthil

fbshipit-source-id: 2bf1cd0ba169535a118222e60f4264259248f1fd
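
For illustration, the difference between the minimum reference length and the "best match length" described in the summary can be sketched as follows. This is a minimal standalone sketch, not the code changed in this PR: the function names `closest_ref_length`, `effective_reference_length`, and `brevity_penalty` are hypothetical here, and the tie-breaking rule (prefer the shorter reference) is an assumption modeled on the NLTK implementation linked above.

```python
import math


def closest_ref_length(reference_lengths, candidate_length):
    """Return the reference length closest to the candidate length.

    Ties are broken in favor of the shorter reference (assumed here,
    following the NLTK behavior).
    """
    return min(
        reference_lengths,
        key=lambda ref_len: (abs(ref_len - candidate_length), ref_len),
    )


def effective_reference_length(candidates, references_per_candidate):
    """Sum the best match length over all candidate sentences in a corpus.

    `candidates` is a list of token lists; `references_per_candidate` is a
    list (one entry per candidate) of lists of reference token lists.
    """
    total = 0
    for candidate, references in zip(candidates, references_per_candidate):
        ref_lengths = [len(ref) for ref in references]
        total += closest_ref_length(ref_lengths, len(candidate))
    return total


def brevity_penalty(candidate_length, reference_length):
    """Standard BLEU brevity penalty for the given corpus lengths."""
    if candidate_length == 0:
        return 0.0
    if candidate_length > reference_length:
        return 1.0
    return math.exp(1 - reference_length / candidate_length)


# A 12-token candidate with references of 8 and 13 tokens.
candidate = ["tok"] * 12
references = [["tok"] * 8, ["tok"] * 13]

min_len = min(len(ref) for ref in references)  # old behavior: 8
best_len = effective_reference_length([candidate], [references])  # 13

print(brevity_penalty(len(candidate), min_len))   # no penalty applied
print(brevity_penalty(len(candidate), best_len))  # penalty below 1.0
```

With the minimum length, the candidate looks longer than its reference and escapes the penalty entirely; with the best match length, it is compared against the 13-token reference and is penalized slightly, which is the behavior the fix restores.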