You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The data format used by the newsroom-run -> newsroom-score -> newsroom-tables evaluation pipeline does not appear to keep track of sentence tokenization. When sentence tokenization is not provided to ROUGE-1.5.5, multi-sentence references and hypotheses are evaluated as one long sentence. As a result, ROUGE-L scores produced by this evaluation pipeline (1) may be lower than expected compared to more standard ROUGE evaluation that uses tokenized sentences, and (2) probably do not match the ROUGE-L scores in the Newsroom paper, which are computed using sentence tokenization.
I would recommend adding a notice to the README that the evaluation pipeline does not keep track of sentence tokenization, which may result in lower-than-expected ROUGE-L scores, and that the newsroom-run -> newsroom-score -> newsroom-tables evaluation pipeline should not be used for publishable evaluation.
The text was updated successfully, but these errors were encountered:
ROUGE-L is sensitive to sentence tokenization.
The data format used by the
newsroom-run
->newsroom-score
->newsroom-tables
evaluation pipeline does not appear to keep track of sentence tokenization. When sentence tokenization is not provided to ROUGE-1.5.5, multi-sentence references and hypotheses are evaluated as one long sentence. As a result, ROUGE-L scores produced by this evaluation pipeline (1) may be lower than expected compared to more standard ROUGE evaluation that uses tokenized sentences, and (2) probably do not match the ROUGE-L scores in the Newsroom paper, which are computed using sentence tokenization.I would recommend adding a notice to the README that the evaluation pipeline does not keep track of sentence tokenization, which may result in lower-than-expected ROUGE-L scores, and that the
newsroom-run
->newsroom-score
->newsroom-tables
evaluation pipeline should not be used for publishable evaluation.The text was updated successfully, but these errors were encountered: