Hello,
Thank you for your work on this library. I am experiencing a problem computing BLEU and chrF when some hypotheses are empty strings. The following code reproduces it:
```python
import sacrebleu as s

print("Version:", s.version)

bleu = s.BLEU()
chrf = s.CHRF()

hypothesis_1 = ['A B C', 'B C D', 'C D E']
hypothesis_2 = ['', 'B C D', 'C D E']
hypothesis_3 = ['A B C', '', 'C D E']
hypothesis_4 = ['A B C', '', '']
refs = [['A B C'], ['B C D'], ['C D E']]

# NOTE: every "BLEU" line below calls chrf.corpus_score, not bleu.corpus_score,
# so the "BLEU" rows in the output repeat the chrF scores.
print()
print("hypothesis_1 CHRF:", chrf.corpus_score(hypothesis_1, refs).score)
print("hypothesis_1 BLEU:", chrf.corpus_score(hypothesis_1, refs).score)
print()
print("hypothesis_2 CHRF:", chrf.corpus_score(hypothesis_2, refs).score)
print("hypothesis_2 BLEU:", chrf.corpus_score(hypothesis_2, refs).score)
print()
print("hypothesis_3 CHRF:", chrf.corpus_score(hypothesis_3, refs).score)
print("hypothesis_3 BLEU:", chrf.corpus_score(hypothesis_3, refs).score)
print()
print("hypothesis_4 CHRF:", chrf.corpus_score(hypothesis_4, refs).score)
print("hypothesis_4 BLEU:", chrf.corpus_score(hypothesis_4, refs).score)
```
This code produces the following outputs:
```
Version: 2.3.1
hypothesis_1 CHRF: 100.0
hypothesis_1 BLEU: 100.0
hypothesis_2 CHRF: 0.0
hypothesis_2 BLEU: 0.0
hypothesis_3 CHRF: 100.0
hypothesis_3 BLEU: 100.0
hypothesis_4 CHRF: 100.0
hypothesis_4 BLEU: 100.0
```
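A possible explanation for this pattern lies in the shape of `refs`. sacrebleu's documentation describes the references passed to `corpus_score` as a list of reference *streams*. Each stream is a list with one string per hypothesis, so a single reference per segment would be written `[['A B C', 'B C D', 'C D E']]`. The sketch below assumes the library pairs hypotheses with references by transposing the streams with `zip(*refs)` and then zipping the result with the hypotheses. `pair_segments` is a hypothetical helper written for illustration, not a sacrebleu function.

```python
# Hypothetical illustration (assumption: references are transposed with
# zip(*streams) and then zipped with the hypotheses, segment by segment).

hyps = ['', 'B C D', 'C D E']                          # hypothesis_2 from the report
refs_as_reported = [['A B C'], ['B C D'], ['C D E']]   # three streams of length 1
refs_as_streams = [['A B C', 'B C D', 'C D E']]        # one stream parallel to hyps


def pair_segments(hypotheses, reference_streams):
    """Pair each hypothesis with the tuple of references for the same segment."""
    # Transposing turns "one list per stream" into "one tuple per segment".
    per_segment_refs = list(zip(*reference_streams))
    # zip() stops at the shorter input, so surplus hypotheses are silently dropped.
    return list(zip(hypotheses, per_segment_refs))


print(pair_segments(hyps, refs_as_reported))
# [('', ('A B C', 'B C D', 'C D E'))]
print(pair_segments(hyps, refs_as_streams))
# [('', ('A B C',)), ('B C D', ('B C D',)), ('C D E', ('C D E',))]
```

Under that assumption, the `refs` in the report would form a single segment with three alternative references, and only the first hypothesis of each list would be scored. This is consistent with the outputs above: only `hypothesis_2` begins with an empty string, and it is the only one scoring 0.0. With `refs = [['A B C', 'B C D', 'C D E']]`, all three hypotheses would be paired with their own reference.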
I have not experienced this problem for TER.
Would you recommend computing the metrics at sentence level and taking the mean instead?
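On the sentence-level question: corpus-level BLEU and chrF sum their n-gram statistics over all segments before computing a single score. A mean of sentence-level scores is therefore generally a different number, with every segment weighted equally regardless of length. The sketch below shows only that pooling-versus-averaging difference. It uses plain unigram precision as a hypothetical stand-in for the real metrics, with no brevity penalty, higher-order n-grams, or recall. The helper names are illustrative, not library functions.

```python
from collections import Counter


def unigram_stats(hyp, ref):
    """Return (clipped unigram matches, hypothesis token count) for one segment."""
    hyp_counts = Counter(hyp.split())
    ref_counts = Counter(ref.split())
    matches = sum(min(count, ref_counts[tok]) for tok, count in hyp_counts.items())
    return matches, sum(hyp_counts.values())


def pooled_precision(hyps, refs):
    """Corpus-style: sum statistics over all segments, then divide once."""
    stats = [unigram_stats(h, r) for h, r in zip(hyps, refs)]
    total = sum(n for _, n in stats)
    return 100.0 * sum(m for m, _ in stats) / total if total else 0.0


def mean_sentence_precision(hyps, refs):
    """Sentence-style: score each segment on its own, then average the scores."""
    scores = []
    for h, r in zip(hyps, refs):
        m, n = unigram_stats(h, r)
        scores.append(100.0 * m / n if n else 0.0)  # empty hypothesis scores 0
    return sum(scores) / len(scores)


hyps = ['A B C D', 'X']
refs = ['A B C D', 'Y']
print(pooled_precision(hyps, refs))         # 80.0
print(mean_sentence_precision(hyps, refs))  # 50.0
```

The pooled score lets the longer, fully matching segment dominate. The sentence-level mean gives the one-token miss the same weight as the four-token match.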
Best,