Hi,

When running python3 -m unittest discover mt_metrics_eval "*_test.py" (takes ~70 seconds), I got the following errors.

For testWMT23EnDeRatings and testWMT23ZhEnRatings: these tests fail because of a mismatch in the expected human rating names. The tests expect only 'mqm.merged', but the actual data contains individual rater scores ('mqm.rater1' through 'mqm.rater10') and additional merged scores ('round2.mqm.merged', 'round3.mqm.merged').

For testSigDiffWithAvgAndNones: the RuntimeWarning from np.sqrt in stats.py line 923 produces a nan p-value, so the assertAlmostEqual against 0.121 fails with "nan != 0.121".
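For reference, the rating-name set reported by the failing assertion can be reconstructed from the traceback. This is only a sketch of what the test's expected value would have to look like to match the current data; the assertion line shown in the comment is hypothetical, not a proposed patch:

```python
# Rating names listed in the testWMT23EnDeRatings traceback, rebuilt as a set.
merged = {'mqm.merged', 'round2.mqm.merged', 'round3.mqm.merged'}
raters = {f'mqm.rater{i}' for i in range(1, 11)}  # mqm.rater1 .. mqm.rater10
expected_rating_names = merged | raters

# Hypothetical form the assertion in data_test.py would need to take
# for the current data to pass:
# self.assertEqual(evs.human_rating_names, expected_rating_names)
print(sorted(expected_rating_names))
```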
(mtme) bash-4.4$ python3 -m unittest discover mt_metrics_eval "*_test.py" # Takes ~70 seconds.
............F.F............................................./ivi/ilps/personal/stan1/reward/mt-metrics-eval/mt_metrics_eval/stats.py:923: RuntimeWarning: invalid value encountered in sqrt
tden = np.sqrt(2 * (n - 1) / (n - 3) * k + rbar**2 * (1 - r23)**3)
F.......................
======================================================================
FAIL: testWMT23EnDeRatings (data_test.EvalSetTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/ivi/ilps/personal/stan1/reward/mt-metrics-eval/mt_metrics_eval/data_test.py", line 283, in testWMT23EnDeRatings
self.assertEqual(evs.human_rating_names, {'mqm.merged'})
AssertionError: Items in the first set but not the second:
'mqm.rater8'
'mqm.rater10'
'mqm.rater9'
'mqm.rater5'
'mqm.rater6'
'mqm.rater3'
'mqm.rater2'
'mqm.rater4'
'round2.mqm.merged'
'mqm.rater1'
'mqm.rater7'
'round3.mqm.merged'
======================================================================
FAIL: testWMT23ZhEnRatings (data_test.EvalSetTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/ivi/ilps/personal/stan1/reward/mt-metrics-eval/mt_metrics_eval/data_test.py", line 319, in testWMT23ZhEnRatings
self.assertEqual(
AssertionError: Items in the first set but not the second:
'mqm.merged'
'round3.mqm.merged'
'round2.mqm.merged'
======================================================================
FAIL: testSigDiffWithAvgAndNones (stats_test.WilliamsSigDiffTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/ivi/ilps/personal/stan1/reward/mt-metrics-eval/mt_metrics_eval/stats_test.py", line 527, in testSigDiffWithAvgAndNones
self.assertAlmostEqual(p, 0.121, places=3)
AssertionError: nan != 0.121 within 3 places (nan difference)
----------------------------------------------------------------------
Ran 84 tests in 108.139s
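The third failure follows directly from the RuntimeWarning: when the radicand passed to np.sqrt on stats.py line 923 goes negative, NumPy returns nan, and that nan propagates into the p-value compared against 0.121. A minimal reproduction, with illustrative values chosen only to make the radicand negative (not taken from the actual test data):

```python
import numpy as np

# Illustrative values: n close to 3 and a negative k make the first term
# large and negative, so the radicand of the tden expression goes below 0.
n, k, rbar, r23 = 4.0, -5.0, 0.5, 0.9
radicand = 2 * (n - 1) / (n - 3) * k + rbar**2 * (1 - r23)**3

# np.sqrt of a negative value emits "RuntimeWarning: invalid value
# encountered in sqrt" and returns nan -- the same behavior seen in the log.
tden = np.sqrt(radicand)
print(np.isnan(tden))
```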