Enable analyzing evaluators/annotators on data without multiple generator models #293
Currently, `alpaca_eval.main.analyze_evaluators` can only analyze evaluators/annotators on data (like the original AlpacaEval dataset) that contains outputs from more than one generator model. If a dataset contains only a single generator model, computing the (Spearman/Pearson) correlation between the models' win rates under different annotators fails and throws an error, because there are not enough values to correlate.

This PR makes the correlation computation optional: if the win-rate correlation computation fails, `np.nan` values are returned instead and a warning is logged. The remaining metrics are still computed and returned, and no error is thrown. This allows analyzing evaluators on new kinds of data without multiple generator models. Other metrics, such as human agreement, can still be computed correctly in this case (correct me if I am wrong about this).
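For illustration, here is a minimal sketch of the intended fallback behavior. The helper name `safe_winrate_correlations` and its signature are hypothetical, not the actual alpaca_eval internals:

```python
import logging

import numpy as np
from scipy import stats


def safe_winrate_correlations(winrates_a, winrates_b):
    """Correlate two annotators' per-model win rates, falling back to NaN
    (with a warning) when the correlation cannot be computed, e.g. because
    the data contains only a single generator model."""
    try:
        if len(winrates_a) < 2:
            raise ValueError("need win rates for at least two generator models")
        spearman, _ = stats.spearmanr(winrates_a, winrates_b)
        pearson, _ = stats.pearsonr(winrates_a, winrates_b)
    except ValueError as err:
        # Instead of propagating the error, warn and return NaN so the
        # remaining metrics can still be computed and returned.
        logging.warning("Skipping win-rate correlation: %s", err)
        return {"spearman": np.nan, "pearson": np.nan}
    return {"spearman": spearman, "pearson": pearson}


# A dataset with a single generator model no longer raises:
print(safe_winrate_correlations([0.5], [0.6]))
# -> {'spearman': nan, 'pearson': nan} (plus a logged warning)
```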