The team behind the HELM paper just shared a dataset of document-summary faithfulness ratings in this issue.
The ratings are binary and were crowdsourced. The rated documents come from CNN/DailyMail and XSum, and the summaries are either reference summaries or generated by recent models (GPT-3, etc.). I think this could be integrated into AggreFact to build an even larger and stronger benchmark.
I would be interested in hearing opinions on whether this dataset is a good fit for AggreFact and what to consider when integrating it. A rough conversion sketch is below.
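As a starting point for discussion, here is a minimal sketch of how the HELM ratings could be mapped onto an AggreFact-style table. The column names on both sides (e.g. `document`, `faithful`, `model_name`, `origin`, `label`) and the input file name are assumptions for illustration, not the actual schemas, and would need to be adjusted to the real files.

```python
# Hypothetical conversion sketch: HELM-style binary faithfulness ratings
# -> AggreFact-style rows. All column names are assumed, not confirmed.
import pandas as pd


def to_aggrefact_format(helm_path: str) -> pd.DataFrame:
    """Map assumed HELM columns (document, summary, model, dataset, faithful)
    onto assumed AggreFact-style columns (doc, summary, model_name, origin, label)."""
    helm = pd.read_csv(helm_path)

    return pd.DataFrame({
        "doc": helm["document"],
        "summary": helm["summary"],
        "model_name": helm["model"],  # e.g. "gpt-3" or "reference"
        # Tag each row with its source corpus, using assumed origin names.
        "origin": helm["dataset"].map({"cnn_dailymail": "cnndm", "xsum": "xsum"}),
        # Binary crowd rating -> 1 (faithful) / 0 (unfaithful).
        "label": helm["faithful"].astype(int),
    })


if __name__ == "__main__":
    converted = to_aggrefact_format("helm_faithfulness_ratings.csv")  # assumed path
    print(converted.head())
```

The main open questions would then be how to handle reference summaries (which AggreFact does not treat as a separate model class, as far as I know) and how to aggregate multiple crowd ratings per doc-summary pair into a single binary label.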