The team behind the HELM paper just shared a dataset of document-summary faithfulness ratings in this issue.
The ratings are binary and were crowdsourced. The rated documents come from CNN/DailyMail and XSum, and the summaries are either reference summaries or generated by recent models (GPT-3, etc.). I think this could be integrated into AggreFact to build an even larger and stronger benchmark.
I would be interested in hearing opinions on whether this dataset is a good fit for AggreFact and what to consider when integrating it. A rough conversion sketch is below.
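As a starting point for discussion, here is a minimal sketch of how the HELM ratings could be mapped onto an AggreFact-style table. The column names on both sides (e.g. `document`, `faithful`, `model_name`, `origin`, `label`) and the input file name are assumptions for illustration, not the actual schemas, and would need to be adjusted to the real files.

```python
# Hypothetical conversion sketch: HELM-style binary faithfulness ratings
# -> AggreFact-style rows. All column names are assumed, not confirmed.
import pandas as pd


def to_aggrefact_format(helm_path: str) -> pd.DataFrame:
    """Map assumed HELM columns (document, summary, model, dataset, faithful)
    onto assumed AggreFact-style columns (doc, summary, model_name, origin, label)."""
    helm = pd.read_csv(helm_path)

    return pd.DataFrame({
        "doc": helm["document"],
        "summary": helm["summary"],
        "model_name": helm["model"],  # e.g. "gpt-3" or "reference"
        # Tag each row with its source corpus, using assumed origin names.
        "origin": helm["dataset"].map({"cnn_dailymail": "cnndm", "xsum": "xsum"}),
        # Binary crowd rating -> 1 (faithful) / 0 (unfaithful).
        "label": helm["faithful"].astype(int),
    })


if __name__ == "__main__":
    converted = to_aggrefact_format("helm_faithfulness_ratings.csv")  # assumed path
    print(converted.head())
```

The main open questions would then be how to handle reference summaries (which AggreFact does not treat as a separate model class, as far as I know) and how to aggregate multiple crowd ratings per doc-summary pair into a single binary label.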