# Dataset v2 discussion & feedback #88
Hey! Post any questions or complaints on the dataset. We'll log our internal goals and limitations here too.
---

Idea: now that we have a bunch of RMs, we can see if there are any datapoints that all the models think are wrong, and double-check our labels for future releases.
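A minimal sketch of how that check might look, assuming per-model scores for each chosen/rejected pair have already been collected into a table (all column names and values below are illustrative, not part of the actual pipeline):

```python
import pandas as pd

# Hypothetical per-model scores for each (chosen, rejected) pair;
# column names and values are illustrative only.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "rm_a_chosen": [0.9, 0.2, 0.7], "rm_a_rejected": [0.1, 0.8, 0.6],
    "rm_b_chosen": [0.8, 0.3, 0.9], "rm_b_rejected": [0.2, 0.9, 0.4],
})
models = ["rm_a", "rm_b"]

# A datapoint is suspect if *every* model scores the rejected completion
# higher than the chosen one, i.e. all RMs disagree with the label.
all_wrong = pd.concat(
    [df[f"{m}_rejected"] > df[f"{m}_chosen"] for m in models], axis=1
).all(axis=1)
print(df.loc[all_wrong, "id"].tolist())  # -> [2]
```

Datapoints flagged this way aren't necessarily mislabeled, but they're a cheap shortlist for manual review before a future release.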
---

@sanderland:

Datapoints don't have some kind of ID, do they? That would make it easier to refer to them.

**Bad/bad pairs**

I think the main difficulty with the data is pairs that are differently bad, rather than good/bad.

**Adversarial**

Often the rejected response is subtly bad (factually, or in instruction following), while the chosen one is blatantly bad.

(I cannot think of a scenario where a user would be happier with the first answer than the second here.)

(Would you really prefer that a conversational assistant app give the chosen over the rejected answer here?)

(Yes, I know the rejected has a non-prime in it, but the chosen example is also very bad.)

**Refusals**

The refusal pairs are often not clear-cut, and the refusal itself is frequently very poorly phrased.

**Truncated chosen responses**

How bad is a truncated response? If your model ranks prefixes, not very. If it ranks complete responses, quite a lot. There seem to be a significant number of truncated chosen responses (see the sketch after this comment).

**Arguable or incorrect safety pairs**

The donotanswer set appears particularly bad in assuming certain topics can't be talked about at all and should simply be refused with a single message. More broadly, the safety samples are often very arguable (likewise).
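One rough way to surface truncated chosen responses is to flag completions that don't end in terminal punctuation. This is only a heuristic sketch, not a rule from the dataset itself, and the sample data is made up:

```python
def looks_truncated(text: str) -> bool:
    """Heuristic: a completion that ends mid-sentence is likely truncated.

    This produces false positives (e.g. code or list answers), so flagged
    examples still need a manual look.
    """
    text = text.rstrip()
    return bool(text) and text[-1] not in ".!?\"')]}`"

# Toy examples shaped like {"chosen": ..., "rejected": ...}
examples = [
    {"chosen": "The capital of France is Paris.", "rejected": "It is Lyon."},
    {"chosen": "First, preheat the oven to 180C and then", "rejected": "No."},
]
flagged = [ex for ex in examples if looks_truncated(ex["chosen"])]
print(f"{len(flagged)} possibly truncated chosen responses")  # -> 1
```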
---

@sanderland they do have IDs; they're just dropped when inference is run. They are integers increasing from 1, corresponding to the pre-filtering split (so the final split has some skipped numbers): https://huggingface.co/datasets/allenai/reward-bench. But I can handle these.
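For reference, a small sketch of recovering those IDs and the numbers skipped by filtering. It assumes the published split exposes an `id` column and uses `filtered` as the split name; check the dataset card if either differs:

```python
from datasets import load_dataset

# Split and column names are assumptions based on the comment above;
# verify against https://huggingface.co/datasets/allenai/reward-bench
ds = load_dataset("allenai/reward-bench", split="filtered")

ids = sorted(ds["id"])
skipped = sorted(set(range(1, max(ids) + 1)) - set(ids))
print(f"{len(ids)} examples, {len(skipped)} IDs skipped by filtering")
```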
---

@sanderland I made a preview dataset for new versions that removes most of the errors you mention. We'll be exploring this further in the near future! https://huggingface.co/datasets/allenai/reward-bench-cleaned-preview