🪜 Stepwise supervision dataset type #2148
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@qgallouedec I think this PR is mixing two concepts together:
For stepwise supervision, I think the dataset structure can be quite simple:
WDYT about splitting this PR so that we target stepwise supervision first, and only include multistep preferences if we end up extending the DPO trainer in that direction? I'm happy to open a separate PR for stepwise supervision if you prefer.
The main intention was to implement stepwise process supervision. The doc probably isn't clear enough. Feel free to suggest changes to, or directly edit, anything that made you think it was about multi-step preferences.
Thanks for working on a standard format for stepwise supervision @qgallouedec! My main question is whether this should be grouped with preference learning, and whether we should bother with the conversational format for now, since I've never seen it used in practice (everything is single-turn so far).
docs/source/dataset_formats.mdx
"completions": [" scatters more in the atmosphere,", " so the sky is green."], | ||
"labels": [True, False] | ||
} | ||
# Conversational format |
As mentioned above, I don't think the conversational format applies to process supervision.
removed in fe98b5b
docs/source/dataset_formats.mdx
| Preference with implicit prompt | [🔗](#from-preference-with-implicit-prompt-to-language-modeling-dataset) | [🔗](#from-preference-with-implicit-prompt-to-prompt-completion-dataset) | [🔗](#from-preference-with-implicit-prompt-to-prompt-only-dataset) | N/A | [🔗](#from-implicit-to-explicit-prompt-preference-dataset) | [🔗](#from-preference-with-implicit-prompt-to-unpaired-preference-dataset) |
| Preference | [🔗](#from-preference-to-language-modeling-dataset) | [🔗](#from-preference-to-prompt-completion-dataset) | [🔗](#from-preference-to-prompt-only-dataset) | [🔗](#from-explicit-to-implicit-prompt-preference-dataset) | N/A | [🔗](#from-preference-to-unpaired-preference-dataset) |
| Unpaired preference | [🔗](#from-unpaired-preference-to-language-modeling-dataset) | [🔗](#from-unpaired-preference-to-prompt-completion-dataset) | [🔗](#from-unpaired-preference-to-prompt-only-dataset) | N/A | N/A | N/A |
| From \ To | Language modeling | Prompt-completion | Prompt-only | Preference with implicit prompt | Preference | Unpaired preference | Stepwise unpaired preference | |
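None of the conversion rows shown above cover stepwise data. As a minimal sketch, and purely an assumption rather than a conversion the docs define, a stepwise row could be mapped to the unpaired preference type (`"prompt"`, `"completion"`, `"label"`) by joining the steps into one completion and marking it as good only if every step is labeled `True`. The function name `stepwise_to_unpaired_preference` and the prompt text are hypothetical.

```python
# Hypothetical conversion: stepwise supervision -> unpaired preference.
# Assumption: a full solution counts as "good" only if every step is correct.
def stepwise_to_unpaired_preference(example: dict) -> dict:
    return {
        "prompt": example["prompt"],
        # Steps in the review snippet carry their own leading spaces,
        # so plain concatenation rebuilds the full completion text.
        "completion": "".join(example["completions"]),
        # all() is False as soon as any single step is labeled False.
        "label": all(example["labels"]),
    }


row = stepwise_to_unpaired_preference(
    {
        "prompt": "Blue light",  # hypothetical placeholder prompt
        "completions": [" scatters more in the atmosphere,", " so the sky is green."],
        "labels": [True, False],
    }
)
print(row)
# {'prompt': 'Blue light', 'completion': ' scatters more in the atmosphere, so the sky is green.', 'label': False}
```

With 🤗 Datasets, a function like this could be applied row by row via `dataset.map(stepwise_to_unpaired_preference, remove_columns=["completions", "labels"])`. The all-steps-correct rule is deliberately strict; it discards the per-step signal, which is exactly what a dedicated stepwise row would preserve.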
If you agree with my comment about not treating this as preference learning, then we might want to move this to its own row for stepwise supervision.
Done in fe98b5b
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Thanks for iterating @qgallouedec, this looks great now! The PRM dataset is a very useful starting point for @gaetanlop's PRM PR.
This PR aims to create a standard format for stepwise data.
Related: #2110
Chosen name: Stepwise supervision
Columns: "prompt", "completions", "labels"
Structure: TODO
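A minimal sketch of a single row with the three columns listed above, plus a hypothetical sanity check (`check_stepwise_example`, not part of TRL) that each step carries exactly one boolean label. The completions and labels are borrowed from the `docs/source/dataset_formats.mdx` snippet under review; the prompt is a made-up placeholder. The one-label-per-step rule matches that snippet but is an assumption here, since the structure is not yet spelled out in this description.

```python
# Hypothetical helper: checks that a row follows the stepwise supervision
# layout discussed in this PR ("prompt", "completions", "labels").
def check_stepwise_example(example: dict) -> None:
    expected = {"prompt", "completions", "labels"}
    missing = expected - set(example)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    completions, labels = example["completions"], example["labels"]
    # Assumption: each reasoning step comes with exactly one label.
    if len(completions) != len(labels):
        raise ValueError(f"{len(completions)} completions but {len(labels)} labels")
    # Labels are booleans: True for a correct step, False for an incorrect one.
    if not all(isinstance(label, bool) for label in labels):
        raise TypeError("labels must be booleans")


# The prompt is a hypothetical placeholder; completions and labels come from
# the example under review.
stepwise_example = {
    "prompt": "Blue light",
    "completions": [" scatters more in the atmosphere,", " so the sky is green."],
    "labels": [True, False],
}
check_stepwise_example(stepwise_example)  # passes silently
```

Keeping steps and labels as parallel lists makes the per-step signal explicit, so a process reward model can score each step instead of only the final answer.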