🪜 Stepwise supervision dataset type #2148

qgallouedec · 2024-10-01T13:00:51Z

This PR aims to create a standard format for stepwise supervision data.

Related: #2110

Chosen name: Stepwise supervision
Columns: "prompt", "completions", "labels"
Structure:

standard_example = {
    "prompt": "Beautiful is better than",
    "completions": [", let me think...", " ugly."],
    "labels": [False, True],
}
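A minimal sketch of how the structure above could be sanity-checked, assuming the labels align one-to-one with the completion steps. The helper name `validate_stepwise_example` is hypothetical and is not part of TRL.

```python
from typing import Any


def validate_stepwise_example(example: dict[str, Any]) -> None:
    """Raise ValueError if `example` does not follow the stepwise layout."""
    missing = {"prompt", "completions", "labels"} - set(example)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    if not isinstance(example["prompt"], str):
        raise ValueError("'prompt' must be a string")
    completions, labels = example["completions"], example["labels"]
    if not all(isinstance(step, str) for step in completions):
        raise ValueError("every completion step must be a string")
    # Assumption: exactly one label per completion step.
    if len(completions) != len(labels):
        raise ValueError(
            f"{len(completions)} completion steps but {len(labels)} labels"
        )


standard_example = {
    "prompt": "Beautiful is better than",
    "completions": [", let me think...", " ugly."],
    "labels": [False, True],
}
validate_stepwise_example(standard_example)  # returns None when the layout is valid
```

A check like this could run over a dataset before training to catch rows where a step lost its label.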

TODO

Update dataset format doc
Update internal testing dataset (zen)
Create a script that builds the dataset for our getting started script (from PRM800K?)

HuggingFaceDocBuilderDev · 2024-10-01T13:04:49Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

trl/data_utils.py

lewtun · 2024-11-15T10:43:00Z

@qgallouedec I think this PR is mixing two concepts together:

Stepwise / process supervision: here one has a single prompt and N steps, where each step is assigned a score (not a preference label).
Multi-step preferences: here one has a prompt with a tree of (typically) paired preferences, similar to the MCTS DPO paper.

For stepwise supervision, I think the dataset structure can be quite simple:

{
    "prompt": "What number is larger, 9.8 or 9.11?",
    "completions": ["Step 1 ...", "Step 2 ...", ...],
    "labels": [0.6, 0.9, ...], # Can also be bools
}
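Since the labels above may be scores or bools, one possible way to move from the first to the second is a simple threshold. This is only a sketch: the helper `binarize_labels` and the threshold values are assumptions for illustration, not something the PR defines.

```python
def binarize_labels(example: dict, threshold: float = 0.5) -> dict:
    """Return a copy of `example` whose per-step labels are booleans."""
    # A step counts as correct when its score reaches the threshold.
    # Existing bools pass through unchanged for thresholds in (0, 1],
    # because True == 1 and False == 0 in Python.
    labels = [bool(score >= threshold) for score in example["labels"]]
    return {**example, "labels": labels}


scored_example = {
    "prompt": "What number is larger, 9.8 or 9.11?",
    "completions": ["Step 1 ...", "Step 2 ..."],
    "labels": [0.6, 0.9],
}

print(binarize_labels(scored_example)["labels"])       # [True, True]
print(binarize_labels(scored_example, 0.7)["labels"])  # [False, True]
```

Keeping the raw scores in the dataset and binarizing at load time would leave the choice of threshold to the consumer.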

WDYT about splitting this PR so that we target stepwise supervision first and include multistep preferences only if we end up extending the DPO trainer in that direction?

I'm happy to open a separate PR for stepwise supervision if you prefer.

qgallouedec · 2024-11-15T11:52:03Z

The main intention was to implement stepwise process supervision. The doc probably isn't clear enough. Feel free to recommend or directly change anything that made you think it was about multi-step preferences.
Related: #2148 (comment)

lewtun

Thanks for working on a standard format for stepwise supervision, @qgallouedec! My main comment is whether this should be grouped with preference learning at all, and whether we should bother with the conversational format for now, since I've never seen it used in practice (everything is single-turn thus far).

docs/source/dataset_formats.mdx

lewtun · 2024-11-15T12:58:23Z

docs/source/dataset_formats.mdx

+    "completions": [" scatters more in the atmosphere,", " so the sky is green."],
+    "labels": [True, False]
+}
+# Conversational format


As mentioned above, I don't think this format applies to process supervision

removed in fe98b5b

lewtun · 2024-11-15T12:59:38Z

docs/source/dataset_formats.mdx

-| Preference with implicit prompt | [🔗](#from-preference-with-implicit-prompt-to-language-modeling-dataset) | [🔗](#from-preference-with-implicit-prompt-to-prompt-completion-dataset) | [🔗](#from-preference-with-implicit-prompt-to-prompt-only-dataset) | N/A                                                       | [🔗](#from-implicit-to-explicit-prompt-preference-dataset) | [🔗](#from-preference-with-implicit-prompt-to-unpaired-preference-dataset) |
-| Preference                      | [🔗](#from-preference-to-language-modeling-dataset)                      | [🔗](#from-preference-to-prompt-completion-dataset)                      | [🔗](#from-preference-to-prompt-only-dataset)                      | [🔗](#from-explicit-to-implicit-prompt-preference-dataset) | N/A                                                       | [🔗](#from-preference-to-unpaired-preference-dataset)                      |
-| Unpaired preference             | [🔗](#from-unpaired-preference-to-language-modeling-dataset)             | [🔗](#from-unpaired-preference-to-prompt-completion-dataset)             | [🔗](#from-unpaired-preference-to-prompt-only-dataset)             | N/A                                                       | N/A                                                       | N/A                                                                       |
+| From \ To                       | Language modeling                                                       | Prompt-completion                                                       | Prompt-only                                                       | Preference with implicit prompt                           | Preference                                                | Unpaired preference                                                       | Stepwise unpaired preference |


If you agree with my comment about not treating this as preference learning, then we might want to move this to its own row for stepwise supervision.

Done in fe98b5b
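As an illustration of what a row in a conversion table like this could mean for stepwise supervision, here is a hedged sketch that collapses a stepwise example into a single prompt, completion, and label triple. The function name and the rule that the whole completion is acceptable only if every step is are assumptions made for this example, not the conversion documented in the PR.

```python
def stepwise_to_unpaired(example: dict) -> dict:
    """Collapse per-step data into one completion with one label."""
    return {
        "prompt": example["prompt"],
        # Steps are concatenated as-is; each step carries its own spacing.
        "completion": "".join(example["completions"]),
        # Assumption: one incorrect step makes the whole completion incorrect.
        "label": all(example["labels"]),
    }


stepwise_example = {
    "prompt": "Beautiful is better than",
    "completions": [", let me think...", " ugly."],
    "labels": [False, True],
}

print(stepwise_to_unpaired(stepwise_example))
# {'prompt': 'Beautiful is better than', 'completion': ', let me think... ugly.', 'label': False}
```

The reverse direction is not generally possible, since a single label does not say which step went wrong.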

trl/data_utils.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

lewtun

Thanks for iterating @qgallouedec - this looks great now and the PRM dataset is a very useful starting point for the PRM PR @gaetanlop

standard step

07a9451

qgallouedec mentioned this pull request Oct 1, 2024

🐾 Process-supervised RM Trainer #2127

Merged

5 tasks

qgallouedec and others added 3 commits October 1, 2024 16:32

add merge same role

4da3368

Merge branch 'main' into step-dataset

3afde17

Merge branch 'main' into step-dataset

2ae2de3

qgallouedec marked this pull request as draft October 4, 2024 08:01

gaetanlop reviewed Oct 13, 2024

trl/data_utils.py (outdated, resolved)

gaetanlop reviewed Oct 13, 2024

trl/data_utils.py (outdated, resolved)

qgallouedec mentioned this pull request Oct 20, 2024

Correct masking when the same roles are present in adjacent messages in DataCollatorForCompletionOnlyLM #1994

Open

qgallouedec and others added 3 commits October 31, 2024 16:15

Merge branch 'main' into step-dataset

a8fe926

Add examples of conversational dataset formats

4d6f890

doc

61ef40b

Merge branch 'main' into step-dataset

478bbcd

lewtun reviewed Nov 15, 2024

qgallouedec and others added 11 commits November 15, 2024 14:50

Apply suggestions from code review

db9c3ae

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

Apply suggestions from code review

8cbb11b

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

Update docs/source/dataset_formats.mdx

b0ad7c3

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

Add prm800k dataset processing script

91c309f

Remove commented out validation and test URLs

8006998

Refactor conversational support for stepwise supervision

b1d9f2d

Rename to Stepwise supervision

fe98b5b

Fix typo in completions list

85b42a0

Merge branch 'main' into step-dataset

b86eb3b

Refactor dataset processing functions

dc1d042

Merge branch 'step-dataset' of https://github.com/huggingface/trl int…

677d355

…o step-dataset

qgallouedec changed the title ~~[Open discusion] Multistep dataset~~ 🪜 Stepwise supervision dataset type Nov 15, 2024

qgallouedec marked this pull request as ready for review November 15, 2024 17:26

Style

348d0a0

lewtun approved these changes Nov 18, 2024

qgallouedec merged commit 76dbb1a into main Nov 18, 2024
14 checks passed

qgallouedec deleted the step-dataset branch November 18, 2024 08:58
