From fe98b5b71328e20731bcb922566a4b724137aa17 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Quentin=20Gallou=C3=A9dec?=
Date: Fri, 15 Nov 2024 17:14:41 +0000
Subject: [PATCH] Rename to Stepwise supervision

---
 docs/source/dataset_formats.mdx | 61 +++++++++++++--------------------
 1 file changed, 24 insertions(+), 37 deletions(-)

diff --git a/docs/source/dataset_formats.mdx b/docs/source/dataset_formats.mdx
index d52f5768f8..3b4ab5ac5c 100644
--- a/docs/source/dataset_formats.mdx
+++ b/docs/source/dataset_formats.mdx
@@ -78,18 +78,13 @@ This guide provides an overview of the dataset formats and types supported by ea
- Stepwise / process supervision
+ Stepwise supervision
{"prompt": "Which number is larger, 9.8 or 9.11?",
- "completions": ["Let's compare their decimal values.\n", "The fractional part of 9.8 is 0.8, while the fractional part of 9.11 is 0.11.\n", "Since 0.11 is greater than 0.8, the number 9.11 is larger than 9.8."],
- "labels": [True, True, False]}
- - -
{"prompt": [{"role": "user", "content": "What color is the sky?"}],
- "completions": [{"role": "assistant", "content": "Blue light scatters more in the atmosphere."},
-                {"role": "assistant", "content": "So it is blue."}],
+ "completions": ["The fractional part of 9.8 is 0.8, while the fractional part of 9.11 is 0.11.", "Since 0.11 is greater than 0.8, the number 9.11 is larger than 9.8."],
  "labels": [True, False]}
+
@@ -240,22 +235,14 @@ unpaired_preference_example = {"prompt": [{"role": "user", "content": "What colo
 "label": True}
 ```
 
-#### Stepwise / process supervision
+#### Stepwise supervision
 
-A stepwise or process supervision dataset is similar to an [unpaired preference](#unpaired-preference) dataset but includes multiple steps of completions, each with its own label. This structure is useful for tasks that need detailed, step-by-step labeling, such as reasoning tasks. By evaluating each step separately and providing targeted labels, this approach helps identify precisely where the reasoning is correct and where errors occur, allowing for targeted feedback on each part of the reasoning process.
+A stepwise (or process) supervision dataset is similar to an [unpaired preference](#unpaired-preference) dataset but includes multiple steps of completions, each with its own label. This structure is useful for tasks that need detailed, step-by-step labeling, such as reasoning tasks. By evaluating each step separately and providing targeted labels, this approach helps identify precisely where the reasoning is correct and where errors occur, allowing for targeted feedback on each part of the reasoning process.
 
 ```python
-# Standard format
 stepwise_example = {
-    "prompt": "Blue light",
-    "completions": [" scatters more in the atmosphere,", " so the sky is green."],
-    "labels": [True, False]
-}
-# Conversational format
-stepwise_unpaired_preference_example = {
-    "prompt": [{"role": "user", "content": "What color is the sky?"}],
-    "completions": [{"role": "assistant", "content": "Blue light scatters more in the atmosphere."},
-                    {"role": "assistant", "content": "So it is blue."}],
+    "prompt": "Which number is larger, 9.8 or 9.11?",
+    "completions": ["The fractional part of 9.8 is 0.8, while the fractional part of 9.11 is 0.11.", "Since 0.11 is greater than 0.8, the number 9.11 is larger than 9.8."],
     "labels": [True, False]
 }
 ```
@@ -403,15 +390,15 @@ This section provides example code to help you convert between different dataset
 For simplicity, some of the examples below do not follow this recommendation and use the standard format. However, the conversions can be applied directly to the conversational format without modification.
 
-| From \ To | Language modeling | Prompt-completion | Prompt-only | Preference with implicit prompt | Preference | Unpaired preference | Stepwise unpaired preference |
-| ------------------------------- | ----------------------------------------------------------------------- | ----------------------------------------------------------------------- | ----------------------------------------------------------------- | --------------------------------------------------------- | --------------------------------------------------------- | ------------------------------------------------------------------------- | ---------------------------- |
-| Language modeling | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
-| Prompt-completion | [🔗](#from-prompt-completion-to-language-modeling-dataset) | N/A | [🔗](#from-prompt-completion-to-prompt-only-dataset) | N/A | N/A | N/A | N/A |
-| Prompt-only | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
-| Preference with implicit prompt | [🔗](#from-preference-with-implicit-prompt-to-language-modeling-dataset) | [🔗](#from-preference-with-implicit-prompt-to-prompt-completion-dataset) | [🔗](#from-preference-with-implicit-prompt-to-prompt-only-dataset) | N/A | [🔗](#from-implicit-to-explicit-prompt-preference-dataset) | [🔗](#from-preference-with-implicit-prompt-to-unpaired-preference-dataset) | N/A |
-| Preference | [🔗](#from-preference-to-language-modeling-dataset) | [🔗](#from-preference-to-prompt-completion-dataset) | [🔗](#from-preference-to-prompt-only-dataset) | [🔗](#from-explicit-to-implicit-prompt-preference-dataset) | N/A | [🔗](#from-preference-to-unpaired-preference-dataset) | N/A |
-| Unpaired preference | [🔗](#from-unpaired-preference-to-language-modeling-dataset) | [🔗](#from-unpaired-preference-to-prompt-completion-dataset) | [🔗](#from-unpaired-preference-to-prompt-only-dataset) | N/A | N/A | N/A | N/A |
-| Stepwise unpaired preference | [🔗](#from-stepwise-unpaired-preference-to-language-modeling-dataset) | [🔗](#from-stepwise-unpaired-preference-to-prompt-completion-dataset) | [🔗](#from-stepwise-unpaired-preference-to-prompt-only-dataset) | N/A | N/A | [🔗](#from-stepwise-unpaired-preference-to-unpaired-preference-dataset) | N/A |
+| From \ To | Language modeling | Prompt-completion | Prompt-only | Preference with implicit prompt | Preference | Unpaired preference | Stepwise supervision |
+| ------------------------------- | ----------------------------------------------------------------------- | ----------------------------------------------------------------------- | ----------------------------------------------------------------- | --------------------------------------------------------- | --------------------------------------------------------- | ------------------------------------------------------------------------- | -------------------- |
+| Language modeling | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
+| Prompt-completion | [🔗](#from-prompt-completion-to-language-modeling-dataset) | N/A | [🔗](#from-prompt-completion-to-prompt-only-dataset) | N/A | N/A | N/A | N/A |
+| Prompt-only | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
+| Preference with implicit prompt | [🔗](#from-preference-with-implicit-prompt-to-language-modeling-dataset) | [🔗](#from-preference-with-implicit-prompt-to-prompt-completion-dataset) | [🔗](#from-preference-with-implicit-prompt-to-prompt-only-dataset) | N/A | [🔗](#from-implicit-to-explicit-prompt-preference-dataset) | [🔗](#from-preference-with-implicit-prompt-to-unpaired-preference-dataset) | N/A |
+| Preference | [🔗](#from-preference-to-language-modeling-dataset) | [🔗](#from-preference-to-prompt-completion-dataset) | [🔗](#from-preference-to-prompt-only-dataset) | [🔗](#from-explicit-to-implicit-prompt-preference-dataset) | N/A | [🔗](#from-preference-to-unpaired-preference-dataset) | N/A |
+| Unpaired preference | [🔗](#from-unpaired-preference-to-language-modeling-dataset) | [🔗](#from-unpaired-preference-to-prompt-completion-dataset) | [🔗](#from-unpaired-preference-to-prompt-only-dataset) | N/A | N/A | N/A | N/A |
+| Stepwise supervision | [🔗](#from-stepwise-supervision-to-language-modeling-dataset) | [🔗](#from-stepwise-supervision-to-prompt-completion-dataset) | [🔗](#from-stepwise-supervision-to-prompt-only-dataset) | N/A | N/A | [🔗](#from-stepwise-supervision-to-unpaired-preference-dataset) | N/A |
 
 ### From prompt-completion to language modeling dataset
 
@@ -786,9 +773,9 @@ dataset = dataset.remove_columns(["completion", "label"])
 {'prompt': 'The sky is'}
 ```
 
-### From stepwise unpaired preference to language modeling dataset
+### From stepwise supervision to language modeling dataset
 
-To convert a stepwise unpaired preference dataset into a language modeling dataset, concatenate the prompt and the completions into the `"text"` column.
+To convert a stepwise supervision dataset into a language modeling dataset, concatenate the prompt and the completions into the `"text"` column.
 
 ```python
 from datasets import Dataset
@@ -812,9 +799,9 @@ dataset = dataset.map(concatenate_prompt_completions, remove_columns=["prompt",
 {'text': 'Blue light scatters more in the atmosphere, so the sky is green.'}
 ```
 
-### From stepwise unpaired preference to prompt completion dataset
+### From stepwise supervision to prompt completion dataset
 
-To convert a stepwise unpaired preference dataset into a prompt-completion dataset, join the completions and remove the labels.
+To convert a stepwise supervision dataset into a prompt-completion dataset, join the completions and remove the labels.
 
 ```python
 from datasets import Dataset
@@ -838,9 +825,9 @@ dataset = dataset.map(join_completions, remove_columns=["completions", "labels"]
 {'prompt': 'Blue light', 'completion': ' scatters more in the atmosphere, so the sky is green.'}
 ```
 
-### From stepwise unpaired preference to prompt only dataset
+### From stepwise supervision to prompt only dataset
 
-To convert a stepwise unpaired preference dataset into a prompt-only dataset, remove the completions and the labels.
+To convert a stepwise supervision dataset into a prompt-only dataset, remove the completions and the labels.
 
 ```python
 from datasets import Dataset
@@ -860,9 +847,9 @@ dataset = dataset.remove_columns(["completions", "labels"])
 {'prompt': 'Blue light'}
 ```
 
-### From stepwise unpaired preference to unpaired preference dataset
+### From stepwise supervision to unpaired preference dataset
 
-To convert a stepwise unpaired preference dataset into an unpaired preference dataset, join the completions and merge the labels.
+To convert a stepwise supervision dataset into an unpaired preference dataset, join the completions and merge the labels.
 
 The method for merging the labels depends on the specific task. In this example, we use the logical AND operation. This means that if the step labels indicate the correctness of individual steps, the resulting label will reflect the correctness of the entire sequence.
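
The AND-based merge described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code from the patched file: `merge_steps` is a hypothetical helper name, and the sample record reuses the "Blue light" example that appears elsewhere in this patch. Other merge rules (for example, majority vote) may suit other tasks better.

```python
def merge_steps(example):
    """Collapse one stepwise supervision record into an unpaired preference record.

    Hypothetical helper for illustration: joins the step completions into a
    single string and merges the step labels with a logical AND.
    """
    return {
        "prompt": example["prompt"],
        # Steps are concatenated as-is; any separators must already be in the steps.
        "completion": "".join(example["completions"]),
        # The sequence counts as correct only if every individual step is correct.
        "label": all(example["labels"]),
    }


stepwise_example = {
    "prompt": "Blue light",
    "completions": [" scatters more in the atmosphere,", " so the sky is green."],
    "labels": [True, False],
}

unpaired_example = merge_steps(stepwise_example)
print(unpaired_example)
```

Because the second step is labeled `False`, the merged label is `False`. A function of this shape could likely be passed to `Dataset.map` with `remove_columns=["completions", "labels"]`, in the same way as the other conversions in this patch.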