Rename to Stepwise supervision

huggingface · Nov 15, 2024 · fe98b5b
1 parent b1d9f2d
commit fe98b5b
Showing 1 changed file with 24 additions and 37 deletions.
diff --git a/docs/source/dataset_formats.mdx b/docs/source/dataset_formats.mdx
@@ -78,18 +78,13 @@ This guide provides an overview of the dataset formats and types supported by ea
     </td>
   </tr>
   <tr>
-    <td>Stepwise / process supervision</td>
+    <td>Stepwise supervision</td>
     <td>
       <pre><code>{"prompt": "Which number is larger, 9.8 or 9.11?",
- "completions": ["Let's compare their decimal values.\n", "The fractional part of 9.8 is 0.8, while the fractional part of 9.11 is 0.11.\n", "Since 0.11 is greater than 0.8, the number 9.11 is larger than 9.8."],
- "labels": [True, True, False]}</code></pre>
-    </td>
-    <td>
-      <pre><code>{"prompt": [{"role": "user", "content": "What color is the sky?"}],
- "completions": [{"role": "assistant", "content": "Blue light scatters more in the atmosphere."},
-                {"role": "assistant", "content": "So it is blue."}],
+ "completions": ["The fractional part of 9.8 is 0.8, while the fractional part of 9.11 is 0.11.", "Since 0.11 is greater than 0.8, the number 9.11 is larger than 9.8."],
  "labels": [True, False]}</code></pre>
     </td>
+    <td></td>
   </tr>
 </table>
 
@@ -240,22 +235,14 @@ unpaired_preference_example = {"prompt": [{"role": "user", "content": "What colo
                                "label": True}
 ```
 
-#### Stepwise / process supervision
+#### Stepwise supervision
 
-A stepwise or process supervision dataset is similar to an [unpaired preference](#unpaired-preference) dataset but includes multiple steps of completions, each with its own label. This structure is useful for tasks that need detailed, step-by-step labeling, such as reasoning tasks. By evaluating each step separately and providing targeted labels, this approach helps identify precisely where the reasoning is correct and where errors occur, allowing for targeted feedback on each part of the reasoning process.
+A stepwise (or process) supervision dataset is similar to an [unpaired preference](#unpaired-preference) dataset, but the completion is split into multiple steps, each with its own label. This structure is useful for tasks that need fine-grained, step-by-step labeling, such as reasoning tasks. Labeling each step separately pinpoints where the reasoning is correct and where it goes wrong, which enables targeted feedback on each part of the reasoning process.
 
 ```python
-# Standard format
 stepwise_example = {
-    "prompt": "Blue light",
-    "completions": [" scatters more in the atmosphere,", " so the sky is green."],
-    "labels": [True, False]
-}
-# Conversational format
-stepwise_unpaired_preference_example = {
-    "prompt": [{"role": "user", "content": "What color is the sky?"}],
-    "completions": [{"role": "assistant", "content": "Blue light scatters more in the atmosphere."},
-                   {"role": "assistant", "content": "So it is blue."}],
+    "prompt": "Which number is larger, 9.8 or 9.11?",
+    "completions": ["The fractional part of 9.8 is 0.8, while the fractional part of 9.11 is 0.11.", "Since 0.11 is greater than 0.8, the number 9.11 is larger than 9.8."],
     "labels": [True, False]
 }
 ```
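Because each step carries its own label, a stepwise example can be inspected to find the first step where the reasoning goes wrong. The following is a minimal sketch, not part of the commit; the helper name `first_incorrect_step` is hypothetical, and it assumes `completions` and `labels` are parallel lists of equal length.

```python
def first_incorrect_step(example):
    """Return the index of the first step labeled False, or None if every step is correct.

    Hypothetical helper for illustration; assumes one boolean label per completion step.
    """
    completions, labels = example["completions"], example["labels"]
    if len(completions) != len(labels):
        raise ValueError("each completion step needs exactly one label")
    # Scan the labels in order and stop at the first incorrect step.
    return next((i for i, label in enumerate(labels) if not label), None)


stepwise_example = {
    "prompt": "Which number is larger, 9.8 or 9.11?",
    "completions": [
        "The fractional part of 9.8 is 0.8, while the fractional part of 9.11 is 0.11.",
        "Since 0.11 is greater than 0.8, the number 9.11 is larger than 9.8.",
    ],
    "labels": [True, False],
}

step_index = first_incorrect_step(stepwise_example)
print(step_index)  # 1: the second step contains the error
```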
@@ -403,15 +390,15 @@ This section provides example code to help you convert between different dataset
 
 For simplicity, some of the examples below do not follow this recommendation and use the standard format. However, the conversions can be applied directly to the conversational format without modification.
 
-| From \ To                       | Language modeling                                                       | Prompt-completion                                                       | Prompt-only                                                       | Preference with implicit prompt                           | Preference                                                | Unpaired preference                                                       | Stepwise unpaired preference |
-| ------------------------------- | ----------------------------------------------------------------------- | ----------------------------------------------------------------------- | ----------------------------------------------------------------- | --------------------------------------------------------- | --------------------------------------------------------- | ------------------------------------------------------------------------- | ---------------------------- |
-| Language modeling               | N/A                                                                     | N/A                                                                     | N/A                                                               | N/A                                                       | N/A                                                       | N/A                                                                       | N/A                          |
-| Prompt-completion               | [🔗](#from-prompt-completion-to-language-modeling-dataset)               | N/A                                                                     | [🔗](#from-prompt-completion-to-prompt-only-dataset)               | N/A                                                       | N/A                                                       | N/A                                                                       | N/A                          |
-| Prompt-only                     | N/A                                                                     | N/A                                                                     | N/A                                                               | N/A                                                       | N/A                                                       | N/A                                                                       | N/A                          |
-| Preference with implicit prompt | [🔗](#from-preference-with-implicit-prompt-to-language-modeling-dataset) | [🔗](#from-preference-with-implicit-prompt-to-prompt-completion-dataset) | [🔗](#from-preference-with-implicit-prompt-to-prompt-only-dataset) | N/A                                                       | [🔗](#from-implicit-to-explicit-prompt-preference-dataset) | [🔗](#from-preference-with-implicit-prompt-to-unpaired-preference-dataset) | N/A                          |
-| Preference                      | [🔗](#from-preference-to-language-modeling-dataset)                      | [🔗](#from-preference-to-prompt-completion-dataset)                      | [🔗](#from-preference-to-prompt-only-dataset)                      | [🔗](#from-explicit-to-implicit-prompt-preference-dataset) | N/A                                                       | [🔗](#from-preference-to-unpaired-preference-dataset)                      | N/A                          |
-| Unpaired preference             | [🔗](#from-unpaired-preference-to-language-modeling-dataset)             | [🔗](#from-unpaired-preference-to-prompt-completion-dataset)             | [🔗](#from-unpaired-preference-to-prompt-only-dataset)             | N/A                                                       | N/A                                                       | N/A                                                                       | N/A                          |
-| Stepwise unpaired preference    | [🔗](#from-stepwise-unpaired-preference-to-language-modeling-dataset)    | [🔗](#from-stepwise-unpaired-preference-to-prompt-completion-dataset)    | [🔗](#from-stepwise-unpaired-preference-to-prompt-only-dataset)    | N/A                                                       | N/A                                                       | [🔗](#from-stepwise-unpaired-preference-to-unpaired-preference-dataset)    | N/A                          |
+| From \ To                       | Language modeling                                                       | Prompt-completion                                                       | Prompt-only                                                       | Preference with implicit prompt                           | Preference                                                | Unpaired preference                                                       | Stepwise supervision |
+| ------------------------------- | ----------------------------------------------------------------------- | ----------------------------------------------------------------------- | ----------------------------------------------------------------- | --------------------------------------------------------- | --------------------------------------------------------- | ------------------------------------------------------------------------- | -------------------- |
+| Language modeling               | N/A                                                                     | N/A                                                                     | N/A                                                               | N/A                                                       | N/A                                                       | N/A                                                                       | N/A                  |
+| Prompt-completion               | [🔗](#from-prompt-completion-to-language-modeling-dataset)               | N/A                                                                     | [🔗](#from-prompt-completion-to-prompt-only-dataset)               | N/A                                                       | N/A                                                       | N/A                                                                       | N/A                  |
+| Prompt-only                     | N/A                                                                     | N/A                                                                     | N/A                                                               | N/A                                                       | N/A                                                       | N/A                                                                       | N/A                  |
+| Preference with implicit prompt | [🔗](#from-preference-with-implicit-prompt-to-language-modeling-dataset) | [🔗](#from-preference-with-implicit-prompt-to-prompt-completion-dataset) | [🔗](#from-preference-with-implicit-prompt-to-prompt-only-dataset) | N/A                                                       | [🔗](#from-implicit-to-explicit-prompt-preference-dataset) | [🔗](#from-preference-with-implicit-prompt-to-unpaired-preference-dataset) | N/A                  |
+| Preference                      | [🔗](#from-preference-to-language-modeling-dataset)                      | [🔗](#from-preference-to-prompt-completion-dataset)                      | [🔗](#from-preference-to-prompt-only-dataset)                      | [🔗](#from-explicit-to-implicit-prompt-preference-dataset) | N/A                                                       | [🔗](#from-preference-to-unpaired-preference-dataset)                      | N/A                  |
+| Unpaired preference             | [🔗](#from-unpaired-preference-to-language-modeling-dataset)             | [🔗](#from-unpaired-preference-to-prompt-completion-dataset)             | [🔗](#from-unpaired-preference-to-prompt-only-dataset)             | N/A                                                       | N/A                                                       | N/A                                                                       | N/A                  |
+| Stepwise supervision            | [🔗](#from-stepwise-supervision-to-language-modeling-dataset)            | [🔗](#from-stepwise-supervision-to-prompt-completion-dataset)            | [🔗](#from-stepwise-supervision-to-prompt-only-dataset)            | N/A                                                       | N/A                                                       | [🔗](#from-stepwise-supervision-to-unpaired-preference-dataset)            | N/A                  |
 
 ### From prompt-completion to language modeling dataset
 
@@ -786,9 +773,9 @@ dataset = dataset.remove_columns(["completion", "label"])
 {'prompt': 'The sky is'}
 ```
 
-### From stepwise unpaired preference to language modeling dataset
+### From stepwise supervision to language modeling dataset
 
-To convert a stepwise unpaired preference dataset into a language modeling dataset, concatenate the prompt and the completions into the `"text"` column.
+To convert a stepwise supervision dataset into a language modeling dataset, concatenate the prompt and the completions into the `"text"` column.
 
 ```python
 from datasets import Dataset
@@ -812,9 +799,9 @@ dataset = dataset.map(concatenate_prompt_completions, remove_columns=["prompt",
 {'text': 'Blue light scatters more in the atmosphere, so the sky is green.'}
 ```
 
-### From stepwise unpaired preference to prompt completion dataset
+### From stepwise supervision to prompt-completion dataset
 
-To convert a stepwise unpaired preference dataset into a prompt-completion dataset, join the completions and remove the labels.
+To convert a stepwise supervision dataset into a prompt-completion dataset, join the completions and remove the labels.
 
 ```python
 from datasets import Dataset
@@ -838,9 +825,9 @@ dataset = dataset.map(join_completions, remove_columns=["completions", "labels"]
 {'prompt': 'Blue light', 'completion': ' scatters more in the atmosphere, so the sky is green.'}
 ```
 
-### From stepwise unpaired preference to prompt only dataset
+### From stepwise supervision to prompt-only dataset
 
-To convert a stepwise unpaired preference dataset into a prompt-only dataset, remove the completions and the labels.
+To convert a stepwise supervision dataset into a prompt-only dataset, remove the completions and the labels.
 
 ```python
 from datasets import Dataset
@@ -860,9 +847,9 @@ dataset = dataset.remove_columns(["completions", "labels"])
 {'prompt': 'Blue light'}
 ```
 
-### From stepwise unpaired preference to unpaired preference dataset
+### From stepwise supervision to unpaired preference dataset
 
-To convert a stepwise unpaired preference dataset into an unpaired preference dataset, join the completions and merge the labels.
+To convert a stepwise supervision dataset into an unpaired preference dataset, join the completions and merge the labels.
 
 The method for merging the labels depends on the specific task. In this example, we use the logical AND operation. This means that if the step labels indicate the correctness of individual steps, the resulting label will reflect the correctness of the entire sequence.
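
The merge described above can be sketched in plain Python. This is an illustrative sketch rather than the exact code from the documentation; the function name `merge_completions_and_labels` is an assumption, and the steps are assumed to carry their own spacing so they can be joined without a separator.

```python
def merge_completions_and_labels(example):
    """Collapse a stepwise example into one completion and one label (hypothetical helper)."""
    return {
        "prompt": example["prompt"],
        # Join the steps into a single completion string.
        "completion": "".join(example["completions"]),
        # Logical AND: the sequence is correct only if every step is correct.
        "label": all(example["labels"]),
    }


stepwise_example = {
    "prompt": "Blue light",
    "completions": [" scatters more in the atmosphere,", " so the sky is green."],
    "labels": [True, False],
}

unpaired_example = merge_completions_and_labels(stepwise_example)
print(unpaired_example)
# {'prompt': 'Blue light', 'completion': ' scatters more in the atmosphere, so the sky is green.', 'label': False}
```

With the `datasets` library, a function like this would typically be applied through `dataset.map(..., remove_columns=["completions", "labels"])`, as in the other stepwise conversions. Note that `all([])` is `True`, so an example with no steps would be labeled correct; whether that is desirable depends on the task.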