Rename to Stepwise supervision
qgallouedec committed Nov 15, 2024
1 parent b1d9f2d commit fe98b5b
Showing 1 changed file with 24 additions and 37 deletions.
61 changes: 24 additions & 37 deletions docs/source/dataset_formats.mdx
@@ -78,18 +78,13 @@ This guide provides an overview of the dataset formats and types supported by ea
</td>
</tr>
</tr>
<td>Stepwise / process supervision</td>
<td>Stepwise supervision</td>
<td>
<pre><code>{"prompt": "Which number is larger, 9.8 or 9.11?",
"completions": ["Let's compare their decimal values.\n", "The fractional part of 9.8 is 0.8, while the fractional part of 9.11 is 0.11.\n", "Since 0.11 is greater than 0.8, the number 9.11 is larger than 9.8."],
"labels": [True, True, False]}</code></pre>
</td>
<td>
<pre><code>{"prompt": [{"role": "user", "content": "What color is the sky?"}],
"completions": [{"role": "assistant", "content": "Blue light scatters more in the atmosphere."},
{"role": "assistant", "content": "So it is blue."}],
"completions": ["The fractional part of 9.8 is 0.8, while the fractional part of 9.11 is 0.11.", "Since 0.11 is greater than 0.8, the number 9.11 is larger than 9.8."],
"labels": [True, False]}</code></pre>
</td>
<td></td>
</tr>
</table>

@@ -240,22 +235,14 @@ unpaired_preference_example = {"prompt": [{"role": "user", "content": "What colo
"label": True}
```

#### Stepwise / process supervision
#### Stepwise supervision

A stepwise or process supervision dataset is similar to an [unpaired preference](#unpaired-preference) dataset but includes multiple steps of completions, each with its own label. This structure is useful for tasks that need detailed, step-by-step labeling, such as reasoning tasks. By evaluating each step separately and providing targeted labels, this approach helps identify precisely where the reasoning is correct and where errors occur, allowing for targeted feedback on each part of the reasoning process.
A stepwise (or process) supervision dataset is similar to an [unpaired preference](#unpaired-preference) dataset but includes multiple steps of completions, each with its own label. This structure is useful for tasks that need detailed, step-by-step labeling, such as reasoning tasks. By evaluating each step separately and providing targeted labels, this approach helps identify precisely where the reasoning is correct and where errors occur, allowing for targeted feedback on each part of the reasoning process.

```python
# Standard format
stepwise_example = {
"prompt": "Blue light",
"completions": [" scatters more in the atmosphere,", " so the sky is green."],
"labels": [True, False]
}
# Conversational format
stepwise_unpaired_preference_example = {
"prompt": [{"role": "user", "content": "What color is the sky?"}],
"completions": [{"role": "assistant", "content": "Blue light scatters more in the atmosphere."},
{"role": "assistant", "content": "So it is blue."}],
"prompt": "Which number is larger, 9.8 or 9.11?",
"completions": ["The fractional part of 9.8 is 0.8, while the fractional part of 9.11 is 0.11.", "Since 0.11 is greater than 0.8, the number 9.11 is larger than 9.8."],
"labels": [True, False]
}
```
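
To illustrate how per-step labels localize an error, here is a minimal sketch. It is not part of TRL: `first_incorrect_step` is a hypothetical helper that returns the index of the first step labeled `False`, assuming `completions` and `labels` are lists of equal length.

```python
from typing import Optional


def first_incorrect_step(example: dict) -> Optional[int]:
    """Return the index of the first step labeled False, or None if every step is correct."""
    completions, labels = example["completions"], example["labels"]
    # Each completion step must carry exactly one label.
    if len(completions) != len(labels):
        raise ValueError("`completions` and `labels` must have the same length")
    for index, label in enumerate(labels):
        if not label:
            return index
    return None


stepwise_example = {
    "prompt": "Which number is larger, 9.8 or 9.11?",
    "completions": [
        "The fractional part of 9.8 is 0.8, while the fractional part of 9.11 is 0.11.",
        "Since 0.11 is greater than 0.8, the number 9.11 is larger than 9.8.",
    ],
    "labels": [True, False],
}

print(first_incorrect_step(stepwise_example))  # 1
```

The second step (index 1) is where the reasoning goes wrong, which is the information a single sequence-level label would not carry.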
@@ -403,15 +390,15 @@ This section provides example code to help you convert between different dataset

For simplicity, some of the examples below do not follow this recommendation and use the standard format. However, the conversions can be applied directly to the conversational format without modification.

| From \ To | Language modeling | Prompt-completion | Prompt-only | Preference with implicit prompt | Preference | Unpaired preference | Stepwise unpaired preference |
| ------------------------------- | ----------------------------------------------------------------------- | ----------------------------------------------------------------------- | ----------------------------------------------------------------- | --------------------------------------------------------- | --------------------------------------------------------- | ------------------------------------------------------------------------- | ---------------------------- |
| Language modeling | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| Prompt-completion | [🔗](#from-prompt-completion-to-language-modeling-dataset) | N/A | [🔗](#from-prompt-completion-to-prompt-only-dataset) | N/A | N/A | N/A | N/A |
| Prompt-only | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| Preference with implicit prompt | [🔗](#from-preference-with-implicit-prompt-to-language-modeling-dataset) | [🔗](#from-preference-with-implicit-prompt-to-prompt-completion-dataset) | [🔗](#from-preference-with-implicit-prompt-to-prompt-only-dataset) | N/A | [🔗](#from-implicit-to-explicit-prompt-preference-dataset) | [🔗](#from-preference-with-implicit-prompt-to-unpaired-preference-dataset) | N/A |
| Preference | [🔗](#from-preference-to-language-modeling-dataset) | [🔗](#from-preference-to-prompt-completion-dataset) | [🔗](#from-preference-to-prompt-only-dataset) | [🔗](#from-explicit-to-implicit-prompt-preference-dataset) | N/A | [🔗](#from-preference-to-unpaired-preference-dataset) | N/A |
| Unpaired preference | [🔗](#from-unpaired-preference-to-language-modeling-dataset) | [🔗](#from-unpaired-preference-to-prompt-completion-dataset) | [🔗](#from-unpaired-preference-to-prompt-only-dataset) | N/A | N/A | N/A | N/A |
| Stepwise unpaired preference | [🔗](#from-stepwise-unpaired-preference-to-language-modeling-dataset) | [🔗](#from-stepwise-unpaired-preference-to-prompt-completion-dataset) | [🔗](#from-stepwise-unpaired-preference-to-prompt-only-dataset) | N/A | N/A | [🔗](#from-stepwise-unpaired-preference-to-unpaired-preference-dataset) | N/A |
| From \ To | Language modeling | Prompt-completion | Prompt-only | Preference with implicit prompt | Preference | Unpaired preference | Stepwise supervision |
| ------------------------------- | ----------------------------------------------------------------------- | ----------------------------------------------------------------------- | ----------------------------------------------------------------- | --------------------------------------------------------- | --------------------------------------------------------- | ------------------------------------------------------------------------- | -------------------- |
| Language modeling | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| Prompt-completion | [🔗](#from-prompt-completion-to-language-modeling-dataset) | N/A | [🔗](#from-prompt-completion-to-prompt-only-dataset) | N/A | N/A | N/A | N/A |
| Prompt-only | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| Preference with implicit prompt | [🔗](#from-preference-with-implicit-prompt-to-language-modeling-dataset) | [🔗](#from-preference-with-implicit-prompt-to-prompt-completion-dataset) | [🔗](#from-preference-with-implicit-prompt-to-prompt-only-dataset) | N/A | [🔗](#from-implicit-to-explicit-prompt-preference-dataset) | [🔗](#from-preference-with-implicit-prompt-to-unpaired-preference-dataset) | N/A |
| Preference | [🔗](#from-preference-to-language-modeling-dataset) | [🔗](#from-preference-to-prompt-completion-dataset) | [🔗](#from-preference-to-prompt-only-dataset) | [🔗](#from-explicit-to-implicit-prompt-preference-dataset) | N/A | [🔗](#from-preference-to-unpaired-preference-dataset) | N/A |
| Unpaired preference | [🔗](#from-unpaired-preference-to-language-modeling-dataset) | [🔗](#from-unpaired-preference-to-prompt-completion-dataset) | [🔗](#from-unpaired-preference-to-prompt-only-dataset) | N/A | N/A | N/A | N/A |
| Stepwise supervision | [🔗](#from-stepwise-supervision-to-language-modeling-dataset) | [🔗](#from-stepwise-supervision-to-prompt-completion-dataset) | [🔗](#from-stepwise-supervision-to-prompt-only-dataset) | N/A | N/A | [🔗](#from-stepwise-supervision-to-unpaired-preference-dataset) | N/A |

### From prompt-completion to language modeling dataset

@@ -786,9 +773,9 @@ dataset = dataset.remove_columns(["completion", "label"])
{'prompt': 'The sky is'}
```

### From stepwise unpaired preference to language modeling dataset
### From stepwise supervision to language modeling dataset

To convert a stepwise unpaired preference dataset into a language modeling dataset, concatenate the prompt and the completions into the `"text"` column.
To convert a stepwise supervision dataset into a language modeling dataset, concatenate the prompt and the completions into the `"text"` column.

```python
from datasets import Dataset
@@ -812,9 +799,9 @@ dataset = dataset.map(concatenate_prompt_completions, remove_columns=["prompt",
{'text': 'Blue light scatters more in the atmosphere, so the sky is green.'}
```
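
The body of `concatenate_prompt_completions` is collapsed in this diff view. The following is a minimal sketch of what such a function could look like, assuming the steps are joined with no separator (which is consistent with the output shown above). It operates on a plain dict so that it runs without the `datasets` library.

```python
def concatenate_prompt_completions(example: dict) -> dict:
    # Assumption: steps are joined with no separator, so any spacing
    # must already be part of each completion string.
    return {"text": example["prompt"] + "".join(example["completions"])}


example = {
    "prompt": "Blue light",
    "completions": [" scatters more in the atmosphere,", " so the sky is green."],
    "labels": [True, False],
}

print(concatenate_prompt_completions(example))
# {'text': 'Blue light scatters more in the atmosphere, so the sky is green.'}
```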

### From stepwise unpaired preference to prompt completion dataset
### From stepwise supervision to prompt completion dataset

To convert a stepwise unpaired preference dataset into a prompt-completion dataset, join the completions and remove the labels.
To convert a stepwise supervision dataset into a prompt-completion dataset, join the completions and remove the labels.

```python
from datasets import Dataset
@@ -838,9 +825,9 @@ dataset = dataset.map(join_completions, remove_columns=["completions", "labels"]
{'prompt': 'Blue light', 'completion': ' scatters more in the atmosphere, so the sky is green.'}
```
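
The definition of `join_completions` is also hidden by the collapsed hunk. A possible sketch, under the same no-separator assumption, is shown below. The dict merge at the end only emulates, for a single row, the effect of `remove_columns=["completions", "labels"]` from the hunk header.

```python
def join_completions(example: dict) -> dict:
    # Assumption: the steps are concatenated as-is into one completion string.
    return {"completion": "".join(example["completions"])}


row = {
    "prompt": "Blue light",
    "completions": [" scatters more in the atmosphere,", " so the sky is green."],
    "labels": [True, False],
}

# Keep the prompt, add the joined completion, drop the per-step columns.
converted = {"prompt": row["prompt"], **join_completions(row)}
print(converted)
# {'prompt': 'Blue light', 'completion': ' scatters more in the atmosphere, so the sky is green.'}
```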

### From stepwise unpaired preference to prompt only dataset
### From stepwise supervision to prompt only dataset

To convert a stepwise unpaired preference dataset into a prompt-only dataset, remove the completions and the labels.
To convert a stepwise supervision dataset into a prompt-only dataset, remove the completions and the labels.

```python
from datasets import Dataset
@@ -860,9 +847,9 @@ dataset = dataset.remove_columns(["completions", "labels"])
{'prompt': 'Blue light'}
```

### From stepwise unpaired preference to unpaired preference dataset
### From stepwise supervision to unpaired preference dataset

To convert a stepwise unpaired preference dataset into an unpaired preference dataset, join the completions and merge the labels.
To convert a stepwise supervision dataset into an unpaired preference dataset, join the completions and merge the labels.

The method for merging the labels depends on the specific task. In this example, we use the logical AND operation. This means that if the step labels indicate the correctness of individual steps, the resulting label will reflect the correctness of the entire sequence.
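
The code for this conversion is not visible in this diff. As a hedged sketch of the logical AND merge described above, the hypothetical helper `merge_completions_and_labels` below joins the steps and marks the sequence as correct only when every step is labeled `True`.

```python
def merge_completions_and_labels(example: dict) -> dict:
    return {
        # Assumption: steps are joined with no separator.
        "completion": "".join(example["completions"]),
        # Logical AND over the step labels: one False step makes the whole
        # sequence False. Note that all([]) is True for an empty label list.
        "label": all(example["labels"]),
    }


stepwise_row = {
    "prompt": "Blue light",
    "completions": [" scatters more in the atmosphere,", " so the sky is green."],
    "labels": [True, False],
}

unpaired_row = {"prompt": stepwise_row["prompt"], **merge_completions_and_labels(stepwise_row)}
print(unpaired_row)
# {'prompt': 'Blue light', 'completion': ' scatters more in the atmosphere, so the sky is green.', 'label': False}
```

With the Hugging Face `datasets` library, a function like this could be passed to `Dataset.map` together with `remove_columns=["completions", "labels"]`. Other merge rules (for example, a majority vote) may suit tasks where a single wrong step does not invalidate the whole answer.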
