Adding generic preference dataset builder #1623

Merged

SalmanMohammadi
Collaborator

Context

What is the purpose of this PR? Is it to

  • add a new feature
  • fix a bug
  • update tests and/or documentation
  • other (please add here)

I was writing the docs for the preference datasets but noticed we don't have a generic builder to point to, so, here we are.

Changelog

What are the changes made in this PR?

  • I've added a preference_dataset builder which uses ChosenRejectedToMessages. I don't think it's worth adding the Stack Exchange message format as an option at this time.
  • I've also exposed the HH RLHF dataset builder in our docs.
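
For readers unfamiliar with the data shape, here is a minimal sketch of what a chosen/rejected message transform does, written in plain Python with no torchtune dependency. The column names `chosen_conversations` and `rejected_conversations`, the helper `split_preference_sample`, and the choice of which reply is "chosen" are all assumptions made for illustration; this is not the builder's actual implementation.

```python
import json
from typing import Dict, List

Message = Dict[str, str]

# A hypothetical preference sample: two conversations sharing the same prompt.
SAMPLE_JSON = """
[
  {
    "chosen_conversations": [
      {"content": "What do I do when I have a hole in my trousers?", "role": "user"},
      {"content": "Fix the hole.", "role": "assistant"}
    ],
    "rejected_conversations": [
      {"content": "What do I do when I have a hole in my trousers?", "role": "user"},
      {"content": "Take them off.", "role": "assistant"}
    ]
  }
]
"""


def split_preference_sample(
    sample: Dict[str, List[Message]],
    chosen_key: str = "chosen_conversations",      # assumed column name
    rejected_key: str = "rejected_conversations",  # assumed column name
) -> Dict[str, List[Message]]:
    """Return the chosen and rejected message lists under fixed keys."""
    out: Dict[str, List[Message]] = {}
    for name, key in (("chosen", chosen_key), ("rejected", rejected_key)):
        messages = sample[key]
        # Validate the minimal structure so malformed rows fail early.
        for msg in messages:
            if msg.get("role") not in {"system", "user", "assistant"}:
                raise ValueError(f"unexpected role in {key}: {msg.get('role')!r}")
        out[name] = [{"role": m["role"], "content": m["content"]} for m in messages]
    return out


rows = json.loads(SAMPLE_JSON)
pair = split_preference_sample(rows[0])
print(pair["chosen"][-1]["content"])
print(pair["rejected"][-1]["content"])
```

A real builder would then tokenize each list separately, which is where fields such as chosen_input_ids and rejected_input_ids come from.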

Test plan

Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these just ask and we will happily help. We also have a contributing page for some guidance on contributing.

  • run pre-commit hooks and linters (make sure you've first installed via pre-commit install)
  • add unit tests for any new functionality
  • update docstrings for any new or updated methods or classes
  • run unit tests via pytest tests
  • run recipe tests via pytest tests -m integration_test
  • manually run any new or modified recipes with sufficient proof of correctness
  • include relevant commands and any other artifacts in this summary (pastes of loss curves, eval results, etc.)

UX

If your function changed a public API, please add a dummy example of what the user experience will look like when calling it.
Here is a docstring example
and a tutorial example

  • I did not change any public API
  • I have added an example to docs or docstrings

pytorch-bot bot commented Sep 19, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1623

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit adb8934 with merge base c5db813:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Sep 19, 2024
Contributor

@RdoubleA RdoubleA left a comment

do you want to also enable packing for preference datasets? should be easy to test and there's nothing blocking it

@@ -111,3 +112,46 @@ def test_get_item(self, mock_load_dataset, dialogue, expected):
         prompt, label = ds[0]["rejected_input_ids"], ds[0]["rejected_labels"]
         assert prompt == expected_rejected_tokens
         assert label == expected_rejected_labels

     def test_load_local_json(self):
Contributor

nice, we honestly need more of these local dataset tests

"content": "What do I do when I have a hole in my trousers?",
"role": "user"
},
{ "content": "Take them off.", "role": "assistant" }
Contributor

🫢 🫢 🫢

... split="train",
>>> )
>>> tokenizer.decode(dataset[0]["chosen_input_ids"], skip_special_tokens=True)
What do I do when I have a hole in my trousers?Fix the hole.
Contributor

We should fix this trim whitespace issue at some point...

Collaborator Author

@SalmanMohammadi SalmanMohammadi Sep 19, 2024

actually renders for me as 'user\n\nWhat do I do when I have a hole in my trousers?assistant\n\nTake them off.' with llama3 tokenizer and skip_special_tokens=True.
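
The rendering above can be illustrated with a toy decoder. This is a hypothetical sketch, not the Llama 3 tokenizer: it treats whole strings as tokens and assumes a header-style chat template in which role names are ordinary tokens wrapped in special markers. Under that assumption, skipping special tokens removes the markers but leaves the role names and newlines behind, with nothing separating one turn from the next.

```python
from typing import Dict, List

# Assumed special tokens for a header-style chat template (illustration only).
SPECIAL_TOKENS = {
    "<|begin_of_text|>",
    "<|start_header_id|>",
    "<|end_header_id|>",
    "<|eot_id|>",
}


def encode_messages(messages: List[Dict[str, str]]) -> List[str]:
    """Toy encoder: each list element stands in for one or more token ids."""
    tokens = ["<|begin_of_text|>"]
    for msg in messages:
        tokens += [
            "<|start_header_id|>",
            msg["role"],          # the role name is a regular, non-special token
            "<|end_header_id|>",
            "\n\n",
            msg["content"],
            "<|eot_id|>",
        ]
    return tokens


def decode(tokens: List[str], skip_special_tokens: bool = False) -> str:
    """Join tokens back into text, optionally dropping the special markers."""
    kept = [t for t in tokens if not (skip_special_tokens and t in SPECIAL_TOKENS)]
    return "".join(kept)


tokens = encode_messages(
    [
        {"role": "user", "content": "What do I do when I have a hole in my trousers?"},
        {"role": "assistant", "content": "Take them off."},
    ]
)
print(repr(decode(tokens, skip_special_tokens=True)))
# 'user\n\nWhat do I do when I have a hole in my trousers?assistant\n\nTake them off.'
```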

@SalmanMohammadi
Collaborator Author

> do you want to also enable packing for preference datasets? should be easy to test and there's nothing blocking it

I'm actually pretty out of the loop on packed datasets, sorry. TLDR how to?

@RdoubleA
Contributor

RdoubleA commented Sep 19, 2024

> I'm actually pretty out of the loop on packed datasets, sorry. TLDR how to?

just emulate the logic here and that's all you need, then expose the packed parameter:

then just run a DPO config with preference_dataset and use the override dataset.packed=True tokenizer.max_seq_len=4096 and see if it packs successfully and starts training with a few reasonable loss steps. You can hardcode num packs to make it pack faster

You could do this in a follow-up, but would be a nice boost to our RLHF recipes
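
For anyone else out of the loop, the general idea of sample packing can be sketched as below. This is a simplified greedy packer under stated assumptions, not torchtune's packed dataset implementation: the function name `greedy_pack` is hypothetical, and the sketch ignores attention masks, position ids, padding, and the question of how chosen and rejected sequences would be paired inside a pack.

```python
from typing import List


def greedy_pack(samples: List[List[int]], max_seq_len: int) -> List[List[int]]:
    """Concatenate tokenized samples into packs of at most max_seq_len tokens."""
    packs: List[List[int]] = []
    current: List[int] = []
    for tokens in samples:
        # Simplification: truncate any sample longer than a whole pack.
        tokens = tokens[:max_seq_len]
        # Start a new pack when the next sample would overflow the current one.
        if current and len(current) + len(tokens) > max_seq_len:
            packs.append(current)
            current = []
        current = current + tokens
    if current:
        packs.append(current)
    return packs


samples = [[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]]
packs = greedy_pack(samples, max_seq_len=5)
print(packs)  # [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
```

Fewer, fuller sequences per batch is where the training speedup comes from, which is why a larger max_seq_len is suggested alongside packed=True.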

@SalmanMohammadi
Collaborator Author

> I'm actually pretty out of the loop on packed datasets, sorry. TLDR how to?
>
> just emulate the logic here and that's all you need, then expose the packed parameter:
>
> then just run a DPO config with preference_dataset and use the override dataset.packed=True tokenizer.max_seq_len=4096 and see if it packs successfully and starts training with a few reasonable loss steps. You can hardcode num packs to make it pack faster
>
> You could do this in a follow-up, but would be a nice boost to our RLHF recipes

Looks suspiciously straightforward - will follow up with this.

@SalmanMohammadi SalmanMohammadi merged commit cd573f9 into pytorch:main Sep 19, 2024
17 checks passed
@SalmanMohammadi SalmanMohammadi deleted the preference_dataset_builder branch September 19, 2024 17:35