Update custom eval loop to aid DPO debugging #770

Merged: 6 commits, Sep 14, 2023

@tomaarsen (Member) commented Sep 14, 2023

Hello!

This is intended to be pushed directly on top of dpo_custom_eval, i.e. on top of #766, but I don't have the permissions for that.

Pull Request overview

  • sample_during_eval is now generate_during_eval - I think sample is a bit too vague.
  • return_tokens was unused, so I removed it.
  • Prevent test failures caused by the wandb import, without making wandb a mandatory dependency. I added import utils for W&B and a test.
  • Optimize random batch selection.
  • Separate prompt and Policy/Reference responses in game log table.
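
The optional-dependency handling from the W&B bullet could look roughly like the sketch below. The helper names `is_package_available`, `is_wandb_available` and `maybe_import_wandb` are made up for illustration and are not taken from the PR; the only library call relied on is the standard `importlib.util.find_spec`.

```python
import importlib.util


def is_package_available(name: str) -> bool:
    """Return True if `name` can be found, without actually importing it."""
    return importlib.util.find_spec(name) is not None


def is_wandb_available() -> bool:
    # Hypothetical helper name: checks for W&B without making it a hard dependency.
    return is_package_available("wandb")


def maybe_import_wandb():
    """Import wandb lazily, so importing this module never fails without it."""
    if not is_wandb_available():
        return None
    import wandb  # only reached when the package is installed

    return wandb
```

With a guard like this, code paths that log to W&B can be skipped (or raise a clear error) when the package is missing, and tests can run in environments where wandb is not installed.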

This PR is a WIP.

  • Tom Aarsen

@tomaarsen (Member, Author) commented:

Bad news @natolambert, the islice still seems to iterate over all elements until random_index.
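
A small, self-contained way to see this behaviour is sketched below. `CountingLoader` is a hypothetical stand-in for a dataloader, not the trainer's actual code; it only counts how many batches it has been asked to produce.

```python
from itertools import islice


class CountingLoader:
    """Hypothetical stand-in for a dataloader that counts the batches it yields."""

    def __init__(self, num_batches):
        self.num_batches = num_batches
        self.batches_produced = 0

    def __iter__(self):
        for i in range(self.num_batches):
            self.batches_produced += 1  # each batch costs real work in practice
            yield i


loader = CountingLoader(1000)
random_index = 750

# islice cannot jump ahead: it pulls and discards every batch before random_index.
batch = next(islice(loader, random_index, None))

print(batch, loader.batches_produced)
```

Here the loader has to produce all 751 batches up to and including the requested one, so the cost grows with `random_index`.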

@tomaarsen (Member, Author) commented:

That's easy to resolve though. The new approach takes ~0.005 seconds regardless of which index is used.

This also doesn't restrict us to batches anymore. We can just go for 1 sample now, for example.
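
One way to get index-independent cost is to sample indices and read only those items from the underlying dataset, as in the sketch below. This is an illustration under assumptions, not necessarily the code in this PR: it assumes a map-style dataset that supports `len()` and integer indexing, and `CountingDataset` and `sample_examples` are hypothetical names.

```python
import random


class CountingDataset:
    """Hypothetical map-style dataset that records how many items were read."""

    def __init__(self, size):
        self.size = size
        self.reads = 0

    def __len__(self):
        return self.size

    def __getitem__(self, index):
        self.reads += 1
        return {"prompt": f"prompt {index}"}


def sample_examples(dataset, k, rng):
    """Pick k distinct random examples by index, touching only those k items."""
    indices = rng.sample(range(len(dataset)), k=k)
    return [dataset[i] for i in indices]


dataset = CountingDataset(100_000)
examples = sample_examples(dataset, k=1, rng=random.Random(0))

print(len(examples), dataset.reads)
```

Because nothing is iterated, `k` is free to be any number up to the dataset size, including a single sample. In a real trainer the selected examples would presumably still need to be collated into a batch before generation.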

Makes it much easier to quickly read the starts of the generations
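
Separating the prompt from the two responses in the log table could be sketched as follows. The function names and column labels are hypothetical, and the sketch assumes each decoded generation starts with the prompt text it was conditioned on.

```python
def strip_prompt(prompt: str, generation: str) -> str:
    """Drop the echoed prompt from a decoded generation, if present."""
    if generation.startswith(prompt):
        return generation[len(prompt):]
    return generation


def build_log_rows(prompts, policy_outputs, reference_outputs):
    """Return one [prompt, policy response, reference response] row per example."""
    return [
        [prompt, strip_prompt(prompt, pol), strip_prompt(prompt, ref)]
        for prompt, pol, ref in zip(prompts, policy_outputs, reference_outputs)
    ]


columns = ["Prompt", "Policy", "Reference"]  # hypothetical column names
rows = build_log_rows(
    prompts=["Q: 2+2?\nA:"],
    policy_outputs=["Q: 2+2?\nA: 4"],
    reference_outputs=["Q: 2+2?\nA: 5, probably"],
)
```

Rows in this shape could then be handed to a table logger such as W&B's `wandb.Table`, so each response column begins with the generated text rather than a repeated prompt.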
@natolambert merged commit d53b982 into huggingface:dpo_custom_eval on Sep 14, 2023
@tomaarsen deleted the dpo_custom_eval branch on September 14, 2023, 15:02
natolambert pushed a commit that referenced this pull request Sep 26, 2023
* init

* run

* Update custom eval loop to aid DPO debugging (#770)

* sample_during_eval -> generate_during_eval

* Remove unused return_tokens

* Add import utils for W&B, prevent test fails

* Optimize dataloader random batch selection

* Separate prompt and response in logs

Makes it much easier to quickly read the starts of the generations

* Simplify logging

* reset eval steps

* manual merge fixes

* revert merge

* remove self.max_length

* style

* fix max_length

---------

Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com>
lapp0 pushed a commit to lapp0/trl that referenced this pull request May 10, 2024