More complex reward mechanism #419

Closed
davide221 opened this issue Jun 8, 2023 · 3 comments

Comments

@davide221

I'm experimenting with chain-of-thought reasoning and fine-tuning. Is there a way to use this library to assign a different reward to each step the LLM takes toward solving a problem, like the process-supervision technique proposed in OpenAI's recent "Let's Verify Step by Step" paper?

It would be amazing to be able to experiment with this.
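Before a reward can be assigned per step, each reasoning step has to be mapped to a token position in the generated response. A minimal sketch, assuming (hypothetically) that steps are separated by newlines and that the response is available as a list of decoded token strings; `step_end_indices` is an illustrative helper, not part of this library:

```python
from typing import List


def step_end_indices(tokens: List[str], delimiter: str = "\n") -> List[int]:
    """Return the index of the last token of each reasoning step.

    Hypothetical helper: assumes steps are separated by `delimiter` and that
    `tokens` holds the decoded strings of the generated response tokens.
    """
    ends = [i for i, tok in enumerate(tokens) if delimiter in tok]
    # The final step may not end with a delimiter; close it at the last token.
    if tokens and (not ends or ends[-1] != len(tokens) - 1):
        ends.append(len(tokens) - 1)
    return ends


tokens = ["First", " add", " 2", "\n", "Then", " double", "\n", "Answer", ":", " 8"]
print(step_end_indices(tokens))  # [3, 6, 9]
```

A token that contains the delimiter partway through (e.g. "2\nThen") is treated as ending the step, which is only an approximation; mapping character offsets back to tokens would be more precise.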

@lvwerra
Member

lvwerra commented Jun 19, 2023

Currently, a reward can only be given to the last token, and it is discounted toward the beginning of the sequence. The KL-divergence penalty, however, is added per token, so you could adapt the compute_rewards method to accept a list of scores for each sample.
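To make that suggestion concrete, here is a plain-Python sketch for a single response: it keeps the per-token KL penalty but adds one score at the last token of each step instead of a single score at the final token. The names `compute_step_rewards`, `step_ends`, and `kl_coef` (and its default of 0.1) are hypothetical; the library's actual compute_rewards works on batched tensors, and its signature may differ between versions.

```python
from typing import List, Sequence


def compute_step_rewards(
    step_scores: Sequence[float],
    step_ends: Sequence[int],
    logprobs: Sequence[float],
    ref_logprobs: Sequence[float],
    kl_coef: float = 0.1,
) -> List[float]:
    """Hypothetical per-token rewards for one response.

    Every token gets a KL penalty; each step score is added at the token
    that closes that step, instead of one score at the final token.
    """
    if len(step_scores) != len(step_ends):
        raise ValueError("expected exactly one score per step")
    if len(logprobs) != len(ref_logprobs):
        raise ValueError("logprobs and ref_logprobs must have the same length")
    # Per-token KL penalty between the policy and the reference model.
    rewards = [-kl_coef * (lp - ref) for lp, ref in zip(logprobs, ref_logprobs)]
    # Step-level scores placed at each step's last token.
    for score, idx in zip(step_scores, step_ends):
        rewards[idx] += score
    return rewards


rewards = compute_step_rewards(
    step_scores=[1.0, -1.0],  # e.g. step 1 judged correct, step 2 incorrect
    step_ends=[1, 3],
    logprobs=[-1.0, -0.5, -2.0, -0.25],
    ref_logprobs=[-1.0, -0.75, -2.0, -0.25],
    kl_coef=0.5,
)
print(rewards)
```

Because step_ends can point anywhere in the response, the score for an early, incorrect step lands on that step's own tokens rather than reaching them only through discounting from the end.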

@vwxyzjn
Contributor

vwxyzjn commented Jun 20, 2023

This could be related to #424 and #429, which allow users to inject rewards mid-trajectory.
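The effect of discounting a final reward back toward the beginning, and what changes when a reward is injected mid-trajectory, can be illustrated with plain discounted returns, G_t = r_t + gamma * G_{t+1}. This is a simplified sketch: PPO implementations typically estimate advantages with GAE and a learned value head rather than raw returns, and `discounted_returns` and `gamma` here are illustrative.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float = 1.0) -> List[float]:
    """G_t = r_t + gamma * G_{t+1}, computed right to left over one response."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Only a final reward: every earlier token sees a discounted copy of it.
print(discounted_returns([0.0, 0.0, 0.0, 1.0], gamma=0.5))  # [0.125, 0.25, 0.5, 1.0]
# A penalty injected mid-trajectory, at the end of the first step.
print(discounted_returns([0.0, -1.0, 0.0, 1.0], gamma=0.5))  # [-0.375, -0.75, 0.5, 1.0]
```

Returns for tokens after the injected reward are unchanged; only tokens at or before it receive the new signal.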

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed, please comment on this thread.

@lvwerra closed this as completed on Jul 17, 2023