I'm experimenting with chain of thought and fine-tuning. Is there a way to use this library to assign a different reward to each step of the LLM's problem resolution, like the technique proposed in OpenAI's step-by-step paper?
It would be amazing to experiment with.
Currently, a reward can only be assigned to the last token, and it is discounted towards the beginning of the sequence. The KL divergence, however, is added per token, so you could adapt the `compute_rewards` method to accept a list of scores for each sample.
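A minimal sketch of what such an adaptation could look like. This is not the library's actual implementation; the function name `compute_rewards` comes from the comment above, but its signature here, the `(token_index, step_score)` pair format, and the `kl_coef` parameter are assumptions for illustration:

```python
def compute_rewards(step_scores, logprobs, ref_logprobs, kl_coef=0.1):
    """Sketch of per-step reward assignment.

    Each token gets a KL penalty, -kl_coef * (logprob - ref_logprob),
    and each annotated reasoning step adds its score at the token
    index where that step ends (instead of only at the last token).

    step_scores: per sample, a list of (token_index, step_score) pairs.
    logprobs / ref_logprobs: per sample, per-token log-probabilities
    under the policy and the reference model.
    """
    all_rewards = []
    for scores, lp, ref_lp in zip(step_scores, logprobs, ref_logprobs):
        # Per-token KL penalty (approximated by the log-prob difference).
        rewards = [-kl_coef * (l - r) for l, r in zip(lp, ref_lp)]
        # Add each step's score at the position where the step ends.
        for pos, score in scores:
            rewards[pos] += score
        all_rewards.append(rewards)
    return all_rewards


# Example: one sample of 4 tokens with two reasoning steps,
# ending at token indices 1 and 3.
rewards = compute_rewards(
    step_scores=[[(1, 0.5), (3, 1.0)]],
    logprobs=[[-1.0, -1.2, -0.8, -1.1]],
    ref_logprobs=[[-1.0, -1.0, -1.0, -1.0]],
    kl_coef=0.0,  # KL disabled here to make the step scores visible
)
```

With `kl_coef=0.0` the only nonzero entries are the step scores themselves, which makes it easy to verify that each score lands on its step's final token.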