RewardTrainer Producing Same Reward Score During and After Training #2265
Unanswered
deepakpandita57 asked this question in Q&A
Replies: 0 comments
I'm working on a project that uses the TRL library to train a reward model with the RewardTrainer, following the example from the trl repo. However, I've run into an issue: the model consistently produces the same reward score, both during training and when evaluated afterward. Here are some relevant details about my setup:
Model and Setup:
Observations:
Questions:
I understand you might need further details to pinpoint the specific issue, and I'm happy to provide whatever is necessary.
Any insights or suggestions would be greatly appreciated!
Thank you!