I am trying to fine-tune using ORPOTrainer.
I have a question: if I have 1 chosen answer and 10 rejected answers, and my context length for the chosen answer is 8192, does that increase the prompt length for the rejected answers by 10 times?
How does this work in the backend in terms of context length, since you have to create user/assistant pairs for every rejected answer?
E.g.: https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k/viewer/default/train?p=1
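For context, here is how I picture the data being laid out. As I understand it, preference trainers like ORPOTrainer consume *pairwise* rows, so a prompt with 1 chosen and 10 rejected answers would become 10 separate (prompt, chosen, rejected) rows rather than one giant concatenated sequence. This is only a sketch of my assumption (the helper `make_pairwise_rows` is hypothetical, not a TRL API) — someone please correct me if the backend does something else:

```python
# Hypothetical sketch: expand 1 chosen + 10 rejected answers into
# ORPO/DPO-style pairwise rows. Each row carries the full prompt plus
# ONE chosen/rejected pair, so the per-example sequence length stays
# roughly len(prompt) + len(chosen) + len(rejected). It is the number
# of dataset rows that grows 10x, not the context length of any one row.

def make_pairwise_rows(prompt, chosen, rejected_list):
    """Return one {"prompt", "chosen", "rejected"} dict per rejected answer."""
    return [
        {"prompt": prompt, "chosen": chosen, "rejected": rej}
        for rej in rejected_list
    ]

rows = make_pairwise_rows(
    prompt="What is 2 + 2?",
    chosen="4",
    rejected_list=[f"wrong answer {i}" for i in range(10)],
)
print(len(rows))              # 10 rows, one per rejected answer
print(rows[0]["prompt"])      # every row repeats the same prompt
```

If this picture is right, a max context of 8192 would apply to each (prompt, chosen, rejected) row independently, not to all 10 rejected answers at once.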