Question regarding using partial answers as the training objective #25

chanchimin · 2024-12-15T10:18:14Z

Hello, thank you for your great work! I have a question about Step-DPO. The dataset on HF ("xinlai/Math-Step-DPO-10K") seems to take "prompt" as the input and use "chosen" and "rejected" during training ("full_chosen" and "full_rejected" are presumably not used). In that case, won't the model tend to generate only a partial response at inference time? I'm not sure I'm understanding this correctly, so feel free to correct me. Thank you!
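For reference, this is how I picture the objective being computed on these fields. It is a minimal sketch assuming the standard DPO loss, with each log-probability summed over the tokens of the "chosen" or "rejected" step only (conditioned on "prompt", whose tokens are assumed to be masked out). The function names are my own and hypothetical, not taken from the repo:

```python
import math


def sequence_logprob(token_logprobs):
    """Sum per-token log-probs over the response tokens only.

    Assumption: prompt tokens are already masked out, so only the
    partial "chosen" or "rejected" step contributes to the sum.
    """
    return sum(token_logprobs)


def softplus(z):
    # log(1 + exp(z)) computed without overflow for large |z|
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    All four arguments are sequence log-probs of the step given the prompt,
    under the policy model and the frozen reference model respectively.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)) == softplus(-beta * margin)
    return softplus(-beta * margin)


if __name__ == "__main__":
    # Hypothetical per-token log-probs for a short chosen step
    chosen = sequence_logprob([-0.2, -0.5, -0.3])
    print(dpo_loss(chosen, -5.0, -2.0, -2.0))
```

If this is right, nothing in the loss looks at tokens after the step, which is what makes me wonder how the model learns to continue to a full solution.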
