Will this project support prm training of soft label? #57

Dada-Cloudzxy · 2024-11-13T09:20:03Z

OmegaPRM and Math-Shepherd both seem to report that soft labels work better?

yitianlian · 2024-11-18T12:21:25Z

I'm also interested in this! From your OpenR paper, my guess is that you label a step + whenever its mc_value is greater than 0 (if I understand correctly), meaning that the path can still lead to a correct answer. I don't think that is a good idea, though, and other work [1] uses regression to predict the reward instead.

[1] Step-level Value Preference Optimization for Mathematical Reasoning
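
To make the distinction concrete, here is a minimal sketch of the two labeling schemes being discussed. It is not OpenR's actual code: it assumes `mc_value` is the fraction of Monte Carlo rollouts from a step that reach the correct answer (so it lies in [0, 1]), and the helper names `hard_label`, `soft_label`, `bce`, and `mse` are hypothetical, chosen only for illustration.

```python
import math

def hard_label(mc_value: float, threshold: float = 0.0) -> float:
    """Hard label: 1.0 ('+') if mc_value exceeds the threshold, else 0.0 ('-')."""
    return 1.0 if mc_value > threshold else 0.0

def soft_label(mc_value: float) -> float:
    """Soft label: use the Monte Carlo estimate itself, clipped to [0, 1]."""
    return min(max(mc_value, 0.0), 1.0)

def bce(pred: float, target: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy; it also accepts fractional (soft) targets."""
    pred = min(max(pred, eps), 1.0 - eps)
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))

def mse(pred: float, target: float) -> float:
    """Squared error, for the regression-style alternative."""
    return (pred - target) ** 2

# Assumed per-step Monte Carlo estimates for one solution path.
mc_values = [0.0, 0.125, 0.5, 1.0]
hard_targets = [hard_label(v) for v in mc_values]
soft_targets = [soft_label(v) for v in mc_values]

for v, h, s in zip(mc_values, hard_targets, soft_targets):
    print(f"mc_value={v:.3f}  hard={h:.0f}  soft={s:.3f}")
```

With the hard scheme, a step where only 1 of 8 rollouts succeeds gets the same target as a step where all of them succeed. The soft target keeps that difference, and it can be trained with either cross-entropy on fractional targets or a regression loss.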

Dada-Cloudzxy changed the title ~~请问会支持 soft label 的 prm 训练吗？~~ Will this project support prm training of soft label? Nov 13, 2024

iamlilAJ added the enhancement New feature or request label Nov 13, 2024
