Will this project support prm training of soft label? #57

Open
Dada-Cloudzxy opened this issue Nov 13, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@Dada-Cloudzxy

Dada-Cloudzxy commented Nov 13, 2024

OmegaPRM and Math-Shepherd both seem to report that soft labels work better?

@Dada-Cloudzxy Dada-Cloudzxy changed the title from "Will PRM training with soft labels be supported?" (originally in Chinese) to "Will this project support prm training of soft label?" Nov 13, 2024
@iamlilAJ iamlilAJ added the enhancement New feature or request label Nov 13, 2024
@yitianlian

I'm also interested in this! From the OpenR paper, my guess is that a step is labeled + whenever its mc_value is greater than 0 (if I understand correctly), meaning that the path can still lead to a correct answer. I don't think that's a good idea, though, and other work [1] uses regression to predict the reward instead.

  1. Step-level Value Preference Optimization for Mathematical Reasoning
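
To make the difference concrete, here is a minimal sketch of the two labeling schemes discussed above. It is not taken from the OpenR code base: the function names (`hard_label`, `soft_label`, `bce`) are hypothetical, the `mc_value > 0` rule is only the guess stated in this comment, and `mc_value` is assumed to be a Monte Carlo success rate in [0, 1].

```python
import math


def hard_label(mc_value: float) -> float:
    """Hard labeling (assumed rule): '+' (1.0) if any rollout reached a correct answer."""
    return 1.0 if mc_value > 0 else 0.0


def soft_label(mc_value: float) -> float:
    """Soft labeling: use the Monte Carlo success rate itself as the training target."""
    return mc_value


def bce(pred: float, target: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy that accepts a soft target in [0, 1]."""
    pred = min(max(pred, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))


# Hypothetical per-step success rates estimated from rollouts.
mc_values = [0.0, 0.125, 0.5, 1.0]

hard_targets = [hard_label(v) for v in mc_values]
soft_targets = [soft_label(v) for v in mc_values]

for v, h, s in zip(mc_values, hard_targets, soft_targets):
    # Loss of a PRM that predicts 0.9 for this step, under each target.
    print(f"mc_value={v:.3f}  hard={h:.1f}  soft={s:.3f}  "
          f"loss_hard={bce(0.9, h):.3f}  loss_soft={bce(0.9, s):.3f}")
```

Under the hard rule, a step with a success rate of 0.125 gets the same target as a step with 1.0, so a confident prediction of 0.9 is rewarded for both. With soft targets the loss is lowest when the prediction matches the success rate, which is closer to the regression setup in [1].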

3 participants