Questions about training code for UltraRM/UltraCM #4

halfrot · 2023-10-18T12:47:02Z

Great work, and thanks for the contribution! Do you plan to release the training code for UltraRM/UltraCM?

Rosenberg37 · 2023-11-01T02:43:34Z

+1. Is there any plan for this?

lifan-yuan · 2023-12-01T08:16:29Z

Thanks for your interest!

For reward modeling, we use the code in this repo: https://github.com/Dahoas/reward-modeling
For critique modeling, we use the code in our sister repo: https://github.com/thunlp/UltraChat

I also recommend HuggingFace TRL for easy implementation: https://huggingface.co/docs/trl/index
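
For readers new to reward modeling, here is a minimal sketch of the pairwise ranking objective that reward-model trainers commonly optimize. It is an illustration only, not the UltraRM training code: the function name `pairwise_ranking_loss` and the plain-float inputs are assumptions made for this example, and a real setup would compute the rewards with a model and use a tensor library.

```python
import math


def pairwise_ranking_loss(chosen_rewards, rejected_rewards):
    """Mean of -log(sigmoid(r_chosen - r_rejected)) over preference pairs.

    Hypothetical helper for illustration; inputs are plain Python floats,
    one scalar reward per response.
    """
    if not chosen_rewards or len(chosen_rewards) != len(rejected_rewards):
        raise ValueError("need two non-empty reward lists of equal length")

    total = 0.0
    for r_chosen, r_rejected in zip(chosen_rewards, rejected_rewards):
        margin = r_chosen - r_rejected
        # -log(sigmoid(m)) == log(1 + exp(-m)), written in a numerically
        # stable form so large |m| does not overflow exp().
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(chosen_rewards)


if __name__ == "__main__":
    # The loss shrinks as the preferred response is scored higher.
    print(pairwise_ranking_loss([2.0, 1.5], [0.5, 1.0]))
```

The loss is small when the preferred response receives the higher reward and grows when the ordering is reversed, which is what pushes the model to rank preferred responses first.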

halfrot · 2023-12-01T15:56:19Z

Thank you!
Do you also plan to open-source out-of-the-box training code? I'm interested in continuing to fine-tune UltraRM on a domain-specific dataset, so the detailed training code would help with that.

halfrot changed the title ~~Questions about training code for UltraRM~~ Questions about training code for UltraRM/UltraCM Oct 18, 2023
