[Discussion] Can we align GLM-130B to human like chatgpt? #43
Comments
Certainly. Alignment for GLM-130B could be important, and we are conducting a preliminary survey of the topic.
You could use the current glm-10b on Hugging Face with trl/trlx to build a model with RLHF.
What are trl and trlx? I am very interested in this use case. Why must the 10B-parameter model be used for RLHF?
I am actively working on this task and would be very interested in coordinating further development.
@smeyerhot trl (Transformer Reinforcement Learning) is a library built by Hugging Face for training language models with PPO. trlx is an extension of trl built by CarperAI. Both cover the same use case: training models with reinforcement learning from human feedback. You could also build the same functionality with actor-critic PPO in plain PyTorch, although that requires more extensive domain knowledge. You do not have to use glm-10b, but it is publicly available on Hugging Face's model hub, unlike the 130B model, which requires you to apply for access. You can use any encoder-decoder or decoder-only model. Since this issue is about aligning GLM with human feedback, I suggested the 10B-parameter one.
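For concreteness, here is a minimal sketch of the PPO loop that trl implements, based on its quick-start pattern (PPOTrainer, AutoModelForCausalLMWithValueHead). The GPT-2 base model and the constant reward are placeholders, not the actual GLM setup: swapping in a GLM checkpoint such as THUDM/glm-10b may require extra adaptation, and a real RLHF pipeline would use a reward model trained on human preference data.

```python
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer, create_reference_model
from trl.core import respond_to_batch

# Placeholder base model: trl's examples target decoder-only checkpoints like GPT-2.
# A GLM checkpoint (e.g. THUDM/glm-10b) may need extra adaptation, since GLM uses
# a custom blank-infilling architecture.
model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
ref_model = create_reference_model(model)  # frozen copy used for the KL penalty
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

ppo_config = PPOConfig(batch_size=1, mini_batch_size=1)
ppo_trainer = PPOTrainer(ppo_config, model, ref_model, tokenizer)

# One toy PPO step on a single prompt.
query_txt = "Explain reinforcement learning from human feedback in one sentence."
query_tensor = tokenizer.encode(query_txt, return_tensors="pt")

# Sample a continuation from the current policy.
response_tensor = respond_to_batch(model, query_tensor)

# In real RLHF the reward comes from a reward model trained on human preference
# data; a constant scalar stands in here purely as a placeholder.
reward = [torch.tensor(1.0)]

train_stats = ppo_trainer.step([query_tensor[0]], [response_tensor[0]], reward)
```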
ChatGPT can generate formatted text and images; this requires keeping the pretraining data in its original format.