
[Discussion] Can we align GLM-130B to humans like ChatGPT? #43

Open
AnShengqiang opened this issue Dec 10, 2022 · 7 comments

Comments

@AnShengqiang

No description provided.

@AnShengqiang retitled the issue to "[Discussion] Can we align GLM-130B to humans like ChatGPT?" on Dec 11, 2022
@Xiao9905
Member

Certainly. Alignment for GLM-130B could be important, and we are doing a preliminary survey.

@conceptofmind

You could use the current glm-10b on Hugging Face with trl/trlx to build an RLHF-aligned model.

@smeyerhot

What are trl and trlx? I am very interested in this use case. Why must the 10B-parameter model be used for RLHF?

@smeyerhot

I am actively working on this task and would be very interested in coordinating further development.

@conceptofmind

@smeyerhot trl (Transformer Reinforcement Learning) is a library built by Hugging Face for training language models with PPO. trlx is an extension of trl built by CarperAI. Both cover the same use case: training models with reinforcement learning from human feedback (RLHF).

You can also build the same functionality with actor-critic PPO in plain PyTorch, although that requires more extensive domain knowledge.

You do not have to use glm-10b, but it is publicly available on Hugging Face's model hub, unlike 130B, which requires you to apply for access. You can use any encoder-decoder or decoder-only model. Since this issue is about aligning GLM with human feedback, I suggested the 10B-parameter one.
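For anyone wondering what this looks like in practice, here is a minimal sketch of a single PPO step with trl. The prompt and the constant reward are placeholders (a real setup scores responses with a reward model trained on human preferences), and the exact trl API (PPOConfig fields, generate/step signatures) varies between versions, so treat this as illustrative rather than definitive:

```python
# Minimal, version-sensitive sketch of one PPO step with trl.
# The prompt and the constant reward are placeholders; in real RLHF the
# reward comes from a reward model trained on human preference data.
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

model_name = "gpt2"  # stand-in; substitute the model you want to align

# Policy with a value head for PPO, plus a frozen reference copy used
# for the KL penalty that keeps the policy close to the original model.
model = AutoModelForCausalLMWithValueHead.from_pretrained(model_name)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

config = PPOConfig(model_name=model_name, batch_size=1, mini_batch_size=1)
ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer)

query = tokenizer("Explain RLHF in one sentence:", return_tensors="pt").input_ids[0]
response = ppo_trainer.generate(query, max_new_tokens=32, do_sample=True)[0]
response = response[len(query):]  # keep only the generated continuation

reward = torch.tensor(1.0)  # placeholder for a reward-model score

# One PPO optimization step on a single (query, response, reward) triple.
stats = ppo_trainer.step([query], [response], [reward])
```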

@Syno8

Syno8 commented Feb 23, 2023

ChatGPT can generate formatted text and images. This requires keeping the pretraining data in its original format.

@beautifull4frank

Hi all, I used BLOOM to implement PPO successfully.

[screenshot]

But I found that the BLOOM model uses the AutoModelForCausalLM class,

[screenshot]

whereas GLM uses the AutoModelForSeq2SeqLM class.

[screenshot]

There is no equivalent LM wrapper for AutoModelForSeq2SeqLM, so do you know how to correct this?
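One possible direction (an assumption on my part, not something I have verified against GLM): newer trl releases also ship a seq2seq value-head wrapper alongside the causal-LM one, so loading GLM might look like the sketch below. trust_remote_code is typically required because GLM checkpoints on the Hub use custom modeling code.

```python
# Unverified sketch: newer trl versions provide a seq2seq counterpart to
# AutoModelForCausalLMWithValueHead; whether it works with GLM's custom
# modeling code is an open question.
from trl import AutoModelForSeq2SeqLMWithValueHead

# GLM checkpoints on the Hub use custom code, hence trust_remote_code.
model = AutoModelForSeq2SeqLMWithValueHead.from_pretrained(
    "THUDM/glm-10b", trust_remote_code=True
)
```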
