This repository has been archived by the owner on Oct 16, 2023. It is now read-only.

Added parallel code for chatglm-6B #225

Open · wants to merge 10 commits into main
Conversation

@Caesar1993 commented Sep 12, 2023

Added parallel code for chatglm-6B.
Because ChatGLM-6B has relatively few parameters, parallel inference is not faster than loading on a single card, but the approach can serve as a reference for parallel inference with larger GLM models.

  1. Split the fused qkv projection of the Hugging Face chatglm checkpoint by attention head, take out the q, k, and v slices of each head, and concatenate them back into whole q, k, and v tensors
  2. Move chatglm's layer definitions into init, and rebuild the forward function on top of the basic layers in Colossalai
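Step 1 above can be sketched as the reshuffle below. This is a minimal NumPy sketch, not the PR's actual code: the per-head `[num_heads, 3, head_dim, hidden]` layout of ChatGLM's fused `query_key_value` weight is an assumption, and the sizes are toy values standing in for ChatGLM-6B's 32 heads of dim 128 over a 4096 hidden size.

```python
import numpy as np

# Toy sizes; ChatGLM-6B would be num_heads=32, head_dim=128, hidden=4096.
num_heads, head_dim, hidden = 4, 8, 32

# Fused qkv projection weight as stored in the checkpoint (assumed layout:
# q, k, v interleaved per head along the output dimension).
fused = np.random.randn(num_heads * 3 * head_dim, hidden)

# Split the mixed qkv weight into per-head (q, k, v) blocks.
per_head = fused.reshape(num_heads, 3, head_dim, hidden)

# Take out the q, k, v of every head.
q = per_head[:, 0].reshape(num_heads * head_dim, hidden)
k = per_head[:, 1].reshape(num_heads * head_dim, hidden)
v = per_head[:, 2].reshape(num_heads * head_dim, hidden)

# Concatenate into a whole qkv with contiguous Q | K | V blocks, which a
# tensor-parallel column-split linear layer can then shard evenly by rows.
qkv = np.concatenate([q, k, v], axis=0)
```

With contiguous Q, K, and V blocks, splitting `qkv` across ranks keeps whole heads together on each device, which is what head-wise tensor parallelism requires.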
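For step 2, the forward pass rebuilt on Colossalai's basic layers amounts to 1D tensor parallelism: a column-parallel linear followed by a row-parallel linear whose partial outputs are all-reduced. The sketch below simulates two ranks in one process with plain NumPy to show the math; it is not Colossalai's API (there the split and all-reduce are handled by layers such as `Linear1D_Col` / `Linear1D_Row`), and the activation between the two linears is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, ffn, world_size = 8, 16, 2

x = rng.standard_normal((4, hidden))
w1 = rng.standard_normal((hidden, ffn))   # column-parallel: split output dim
w2 = rng.standard_normal((ffn, hidden))   # row-parallel: split input dim

# Single-card reference result.
ref = (x @ w1) @ w2

# Each simulated rank holds a column slice of w1 and the matching row
# slice of w2, so no communication is needed between the two matmuls.
w1_shards = np.split(w1, world_size, axis=1)
w2_shards = np.split(w2, world_size, axis=0)

partials = [(x @ w1_shards[r]) @ w2_shards[r] for r in range(world_size)]

# Summing the partials plays the role of the final all-reduce.
out = sum(partials)
```

Because each rank's partial product covers a disjoint block of the inner dimension, the sum of partials equals the unsharded result exactly, which is why the rebuilt forward can match single-card outputs.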
