This repository has been archived by the owner on Oct 16, 2023. It is now read-only.

Added parallel code for chatglm-6B #225

Open · wants to merge 10 commits into main
Conversation

@Caesar1993 commented Sep 12, 2023

Added parallel code for chatglm-6B.
Because ChatGLM-6B has relatively few parameters, parallel inference is not faster than loading on a single card, but the approach can serve as a reference for parallel inference with larger GLM models.

  1. Split the fused qkv projection of the Hugging Face chatglm checkpoint by attention head, take out the q, k, and v slices of each head, and concatenate them back into whole q, k, and v tensors
  2. Move chatglm's layer definitions into init, and rebuild the forward function on top of the basic layers in Colossalai
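Step 1 above can be sketched as the reshuffle below. This is a minimal NumPy sketch, not the PR's actual code: the per-head `[num_heads, 3, head_dim, hidden]` layout of ChatGLM's fused `query_key_value` weight is an assumption, and the sizes are toy values standing in for ChatGLM-6B's 32 heads of dim 128 over a 4096 hidden size.

```python
import numpy as np

# Toy sizes; ChatGLM-6B would be num_heads=32, head_dim=128, hidden=4096.
num_heads, head_dim, hidden = 4, 8, 32

# Fused qkv projection weight as stored in the checkpoint (assumed layout:
# q, k, v interleaved per head along the output dimension).
fused = np.random.randn(num_heads * 3 * head_dim, hidden)

# Split the mixed qkv weight into per-head (q, k, v) blocks.
per_head = fused.reshape(num_heads, 3, head_dim, hidden)

# Take out the q, k, v of every head.
q = per_head[:, 0].reshape(num_heads * head_dim, hidden)
k = per_head[:, 1].reshape(num_heads * head_dim, hidden)
v = per_head[:, 2].reshape(num_heads * head_dim, hidden)

# Concatenate into a whole qkv with contiguous Q | K | V blocks, which a
# tensor-parallel column-split linear layer can then shard evenly by rows.
qkv = np.concatenate([q, k, v], axis=0)
```

With contiguous Q, K, and V blocks, splitting `qkv` across ranks keeps whole heads together on each device, which is what head-wise tensor parallelism requires.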
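For step 2, the forward pass rebuilt on Colossalai's basic layers amounts to 1D tensor parallelism: a column-parallel linear followed by a row-parallel linear whose partial outputs are all-reduced. The sketch below simulates two ranks in one process with plain NumPy to show the math; it is not Colossalai's API (there the split and all-reduce are handled by layers such as `Linear1D_Col` / `Linear1D_Row`), and the activation between the two linears is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, ffn, world_size = 8, 16, 2

x = rng.standard_normal((4, hidden))
w1 = rng.standard_normal((hidden, ffn))   # column-parallel: split output dim
w2 = rng.standard_normal((ffn, hidden))   # row-parallel: split input dim

# Single-card reference result.
ref = (x @ w1) @ w2

# Each simulated rank holds a column slice of w1 and the matching row
# slice of w2, so no communication is needed between the two matmuls.
w1_shards = np.split(w1, world_size, axis=1)
w2_shards = np.split(w2, world_size, axis=0)

partials = [(x @ w1_shards[r]) @ w2_shards[r] for r in range(world_size)]

# Summing the partials plays the role of the final all-reduce.
out = sum(partials)
```

Because each rank's partial product covers a disjoint block of the inner dimension, the sum of partials equals the unsharded result exactly, which is why the rebuilt forward can match single-card outputs.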
