Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix mqa parallelization #51

Open
wants to merge 3 commits into
base: multi-query-attention
Choose a base branch
from

Conversation

thomasw21
Copy link
Collaborator

@thomasw21 thomasw21 commented May 11, 2023

There's no reason to run a second communication for kv layers in MQA. Basically the previous ColumnLinear should already handle that all_gather for us, we just need to run a typical Linear layer. It does mean that we run that kv linear in duplicated fashion, but it's seems to not matter (It's my understanding that sequence parallel wasn't used during the training of starcoder)

The reason why I decided not to build another module, is for backward compatibility. Basically this PR should allow for free throughput improvement, and be able to load previous checkpoints.

I have not tested this PR. cc @loubnabnl

If it does provide the same improvement I've seen on my own codebase, then we need to propagate this PR to our inference models. Typically text-generation-inference and such. (EDIT: this seems already support in text-generation-inference with even an extra merging of qkv)

@thomasw21 thomasw21 requested a review from loubnabnl May 11, 2023 12:26
@lvwerra lvwerra requested a review from jlamypoirier May 11, 2023 12:30
Copy link
Collaborator Author

@thomasw21 thomasw21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay it seems the implementation is not correct for sequence parallel. I'll try to fix this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant