[WIP][Core] Support tensor parallel division with remainder of attention heads #5367
Closed
24 commits
b86675d Change model config support unequal tp division (NadavShmayo)
a789569 Add unequal tp division util functions (NadavShmayo)
428b85f Change parallel layers to support unequal tp division (NadavShmayo)
c485d50 Add unequal tp division support for opt model (NadavShmayo)
1cf543b Add unequal tp division support for commandr model (NadavShmayo)
a6970c0 Add unequal tp division support for llama model (NadavShmayo)
6a4b70e Remove asserts in Llama and CommandR implementation (NadavShmayo)
6b33c87 Add tp_rank to EmbeddingModelRunner class (NadavShmayo)
90d9f6c Fix QKVLinear to work with packed dim (NadavShmayo)
014b682 Fix imports formatting in layer/linear.py file (NadavShmayo)
a30e120 Merge branch 'main' into unequal_tp_division (NadavShmayo)
73c0159 Merge branch 'main' into unequal_tp_division (NadavShmayo)
cdb2e27 Remove unused variable (NadavShmayo)
b9e5309 Fix failing tests (NadavShmayo)
a268f20 Fix formatting (NadavShmayo)
b033a43 Add uneven tensor parallel test cases (NadavShmayo)
34f9850 Fix review comments (NadavShmayo)
a154ade Fix uneven TP tests and add to .buildkite (NadavShmayo)
fe906b5 Fix formatting and imports in new uneven TP tests (NadavShmayo)
537e16b Fix uneven TP chunked prefill tests and buildkit config (NadavShmayo)
5639427 Change default padding size of ParallelLMHead to None (NadavShmayo)
6f7c0de Add validation for LoRA with tensor parallel (NadavShmayo)
b8e870a Fix LLama uneven TP lm head (NadavShmayo)
fc777b5 Merge branch 'main' into unequal_tp_division (NadavShmayo)
Could you pull this out into its own variable (the remainder) for clarity, and/or add a comment describing the intended behavior? As written, the condition is a bit unclear.
Nice catch, this should use the util function I added for this logic, which should make it more readable.
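For readers following the thread, here is a rough sketch of what "division with remainder" across tensor-parallel ranks can look like when the remainder is named explicitly. This is not the util function from the PR, whose code is not shown here; `heads_for_rank` and `head_offset_for_rank` are hypothetical names, and giving the extra heads to the lowest-numbered ranks is an assumption made only for illustration.

```python
def heads_for_rank(total_heads: int, tp_size: int, tp_rank: int) -> int:
    """Number of attention heads assigned to `tp_rank` (hypothetical helper)."""
    if tp_size <= 0:
        raise ValueError("tp_size must be positive")
    if not 0 <= tp_rank < tp_size:
        raise ValueError("tp_rank must be in [0, tp_size)")
    base, remainder = divmod(total_heads, tp_size)
    # Assumption: the first `remainder` ranks each take one extra head.
    return base + (1 if tp_rank < remainder else 0)


def head_offset_for_rank(total_heads: int, tp_size: int, tp_rank: int) -> int:
    """Index of the first head owned by `tp_rank` under the same scheme."""
    base, remainder = divmod(total_heads, tp_size)
    # Every earlier rank contributes `base` heads, plus one extra head for
    # each earlier rank that falls inside the remainder.
    return tp_rank * base + min(tp_rank, remainder)


if __name__ == "__main__":
    sizes = [heads_for_rank(10, 4, r) for r in range(4)]
    offsets = [head_offset_for_rank(10, 4, r) for r in range(4)]
    print(sizes, offsets)
```

With 10 heads and a tensor-parallel size of 4, `divmod(10, 4)` gives a base of 2 and a remainder of 2, so ranks 0 and 1 hold 3 heads each and ranks 2 and 3 hold 2 each. Keeping the remainder in its own variable, as the review asks, makes the `tp_rank < remainder` condition self-explanatory.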