Allows interweaving of arbitrary kinds of 'attention' layers, like sliding window, reuse prev layer kv cache etc. #1299

Merged: ShashankMosaicML merged 86 commits into mosaicml:main from ShashankMosaicML:mixed_attention_modules on Jun 30, 2024
Conversation
ShashankMosaicML changed the title from "[WIP] Allows interweaving of arbitrary kinds of 'attention' layers, like RNN, sliding window etc." to "[WIP] Allows interweaving of arbitrary kinds of 'attention' layers, like sliding window, reuse prev layer kv cache etc." on Jun 22, 2024

ShashankMosaicML removed the "[WIP]" prefix from the title on Jun 22, 2024
dakinggg reviewed on Jun 29, 2024
dakinggg: mostly lgtm, are you done testing it?

ShashankMosaicML: Yes, we have finished testing this. Everything seems fine.
dakinggg approved these changes on Jun 30, 2024
This PR allows overriding the default block config for certain layers. The override config contains two sub-configs: order and overrides. order specifies the sequence of the different kinds of layers (default refers to a layer that does not apply any overrides), and for each kind of layer, the overrides are specified in the overrides sub-config. For example, to specify a model like the one in https://research.character.ai/optimizing-inference/, a config along the following lines is needed:
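The config example itself was not preserved in this excerpt; the sketch below shows the shape of such a config, assuming the block_overrides scheme this PR adds to the MPT model config. The layer names under overrides (sliding_window_layer, reuse_kv_layer, etc.) are user-chosen labels, and the window size and repeat counts are illustrative, not taken from the PR.

```yaml
model:
  name: mpt_causal_lm
  # ... other model settings ...
  block_overrides:
    order:
      - name: default                      # first layer: global attention, no overrides
      - repeat: 2                          # repeat the following six-layer pattern twice
        order:
          - name: sliding_window_layer
          - name: sliding_window_layer_reuse
          - name: sliding_window_layer
          - repeat: 2
            name: sliding_window_layer_reuse
          - name: reuse_kv_layer          # global layer that reuses an earlier layer's KV cache
    overrides:
      sliding_window_layer:
        attn_config:
          sliding_window_size: 1024
      sliding_window_layer_reuse:
        attn_config:
          sliding_window_size: 1024
          reuse_kv_layer_idx: -1          # relative index: reuse the previous layer's KV cache
      reuse_kv_layer:
        attn_config:
          reuse_kv_layer_idx: -6          # relative index: reuse the KV cache from 6 layers back
```

With this config, the model has 13 transformer blocks: one default global-attention layer followed by two repetitions of the six-layer pattern, where each reuse_kv_layer attends globally but reads the KV cache produced six layers earlier instead of computing its own.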
The PR also prints a log summarizing the resulting network:
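The actual log output was not preserved in this excerpt; the table below is an illustrative reconstruction of the information it conveys, assuming the config sketch above. The column names are assumptions, not the PR's real format (a sliding_window_size of -1 means no window, i.e., global attention).

| Layer idx | sliding_window_size | reuse_kv_layer_idx |
|-----------|---------------------|--------------------|
| 0         | -1 (global)         | -                  |
| 1         | 1024                | -                  |
| 2         | 1024                | 1                  |
| 3         | 1024                | -                  |
| 4         | 1024                | 3                  |
| 5         | 1024                | 4                  |
| 6         | -1 (global)         | 0                  |
| ...       | ...                 | ...                |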
Note that the table above prints the absolute layer index for reuse_kv_layer_idx: the config specifies relative indices like -1 or -6, but the log resolves them to the absolute index of the layer whose KV cache is reused.