Add support for GPTNeoX models #32
Merged
Conversation
+ There are still bugs in the attention dimension mismatch
+ group batch attention is skipped to avoid this problem for now
+ flash attention is only supported in fp16/bf16
…tion
+ the cos/sin cache tensor is not a trained parameter, so it is not autocast along with the other model parameters through `torch_dtype`
+ Works fine without the torch.cuda autocast context, so roll back.
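For reference, the cos/sin note above concerns the rotary-embedding cache: it is a buffer rather than a trained parameter, so loading the model with `torch_dtype=torch.bfloat16` does not guarantee it gets converted, while the flash-attention kernels only accept fp16/bf16 inputs. A minimal sketch of casting the cache at the point of use, one way to avoid wrapping the call in `torch.cuda.amp.autocast` (the helper below is illustrative, not the exact change in these commits):

```python
import torch


def rotate_half(x):
    # Standard rotary helper: swap and negate the two halves of the last dim.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)


def apply_rotary_pos_emb(q, k, cos, sin, position_ids):
    # cos/sin come from a cached buffer, not a trained parameter, so they may
    # still be fp32 even when the weights were loaded in bf16/fp16. Casting
    # them to q's dtype here keeps the inputs to flash-attention in fp16/bf16
    # without needing a torch.cuda.amp.autocast context.
    cos = cos[position_ids].unsqueeze(1).to(q.dtype)  # (bsz, 1, seqlen, rot_dim)
    sin = sin[position_ids].unsqueeze(1).to(q.dtype)
    q_rot = (q * cos) + (rotate_half(q) * sin)
    k_rot = (k * cos) + (rotate_half(k) * sin)
    return q_rot, k_rot
```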
Hi, many thanks for your contribution. These commits are really helpful for this project. I have merged them into the main branch! Regards,
gianlucamacri pushed a commit to gianlucamacri/LongLoRA that referenced this pull request on Oct 31, 2023: Add support for GPTNeoX models
Adds Long-LoRA support for GPTNeoX models.

Tested on a Colab A100 40GB x 1 instance with the scripts `fine-tune.py` and `supervised-fine-tune.py`, using a sample GPTNeoX model, `EleutherAI/pythia-1.4b-deduped`.
As there was no specific guide on how to contribute, I've tried to make as few modifications as possible to the original structure. GPTNeoX support is added via a new module, `gptneox_attn_replace`, just like the original `llama_attn_replace`.
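For context, both modules follow the same monkey-patching pattern: they swap the attention `forward` on the corresponding `transformers` attention class before the model is built. A minimal sketch of that pattern, assuming a `replace_gpt_neox_attn` entry point (the name and the stub forward are illustrative, not the exact PR code):

```python
# Sketch of the attention-replacement pattern; names are illustrative.
import transformers.models.gpt_neox.modeling_gpt_neox as gpt_neox_module

_original_forward = gpt_neox_module.GPTNeoXAttention.forward


def _patched_attn_forward(self, *args, **kwargs):
    # A real replacement would compute flash / shifted-group attention here;
    # this stub simply delegates to the stock forward to keep the sketch runnable.
    return _original_forward(self, *args, **kwargs)


def replace_gpt_neox_attn(use_flash_attn=True, use_full=False):
    """Swap GPTNeoXAttention.forward, mirroring what llama_attn_replace does for LLaMA."""
    gpt_neox_module.GPTNeoXAttention.forward = _patched_attn_forward
```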
How to apply
Application is showcased in the tested scripts `fine-tune.py` and `supervised-fine-tune.py`. A `model_type` argument is added to switch back and forth between the `llama` and `gpt-neox` configurations, as in the sketch below.
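A rough sketch of how such a `model_type` switch can dispatch to the two replacement modules (the argument wiring below is an assumption for illustration; the actual scripts may differ):

```python
from dataclasses import dataclass, field


@dataclass
class ModelArguments:
    model_name_or_path: str = field(default="EleutherAI/pythia-1.4b-deduped")
    model_type: str = field(default="llama", metadata={"help": "llama or gpt-neox"})


def apply_attn_replacement(model_type, use_flash_attn=True):
    # Dispatch to the matching attention-replacement module; module and function
    # names follow the repo's llama_attn_replace and this PR's gptneox_attn_replace.
    if model_type == "gpt-neox":
        from gptneox_attn_replace import replace_gpt_neox_attn
        replace_gpt_neox_attn(use_flash_attn)
    else:
        from llama_attn_replace import replace_llama_attn
        replace_llama_attn(use_flash_attn)
```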
Notes on flash-attention + GPTNeoX
+ The replacement is based on `modeling_gpt_neox.py` for the `use_flash_attn=True` case, against `transformers == 4.33.3` as of writing.
+ Using `flash_attn_varlen_func` would cause an "in-place operation" runtime error in the flash-attention code, so it is replaced with `flash_attn_varlen_qkvpacked_func`, which worked fine.
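For reference, a minimal sketch of the packed varlen call, assuming an unpadded batch so the `cu_seqlens` bookkeeping stays trivial (the helper is illustrative; shapes follow flash-attn's documented API):

```python
import torch
from flash_attn import flash_attn_varlen_qkvpacked_func


def packed_flash_attention(qkv, softmax_scale=None):
    """qkv: (batch, seqlen, 3, num_heads, head_dim), already in fp16/bf16 on CUDA."""
    bsz, seqlen, _, num_heads, head_dim = qkv.shape
    # Flatten batch and sequence into one "total tokens" dimension.
    qkv_packed = qkv.reshape(bsz * seqlen, 3, num_heads, head_dim)
    # Cumulative sequence lengths: [0, seqlen, 2*seqlen, ..., bsz*seqlen].
    cu_seqlens = torch.arange(
        0, (bsz + 1) * seqlen, step=seqlen, dtype=torch.int32, device=qkv.device
    )
    out = flash_attn_varlen_qkvpacked_func(
        qkv_packed, cu_seqlens, max_seqlen=seqlen, dropout_p=0.0,
        softmax_scale=softmax_scale, causal=True,
    )
    return out.reshape(bsz, seqlen, num_heads, head_dim)
```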