Refactor KV cache, Rope, reduce common code #1148
Can you also make similar changes for modeling_mistral.py, modeling_clip.py, modeling_mixtral.py (and any other files that might have Matmul or KVCache)?
I think this PR could be affected by a new PR in transformers:
@abhilash1910 can you also consider the point made by @ulivne in #1160 (comment)? It makes sense to have them here.
Yes @yafshar, I will update this PR to include them here. I also saw #31999; I guess it can be added incrementally.
@abhilash1910 would you please rebase the code? The CI is failing. Thanks.
@yafshar please help to trigger the CI.
@libinta would you please label this PR?
@abhilash1910 can you resolve the conflict?
@libinta done, please help to review. Thanks.
@abhilash1910, please run the CI test requested above and post the results here in the pull request if you'd like this to be in the 1.19 release; otherwise we can push it to 1.20.
Nice! I left the same comment everywhere to push the use of relative imports.
Also, please sync your branch with main and run `make style`.
optimum/habana/transformers/models/gpt_neox/modeling_gpt_neox.py
optimum/habana/transformers/models/starcoder2/modeling_starcoder2.py
The code quality check failed, please run |
Thanks @regisss for attending to this, and apologies that I could not update the PR in time 👍🏻
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
- Correct the datatype used during training for the rotary positional embedding calculation.
- Remove unused function.
Note: The accuracy issue was identified to be caused by a previous pull request (huggingface#1148).
What does this PR do?
Reduces duplicated KVCache class code across generation models and refactors it into a common modeling class.
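To illustrate the kind of deduplication described above, here is a minimal sketch, not the code from this PR. It assumes a single shared cache class that several per-model attention classes reuse instead of each defining its own copy. The names `SharedKVCache`, `LlamaLikeAttention`, and `MistralLikeAttention` are hypothetical, and plain Python lists stand in for the tensors a real implementation would hold.

```python
from typing import List, Optional


class SharedKVCache:
    """Hypothetical shared cache; a real one would store tensors, not lists."""

    def __init__(self) -> None:
        self.cache: Optional[List[List[float]]] = None  # [max_seq_len][head_dim]
        self.seq_len = 0  # number of positions filled so far

    def allocate(self, max_seq_len: int, head_dim: int) -> None:
        # Pre-allocate once so later updates write into existing slots.
        self.cache = [[0.0] * head_dim for _ in range(max_seq_len)]
        self.seq_len = 0

    def update(self, new_rows: List[List[float]]) -> List[List[float]]:
        # Append new key or value rows and return the filled part of the cache.
        if self.cache is None:
            raise RuntimeError("call allocate() before update()")
        end = self.seq_len + len(new_rows)
        if end > len(self.cache):
            raise ValueError("cache overflow")
        for offset, row in enumerate(new_rows):
            self.cache[self.seq_len + offset] = list(row)
        self.seq_len = end
        return self.cache[: self.seq_len]


class LlamaLikeAttention:
    """Hypothetical model-specific attention that reuses the shared cache."""

    def __init__(self) -> None:
        self.k_cache = SharedKVCache()
        self.v_cache = SharedKVCache()


class MistralLikeAttention:
    """A second hypothetical model reusing the same class, with no copy of it."""

    def __init__(self) -> None:
        self.k_cache = SharedKVCache()
        self.v_cache = SharedKVCache()
```

With this layout, a fix to the cache logic lands in one place and every model that imports the class picks it up, which is the usual motivation for this sort of refactor.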
cc @libinta @vidyasiv
Before submitting