Remove redundant transposes for rope rotation #807
Pulling the latest commits from main fork
Pulling from the main repo
Pulling from mosaicml/llm-foundry main
Merging from mosaic main
Pulling from mosaic main
Pulling from mosaic main.
Should we gate on HF version? Or is that the min version of HF that we install?
Yes, the minimum version of HF we install is now 4.36 (Line 52 in 06b9a1f).
@ShashankMosaicML Let's gate on the transformers version, please. Since this is in the model code, it'd be good to make it compatible with all (or as many as possible) transformers versions.
Done.
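For readers following the thread, a version gate of the kind requested above could look roughly like the sketch below. It is not the code that landed in this PR: `parse_version` and `supports_untransposed_rope` are hypothetical helper names, and the parser is a deliberately small stdlib-only stand-in (real code would more likely use `packaging.version`). It also treats pre-releases such as `4.36.0.dev0` as the release itself, which is an assumption.

```python
import re


def parse_version(version: str) -> tuple:
    """Return the leading numeric components of a version string as ints.

    Hypothetical helper: "4.36.0.dev0" -> (4, 36, 0). Parsing stops at the
    first dot-separated component that does not start with a digit.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def supports_untransposed_rope(transformers_version: str) -> bool:
    """Hypothetical gate: True when the installed version is 4.36 or newer."""
    return parse_version(transformers_version) >= (4, 36)


# Illustrative checks with hard-coded version strings.
gate_new = supports_untransposed_rope("4.36.0")
gate_old = supports_untransposed_rope("4.35.2")
```

In model code, the string passed in would typically be `transformers.__version__`, with the older transpose-based path kept as the fallback branch.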
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
Now that we use transformers v4.36.0, we no longer need to transpose the queries and keys back and forth to apply RoPE rotations to them. This PR removes those redundant transposes.
We can see that the loss and MFU plots are almost identical.
![Screenshot 2023-12-19 at 8 10 48 PM](https://private-user-images.githubusercontent.com/144760128/291777201-97112b48-ff23-4f7b-b894-26ab94c11e8e.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk3NjIwOTgsIm5iZiI6MTczOTc2MTc5OCwicGF0aCI6Ii8xNDQ3NjAxMjgvMjkxNzc3MjAxLTk3MTEyYjQ4LWZmMjMtNGY3Yi1iODk0LTI2YWI5NGMxMWU4ZS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjE3JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIxN1QwMzA5NThaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT01Yjg3Y2YxZTVmYWFkMDNhM2UxYzZmODNiMGVhNDc3ODA1MDNkMWQ2ZmUxM2YyMzRiNTNiMWIwODUwN2Q4OGMzJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.bvxnLZqUCnUn0iQku9kwOYF7rm5cYFoQuXUKw0U-w8o)
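The equivalence the description relies on can be illustrated with a small NumPy sketch. This is not the llm-foundry or transformers implementation; `rotate_half`, `apply_rope`, and `rope_tables` are written here for illustration only, and the tensor sizes and the base of 10000 are assumptions. The sketch shows that rotating in the `(batch, seq, heads, head_dim)` layout, with cos/sin broadcast along the head axis, gives the same result as transposing to `(batch, heads, seq, head_dim)`, rotating, and transposing back.

```python
import numpy as np


def rotate_half(x):
    """Map the two halves of the last axis (x1, x2) to (-x2, x1)."""
    half = x.shape[-1] // 2
    return np.concatenate([-x[..., half:], x[..., :half]], axis=-1)


def apply_rope(x, cos, sin, seq_axis):
    """Rotate a 4-D tensor x whose last axis is head_dim.

    cos and sin have shape (seq, head_dim) and are reshaped so that they
    broadcast along every axis except seq_axis and the last one.
    """
    shape = [1, 1, 1, x.shape[-1]]
    shape[seq_axis] = x.shape[seq_axis]
    cos, sin = cos.reshape(shape), sin.reshape(shape)
    return x * cos + rotate_half(x) * sin


def rope_tables(seq_len, head_dim, base=10000.0):
    """Build cos/sin tables of shape (seq_len, head_dim)."""
    inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))
    angles = np.outer(np.arange(seq_len), inv_freq)     # (seq, head_dim / 2)
    angles = np.concatenate([angles, angles], axis=-1)  # (seq, head_dim)
    return np.cos(angles), np.sin(angles)


# Illustrative sizes only.
batch, seq_len, n_heads, head_dim = 2, 5, 3, 8
rng = np.random.default_rng(0)
query = rng.standard_normal((batch, seq_len, n_heads, head_dim))
cos, sin = rope_tables(seq_len, head_dim)

# Old path: transpose to (batch, heads, seq, head_dim), rotate, transpose back.
with_transposes = apply_rope(
    query.transpose(0, 2, 1, 3), cos, sin, seq_axis=2
).transpose(0, 2, 1, 3)

# New path: rotate directly in (batch, seq, heads, head_dim).
without_transposes = apply_rope(query, cos, sin, seq_axis=1)

results_match = np.allclose(with_transposes, without_transposes)
```

Because the rotation is elementwise along `head_dim`, only the axis along which cos/sin are broadcast changes between the two paths, which is why the two transposes can be dropped without changing the output.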