[Chore] Remove unused batched RoPE op & kernel #24789
Conversation
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Code Review
This pull request effectively removes the unused batched_rotary_embedding operation and its associated kernel. The changes are comprehensive, covering the C++ implementation, Python bindings, and corresponding tests, leading to a cleaner and more maintainable codebase. I have a few minor suggestions to improve code formatting for better readability and consistency with standard Python style guides.
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: bbartels <benjamin@bartels.dev>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
IIUC, batched RoPE is only used when different LoRA adapters have different RoPE scaling factors.
Now that this feature was dropped in #21169, we can remove the op and its kernel to simplify the codebase.
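For context, here is a minimal, hedged sketch of the difference between the standard rotary-embedding path and a "batched" variant with per-request scaling factors. This is an illustrative reference implementation only; the function names, signatures, and cache layout below are assumptions for exposition and do not mirror vLLM's actual C++/CUDA op interface.

```python
import torch


def build_cos_sin_cache(head_dim: int, max_pos: int,
                        base: float = 10000.0,
                        scaling_factor: float = 1.0) -> torch.Tensor:
    """Cache of [cos | sin] rows for positions 0..max_pos-1, with linear scaling."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(max_pos).float() / scaling_factor
    freqs = torch.outer(positions, inv_freq)               # [max_pos, head_dim // 2]
    return torch.cat([freqs.cos(), freqs.sin()], dim=-1)   # [max_pos, head_dim]


def apply_rope(x: torch.Tensor, cos_sin: torch.Tensor) -> torch.Tensor:
    """Rotate x (shape [num_tokens, head_dim]) by per-token cos/sin rows."""
    half = x.shape[-1] // 2
    cos, sin = cos_sin[..., :half], cos_sin[..., half:]
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x2 * cos + x1 * sin], dim=-1)


def rotary_embedding(positions, q, cos_sin_cache):
    """Standard path: every token reads one shared cache, indexed by position only."""
    return apply_rope(q, cos_sin_cache[positions])


def batched_rotary_embedding(positions, q, cos_sin_caches, cache_indices):
    """Batched path (what the removed op enabled, conceptually): each token also
    selects a cache via cache_indices, so tokens from different LoRA adapters
    could use different RoPE scaling factors within one batch."""
    per_token = torch.stack([cos_sin_caches[i][p]
                             for i, p in zip(cache_indices.tolist(),
                                             positions.tolist())])
    return apply_rope(q, per_token)


# Usage comparison (hypothetical shapes):
head_dim, max_pos = 8, 16
caches = [build_cos_sin_cache(head_dim, max_pos, scaling_factor=s) for s in (1.0, 2.0)]
positions = torch.tensor([0, 1, 2, 3])
q = torch.randn(4, head_dim)
out_standard = rotary_embedding(positions, q, caches[0])
out_batched = batched_rotary_embedding(positions, q, caches, torch.tensor([0, 0, 1, 1]))
```

With per-adapter scaling factors gone after #21169, every token in a batch uses the same scaling factor, so the standard path suffices and the batched op has no remaining callers.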