Torch compilable RoPE #51
Open
torch.compile() seems to fail on the model because the current RoPE implementation re-accesses the position tensor. Calling forward() after compile() raises an error.
Rewriting the get_rotary_embedding call to use unsqueeze instead of a view makes the model compilable and yields a 19% speedup on forward() calls for a 3K-token prompt with 512 generated tokens, after warmup runs. There's one potential issue with this implementation, however: torch.allclose() between the sin and cos tensors fails if the RoPE cache in the current implementation is enabled, but passes if the cache is disabled. The deviation between this implementation and the current version is ~1e-4 atol and affects 2% of the encodings.
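For reference, here is a minimal sketch of the unsqueeze-based variant described above. The function name matches the call site mentioned in this PR, but the signature and the frequency computation are assumptions based on common RoPE implementations, not the actual diff:

```python
import torch

def get_rotary_embedding(seq_len: int, dim: int, base: float = 10000.0,
                         device=None, dtype=torch.float32):
    # Hypothetical sketch, not the PR's exact code.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, device=device, dtype=dtype) / dim))
    positions = torch.arange(seq_len, device=device, dtype=dtype)
    # Broadcasting via unsqueeze avoids the .view() on the position tensor
    # that reportedly broke torch.compile tracing.
    freqs = positions.unsqueeze(1) * inv_freq.unsqueeze(0)  # (seq_len, dim // 2)
    return freqs.sin(), freqs.cos()
```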
I'm unsure where the deviation is coming from.
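One way to quantify the deviation described above (a hypothetical helper, not part of this PR) is to measure the fraction of elements where the two implementations disagree beyond the observed tolerance:

```python
def mismatch_fraction(a: torch.Tensor, b: torch.Tensor, atol: float = 1e-4) -> float:
    # Fraction of elements whose absolute difference exceeds atol;
    # per the numbers above, roughly 0.02 with the RoPE cache enabled.
    return (a - b).abs().gt(atol).float().mean().item()
```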