Remove prims.embedding and prims.embedding_backward #1689
base: main
Conversation
This is interesting. Isn't this regressing support for calling with `max_norm`?
```python
if max_norm is not None:
    raise NotImplementedError
```
The `max_norm` argument was not supported before this PR either. I don't know the reason for this; I think it should work, and I'll update the OpInfo samples to test it.
The `max_norm` argument, when active, modifies the `weight` argument in place. That's probably the reason this argument raised `NotImplementedError`. I'll keep the behavior as is in this PR.
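To make the in-place behavior concrete, here is a minimal NumPy sketch of the `max_norm` semantics (the function name and implementation are illustrative, not PyTorch's actual code): rows of `weight` that are looked up and whose norm exceeds `max_norm` are rescaled in place before the gather.

```python
import numpy as np

def embedding_with_max_norm(weight, indices, max_norm, norm_type=2.0):
    """Sketch: renormalize looked-up rows of `weight` IN PLACE when their
    norm exceeds max_norm, then gather. Mirrors the documented max_norm
    behavior of torch.nn.functional.embedding."""
    for i in np.unique(indices):
        norm = np.linalg.norm(weight[i], ord=norm_type)
        if norm > max_norm:
            weight[i] *= max_norm / norm  # mutates the weight argument
    return weight[indices]

weight = np.array([[3.0, 4.0], [0.5, 0.5]])  # row 0 has norm 5
out = embedding_with_max_norm(weight, np.array([0]), max_norm=1.0)
# row 0 of `weight` is now rescaled to unit norm as a side effect
```

This side effect on `weight` is what makes the argument awkward for a functional trace, which is presumably why it was left unimplemented.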
Created issue #1699 to track completing this.
Pull request was converted to draft
Ah wait, then we might want to change this too: `lightning-thunder/thunder/executors/nvfuserex_impl.py`, lines 2636 to 2637 in 52ee541.
cc @Priya2698 for #1674
@mruberry, could you please take another look?
The RMSNorm failure looks flaky: the absolute difference is quite large. FYI @kevinstephano -- I think you're looking at RMSNorm now?
```python
utils.check(weight.ndim == 2, lambda: f"Expected weight (weight.shape={weight.shape}) to be a matrix")
shape = list(a.shape)
shape.append(weight.shape[1])
return TensorProxy(like=weight, shape=shape)
```
We might want to add a comment explaining the direct construction of a TensorProxy here instead of calling an operator that would create one -- I'm curious about the reason, too.
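For reference, the shape rule this meta function encodes is simple enough to check in isolation: the output shape is the indices shape with the embedding dimension appended. A standalone sketch (the helper name is hypothetical):

```python
def embedding_meta_shape(indices_shape, weight_shape):
    """Shape rule for embedding: output = indices_shape + (embedding_dim,).

    `weight_shape` must be 2-D: (num_embeddings, embedding_dim).
    """
    assert len(weight_shape) == 2, "weight must be a matrix"
    return tuple(indices_shape) + (weight_shape[1],)
```

For example, a (4, 3) batch of indices looked up in a (10, 8) weight matrix produces a (4, 3, 8) output.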
```diff
-    result = prims.embedding_backward(grad, indices, num_weights, padding_idx, scale_grad_by_freq, sparse)
-    return result
+    shape = (num_weights, grad.shape[-1])
+    return TensorProxy(shape=shape, device=grad.device, dtype=grad.dtype)
```
Same question here about directly constructing and returning a TensorProxy over calling some op that would create one.

I guess I'm a little confused about operations in the torch namespace directly constructing TensorProxies, as that seems more like a primitives thing? Curious to hear your thoughts.
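For context on what the meta function's `(num_weights, grad.shape[-1])` output shape corresponds to, a dense embedding backward is a scatter-add of gradient rows into a buffer of that shape. A minimal NumPy sketch (the function name is illustrative):

```python
import numpy as np

def embedding_backward_dense(grad, indices, num_weights):
    """Dense embedding backward: scatter-add grad rows into a
    (num_weights, embedding_dim) buffer, matching the meta function's
    output shape (num_weights, grad.shape[-1])."""
    embedding_dim = grad.shape[-1]
    grad_weight = np.zeros((num_weights, embedding_dim), dtype=grad.dtype)
    flat_idx = indices.reshape(-1)
    flat_grad = grad.reshape(-1, embedding_dim)
    # np.add.at accumulates correctly when the same index repeats
    np.add.at(grad_weight, flat_idx, flat_grad)
    return grad_weight
```

Rows that were never looked up keep a zero gradient, and repeated indices accumulate.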
We don't need `prims.embedding` and `prims.embedding_backward` because we can simply use `thunder.torch.embedding` and `thunder.torch.embedding_backward` as our transformation targets. When we added these primitives, we didn't have a good idea of how multi-level symbols could be targeted.

With this PR, `thunder.torch.embedding` is transformed directly to `torch.nn.functional.embedding` for execution. Before this PR, `thunder.torch.embedding` was decomposed into `prims.embedding`, which was then transformed into `torch.nn.functional.embedding` (and similarly for `embedding_backward`).

Testing:
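The two-level lowering described above can be illustrated with a toy dispatch (all names here are illustrative, not Thunder's actual API): a symbol executes at its own level when an executor implementation is registered for it, and only decomposes to lower-level symbols otherwise.

```python
# Toy sketch of targeting a multi-level symbol directly.
# With a registration at the torch level, "torch.embedding" lowers in one
# step; without it, lowering would have to recurse through a decomposition.
executor_impls = {
    "torch.embedding": lambda *args: ("torch.nn.functional.embedding", args),
}
decompositions = {
    "torch.embedding": "prims.embedding",  # the pre-PR lowering path
}

def lower(symbol, *args):
    if symbol in executor_impls:       # direct transformation target
        return executor_impls[symbol](*args)
    if symbol in decompositions:       # fall back to decomposing
        return lower(decompositions[symbol], *args)
    raise NotImplementedError(symbol)
```

Here `lower("torch.embedding", ...)` resolves directly to the executor implementation without ever touching `prims.embedding`, which is the behavior change this PR makes.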