Remove prims.embedding and prims.embedding_backward #1689
base: main
Conversation
This is interesting. Isn't this regressing support for calling with `max_norm`?
```python
if max_norm is not None:
    raise NotImplementedError
```
The `max_norm` argument was not supported before this PR either. I don't know the reason for this; I think it should work, and I'll update the OpInfo samples to test it.
The `max_norm` argument, when active, modifies the `weight` argument in place. That's probably the reason this argument raised `NotImplementedError`. I'll keep the behavior as is in this PR.
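To make the in-place behavior concrete, here is a minimal NumPy sketch of the `max_norm` semantics (the function name and implementation are illustrative, not PyTorch's actual code): rows of `weight` that are looked up and whose norm exceeds `max_norm` are rescaled in place before the gather.

```python
import numpy as np

def embedding_with_max_norm(weight, indices, max_norm, norm_type=2.0):
    """Sketch: renormalize looked-up rows of `weight` IN PLACE when their
    norm exceeds max_norm, then gather. Mirrors the documented max_norm
    behavior of torch.nn.functional.embedding."""
    for i in np.unique(indices):
        norm = np.linalg.norm(weight[i], ord=norm_type)
        if norm > max_norm:
            weight[i] *= max_norm / norm  # mutates the weight argument
    return weight[indices]

weight = np.array([[3.0, 4.0], [0.5, 0.5]])  # row 0 has norm 5
out = embedding_with_max_norm(weight, np.array([0]), max_norm=1.0)
# row 0 of `weight` is now rescaled to unit norm as a side effect
```

This side effect on `weight` is what makes the argument awkward for a functional trace, which is presumably why it was left unimplemented.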
Created issue #1699 to track completing this.
Pull request was converted to draft
Ah wait, then we might want to change this too: `lightning-thunder/thunder/executors/nvfuserex_impl.py`, lines 2636 to 2637 in 52ee541.
cc @Priya2698 for #1674
@mruberry, could you please take another look?
The RMSNorm failure looks flaky: the absolute difference is quite large. FYI @kevinstephano -- I think you're looking at RMSNorm now?
```python
utils.check(weight.ndim == 2, lambda: f"Expected weight (weight.shape={weight.shape}) to be a matrix")
shape = list(a.shape)
shape.append(weight.shape[1])
return TensorProxy(like=weight, shape=shape)
```
We might want to add a comment explaining the direct construction of a TensorProxy here instead of calling an operator that would create one -- I'm curious about the reason, too.
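For reference, the shape rule this meta function encodes is simple enough to check in isolation: the output shape is the indices shape with the embedding dimension appended. A standalone sketch (the helper name is hypothetical):

```python
def embedding_meta_shape(indices_shape, weight_shape):
    """Shape rule for embedding: output = indices_shape + (embedding_dim,).

    `weight_shape` must be 2-D: (num_embeddings, embedding_dim).
    """
    assert len(weight_shape) == 2, "weight must be a matrix"
    return tuple(indices_shape) + (weight_shape[1],)
```

For example, a (4, 3) batch of indices looked up in a (10, 8) weight matrix produces a (4, 3, 8) output.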
```diff
-    result = prims.embedding_backward(grad, indices, num_weights, padding_idx, scale_grad_by_freq, sparse)
-    return result
+    shape = (num_weights, grad.shape[-1])
+    return TensorProxy(shape=shape, device=grad.device, dtype=grad.dtype)
```
Same question here about directly constructing and returning a TensorProxy over calling some op that would create one.

I guess I'm a little confused about operations in the torch namespace directly constructing TensorProxies, as that seems more like a primitives thing? Curious to hear your thoughts.
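For context on what the meta function's `(num_weights, grad.shape[-1])` output shape corresponds to, a dense embedding backward is a scatter-add of gradient rows into a buffer of that shape. A minimal NumPy sketch (the function name is illustrative):

```python
import numpy as np

def embedding_backward_dense(grad, indices, num_weights):
    """Dense embedding backward: scatter-add grad rows into a
    (num_weights, embedding_dim) buffer, matching the meta function's
    output shape (num_weights, grad.shape[-1])."""
    embedding_dim = grad.shape[-1]
    grad_weight = np.zeros((num_weights, embedding_dim), dtype=grad.dtype)
    flat_idx = indices.reshape(-1)
    flat_grad = grad.reshape(-1, embedding_dim)
    # np.add.at accumulates correctly when the same index repeats
    np.add.at(grad_weight, flat_idx, flat_grad)
    return grad_weight
```

Rows that were never looked up keep a zero gradient, and repeated indices accumulate.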
We don't need `prims.embedding` and `prims.embedding_backward` because we can simply use `thunder.torch.embedding` and `thunder.torch.embedding_backward` as our transformation targets. When we added these primitives, we didn't have a good idea of how multi-level symbols could be targeted.

With this PR, `thunder.torch.embedding` is transformed directly to `torch.nn.functional.embedding` for execution. Before this PR, `thunder.torch.embedding` was decomposed into `prims.embedding`, which was then transformed into `torch.nn.functional.embedding` (and similarly for `embedding_backward`).

Testing:
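The two-level lowering described above can be illustrated with a toy dispatch (all names here are illustrative, not Thunder's actual API): a symbol executes at its own level when an executor implementation is registered for it, and only decomposes to lower-level symbols otherwise.

```python
# Toy sketch of targeting a multi-level symbol directly.
# With a registration at the torch level, "torch.embedding" lowers in one
# step; without it, lowering would have to recurse through a decomposition.
executor_impls = {
    "torch.embedding": lambda *args: ("torch.nn.functional.embedding", args),
}
decompositions = {
    "torch.embedding": "prims.embedding",  # the pre-PR lowering path
}

def lower(symbol, *args):
    if symbol in executor_impls:       # direct transformation target
        return executor_impls[symbol](*args)
    if symbol in decompositions:       # fall back to decomposing
        return lower(decompositions[symbol], *args)
    raise NotImplementedError(symbol)
```

Here `lower("torch.embedding", ...)` resolves directly to the executor implementation without ever touching `prims.embedding`, which is the behavior change this PR makes.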