Add embedding backward workaround #1756

Merged: pmarkovicTT merged 7 commits into main from pmarkovic/embedding_backward_workaround on Jan 20, 2025

@pmarkovicTT pmarkovicTT commented Jan 13, 2025

This PR introduces a workaround for the embedding backward op. Because of a TTNN constraint that accepts only BF16 and BF8 data types, the workaround casts F32 to BF16 and back. This is a follow-up to #1583.

Closes #1503
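
The cast-and-restore pattern described above can be illustrated with a minimal pure-Python sketch. This is not the code from this PR (the actual workaround lives in the tt-mlir compiler); the function names `round_to_bf16`, `embedding_backward_bf16`, and `embedding_backward_f32` are hypothetical, and the "device op" is only a stand-in that scatter-adds gradient rows by index.

```python
import struct


def round_to_bf16(x: float) -> float:
    """Round a float32-representable value to the nearest bfloat16 value.

    bfloat16 keeps the top 16 bits of a float32, so this rounds to
    nearest-even on the 16 dropped bits. NaN is not handled in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bias = 0x7FFF + ((bits >> 16) & 1)
    bf16_bits = ((bits + bias) >> 16) & 0xFFFF
    return struct.unpack("<f", struct.pack("<I", bf16_bits << 16))[0]


def embedding_backward_bf16(indices, grad_out_bf16, num_embeddings):
    """Hypothetical stand-in for an op that only accepts bf16 gradients."""
    dim = len(grad_out_bf16[0])
    grad_weight = [[0.0] * dim for _ in range(num_embeddings)]
    for row, idx in zip(grad_out_bf16, indices):
        for j, g in enumerate(row):
            # Accumulate, then round so the result stays a bf16 value.
            grad_weight[idx][j] = round_to_bf16(grad_weight[idx][j] + g)
    return grad_weight


def embedding_backward_f32(indices, grad_out_f32, num_embeddings):
    """Workaround pattern: cast F32 -> BF16, run the op, cast back to F32."""
    grad_bf16 = [[round_to_bf16(g) for g in row] for row in grad_out_f32]
    result_bf16 = embedding_backward_bf16(indices, grad_bf16, num_embeddings)
    # bf16 -> f32 is exact: every bf16 value is also a float32 value.
    return [[float(v) for v in row] for row in result_bf16]


indices = [0, 2, 0]
grad_out = [[1.0, 2.0], [0.5, 0.25], [3.0, -1.0]]
print(embedding_backward_f32(indices, grad_out, num_embeddings=3))
# prints [[4.0, 1.0], [0.0, 0.0], [0.5, 0.25]]
```

The values in this example are exactly representable in BF16, so nothing is lost. In general the round trip reduces precision, since BF16 keeps only 8 bits of significand precision compared with 24 for F32.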

@sdjordjevicTT sdjordjevicTT left a comment

Great change, thanks @pmarkovicTT! Can you change a silicon test to use f32 instead of bf16, so we can test the workaround on silicon as well?

@pmarkovicTT pmarkovicTT force-pushed the pmarkovic/embedding_backward_workaround branch from 9a639d9 to ddd511b on January 17, 2025 14:33

@sdjordjevicTT sdjordjevicTT left a comment

Thanks!

pmarkovicTT commented Jan 17, 2025

Changed the test, and the test suite passes (cmake --build build -- check-ttmlir). @sdjordjevicTT

@pmarkovicTT pmarkovicTT merged commit a20127a into main Jan 20, 2025
23 checks passed
@pmarkovicTT pmarkovicTT deleted the pmarkovic/embedding_backward_workaround branch January 20, 2025 09:26
pmarkovicTT added a commit to tenstorrent/tt-tvm that referenced this pull request Jan 27, 2025
We don't need an explicit embedding data format cast in TVM (from float32 to
bf16), as the data format workaround for this case is implemented in MLIR.

PRs for reference:

- [TVM cast workaround](#55)
- [Embedding Op workaround](tenstorrent/tt-mlir#1583)
- [EmbeddingBackward Op workaround](tenstorrent/tt-mlir#1756)

Related to this issue: tenstorrent/tt-forge-fe#1112
pmarkovicTT added a commit to tenstorrent/tt-forge-fe that referenced this pull request Feb 7, 2025
#1111)

### Ticket
Close #1112

### Problem description
We don't need an explicit embedding data format cast in TVM (from float32 to
bf16), as the data format workaround for this case is implemented in MLIR.

PRs for reference:
- [TVM change](tenstorrent/tt-tvm#59)
- [Embedding Op workaround](tenstorrent/tt-mlir#1583)
- [EmbeddingBackward Op workaround](tenstorrent/tt-mlir#1756)

### What's changed
Removed the explicit cast to bfloat16 when the data format of the embedding
weights is float32.
Updated the llama backward test to reflect the new Forge API for training
(setting the training argument).

### Checklist
- [x] Remove explicit cast in third_party/tvm/python/tvm/relay/frontend/pytorch.py
- [x] Update test_llama_backward.py

---------

Co-authored-by: Vladimir Milosevic <157983820+vmilosevic@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

[EmbeddingBW] input gradient must be cast to BF16