
update from main #1

Merged

Conversation

KennethEnevoldsen
Owner

Description

Checklist

  • I confirm that I have the right to submit this contribution under the project's MIT license.

danieldk and others added 5 commits February 8, 2024 20:29
* Clear output of Torch SDPA for masked pieces

Since Torch 2.1, the Torch memory-efficient SDPA GPU kernel returns NaN
for pieces that are completely masked out. This leads to NaN propagation
in the next attention layer, because masked pieces get an attention of
zero, but zero times NaN is still NaN.

We fix this by zeroing the output of masked pieces to clear out any
NaNs (see the sketch after this commit message).

We currently rely on the query dimension of the mask being singular, but
in the future we should probably redesign the `AttentionMask` class to
account for the differences between attention masks and causal masks.

* black

* Update MyPy version to one that supports recent PyTorch

* Comment typos and fixes

* Add assertion message

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* black

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
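
A minimal sketch of the NaN-clearing fix described above, assuming a boolean attention mask of shape `(batch, 1, 1, seq_len)` where `True` means "attend"; the function name and shapes are illustrative and not this project's actual API:

```python
# Illustrative sketch, not code from this repository.
import torch
import torch.nn.functional as F


def attention_with_nan_clearing(query, key, value, mask):
    # query/key/value: (batch, heads, seq_len, head_dim)
    # mask: bool, (batch, 1, 1, seq_len); True = attend, False = masked piece.
    attn_out = F.scaled_dot_product_attention(query, key, value, attn_mask=mask)
    # The memory-efficient kernel can return NaN for fully masked pieces.
    # Zero those outputs so the NaNs cannot propagate through the next
    # attention layer (where 0 * NaN would still be NaN).
    return attn_out.masked_fill(mask.logical_not().transpose(-1, -2), 0.0)


q = k = v = torch.randn(2, 4, 6, 8)
mask = torch.ones(2, 1, 1, 6, dtype=torch.bool)
mask[0, ..., 4:] = False  # last two pieces of the first sequence are padding
out = attention_with_nan_clearing(q, k, v, mask)
```
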
There was a subtle bug where we populated models with parameters that are
not leaf nodes, because we called `to` on them for device placement.

This change fixes this issue and validates that all model parameters are
leaf nodes in the model tests.
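
A small sketch of the pitfall, independent of this project's own modules: `to` on a parameter that requires grad returns a non-leaf tensor whenever it actually copies.

```python
# Illustrative sketch of the non-leaf pitfall, not the project's code.
import torch

weight = torch.nn.Parameter(torch.randn(4, 4))
print(weight.is_leaf)  # True: freshly constructed parameters are leaves.

# `to` is recorded by autograd, so the copy it returns is not a leaf (and is
# a plain Tensor rather than a Parameter). Storing this in a module silently
# breaks gradient bookkeeping.
moved = weight.to(torch.float64)
print(moved.is_leaf)  # False

# Safer: create the parameter with the desired dtype/device up front, or
# re-wrap the converted tensor in torch.nn.Parameter.
weight = torch.nn.Parameter(weight.detach().to(torch.float64))
print(weight.is_leaf)  # True

# Validation in the spirit of the model tests: every parameter is a leaf.
model = torch.nn.Linear(4, 4)
assert all(p.is_leaf for p in model.parameters())
```
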
We added support for TorchScript tracing a while back so that models
can be exported to ONNX. However, that support relies on metaclasses,
which break under torch.compile in the latest PyTorch versions.
PyTorch now provides a TorchDynamo-based ONNX exporter:

https://pytorch.org/docs/stable/onnx_dynamo.html

So it's time to yank TorchScript tracing support and remove all the
fragile dataclass/tuple/dict polymorphism.
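
For reference, a minimal sketch of the TorchDynamo-based exporter linked above (PyTorch >= 2.1 with the ONNX extras installed); the model here is a placeholder, not one of this project's encoders:

```python
# Illustrative sketch; requires torch>=2.1 plus the onnx/onnxscript packages.
import torch

model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.GELU()).eval()
example_input = torch.randn(1, 16)

onnx_program = torch.onnx.dynamo_export(model, example_input)
onnx_program.save("model.onnx")
```
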
* Fix `test_rotary_embeddings_against_hf` for latest transformers

* xfail test because HfFileSystem is currently broken
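
For context, marking a test as an expected failure in pytest looks roughly like this; the test name and reason string are illustrative, not the actual test from this PR:

```python
# Hypothetical example, not the test touched in this PR.
import pytest


@pytest.mark.xfail(reason="HfFileSystem is currently broken upstream")
def test_model_from_hf_hub():
    ...
```
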
@KennethEnevoldsen KennethEnevoldsen merged commit 8dc7b1a into KennethEnevoldsen:added_electra_encoder Apr 2, 2024