Fix llama sin_cached/cos_cached backward compatibility #29299

Closed

Conversation

fxmarty (Contributor) commented Feb 26, 2024

_sin_cached and _cos_cached are never set in __init__ (compared to https://github.com/huggingface/transformers/blob/v4.37.2/src/transformers/models/llama/modeling_llama.py#L134-L136), which breaks backward compatibility and yields errors in external packages (e.g. in https://github.com/AutoGPTQ/AutoGPTQ/blob/6b55300dd83326504ee6e02b730fa4451adfa479/auto_gptq/modeling/_utils.py#L95-L96).

IMO this should be in a patch release.
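
As an illustration of the breakage, a minimal sketch of the access pattern downstream packages rely on (based on the v4.37.2 layout linked above; the dimensions and the direct-attribute access are assumptions for illustration, not the actual AutoGPTQ code):

```python
# Hypothetical downstream usage: read the rotary-embedding caches right after
# construction, as was possible in v4.37.2.
from transformers.models.llama.modeling_llama import LlamaRotaryEmbedding

rotary_emb = LlamaRotaryEmbedding(dim=128, max_position_embeddings=2048, base=10000)

cos = rotary_emb.cos_cached  # AttributeError if the cache is only filled lazily in forward()
sin = rotary_emb.sin_cached
print(cos.shape, sin.shape)
```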

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Review thread on the diff:

@@ -100,6 +100,21 @@ def __init__(self, dim, max_position_embeddings=2048, base=10000, device=None):
        inv_freq = 1.0 / (self.base ** (torch.arange(0, self.dim, 2, dtype=torch.int64).float().to(device) / self.dim))
        self.register_buffer("inv_freq", inv_freq, persistent=False)

        # TODO: Remove in 4.40.
Collaborator:
Why 4.40 here? This kind of version-dependent removal would be for deprecating a feature, but AFAICT the PR description doesn't include an implemented fix that replaces this.

fxmarty (Contributor, author) commented Feb 26, 2024:
@amyeroberts I just followed 7d312ad. The sin_cached attribute will be removed in 4.40. cc @gante

Collaborator:
Huh, OK. Won't this mean things still break, though?

Member:
I don't think we can remove them, no 💔

gante (Member) commented Feb 26, 2024

@fxmarty The extent of the fix may depend on the following question: are the downstream libraries broken because of a) the lack of the tensors, or b) the lack of the tensors AND their values?

The PR as it stands would fix a), but it probably wouldn't fix b).


Full story of how this came to be:

  1. Static cache removed these tensors
  2. This PR added them back, but a) NOT at init time (which this PR would fix as of the latest commit) and b) with different tensor contents and shape

Related PR: #29198 (which tries to fix #29173)
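
To make the a)/b) distinction concrete, a hypothetical downstream snippet (illustrative only, not taken from any specific library):

```python
from transformers.models.llama.modeling_llama import LlamaRotaryEmbedding

rotary_emb = LlamaRotaryEmbedding(dim=128, max_position_embeddings=2048)
seq_len = 512

# a) code that only needs the attributes to exist, e.g. to inspect or move them:
device = rotary_emb.cos_cached.device  # fixed by re-creating the buffers at init

# b) code that also depends on the old contents/shape:
cos = rotary_emb.cos_cached[:seq_len]  # still wrong if the values or shape differ
sin = rotary_emb.sin_cached[:seq_len]  # from what pre-4.38 callers expect
```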

gante (Member) commented Feb 26, 2024

Note to ourselves: non-persistent buffers can't be treated as common variables for deprecation purposes 😬

ArthurZucker (Collaborator) commented:
#29198 will add them at init time.
We keep the values, but we deprecate fast because the values are no longer updated.
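
A rough sketch of that combination, i.e. rebuild the caches at init but warn on read access (the class name, warning text, and cache computation below are illustrative, modeled on the v4.37.2 code linked in the PR description, not the actual #29198 diff):

```python
import logging

import torch
import torch.nn as nn

logger = logging.getLogger(__name__)


class RotaryEmbeddingSketch(nn.Module):
    """Illustrative only: keep cos_cached/sin_cached readable while flagging their removal."""

    def __init__(self, dim, max_position_embeddings=2048, base=10000, device=None):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.int64).float().to(device) / dim))
        self.register_buffer("inv_freq", inv_freq, persistent=False)

        # Rebuild the full-length caches at init so code reading them right after
        # construction keeps working; persistent=False keeps them out of checkpoints.
        t = torch.arange(max_position_embeddings, device=device).type_as(inv_freq)
        freqs = torch.outer(t, inv_freq)
        emb = torch.cat((freqs, freqs), dim=-1)
        self.register_buffer("_cos_cached", emb.cos(), persistent=False)
        self.register_buffer("_sin_cached", emb.sin(), persistent=False)

    @property
    def cos_cached(self):
        logger.warning("cos_cached is deprecated and will be removed; its values are no longer updated in forward.")
        return self._cos_cached

    @property
    def sin_cached(self):
        logger.warning("sin_cached is deprecated and will be removed; its values are no longer updated in forward.")
        return self._sin_cached
```

Reading rotary_emb.cos_cached then still returns old-style values while emitting a warning, which matches the "keep the values but deprecate fast" intent.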

ArthurZucker (Collaborator) left a review comment:
Let's not duplicate the work

fxmarty (Contributor, author) commented Feb 27, 2024

Was not aware #29198 was a fix for that, nice! Note that with self.register_buffer("_cos_cached", ...) I had some torch.fx tests failing.

fxmarty closed this on Feb 27, 2024
ArthurZucker (Collaborator) commented:
Feel free to comment over there
