When extending embeddings, multivariate distribution isn't correctly estimated even when the calculated sigma matrix is symmetric and positive definite #35075
Comments
Hi @MayStepanyan, there's another related issue with the embeddings initialization here: #34570. It seems like we should probably refactor this block. Would you be willing to open a PR? I think a combination of your suggestions + the faster
Hi @Rocketknight1. Of course, I’d like to work on the bug fix when I have some spare time. As a side note, I think we don't need to replace
@MayStepanyan you're totally right, good point! When skimming the code I missed that the eigenvalues were just used to check that condition.
hi @MayStepanyan
Hi @abuelnasr0, thanks for the tip. I wasn't going to check the scaled matrix, since given a positive definite matrix By the way, what is the logic of using
In the original article, the author multiplied the covariance matrix by The reason behind
System Info
transformers version: 4.37.1
Who can help?
When resizing token embeddings for models like MobileBert, iBert, etc., resize_token_embeddings calls an underlying transformers.modeling_utils._init_added_embeddings_with_mean. It should initialize new embedding weights using the old ones:
vector * vector.T / vector_dim
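The intended behaviour (estimate a distribution from the old embeddings, then sample new rows from it) can be sketched roughly as follows. This is an illustration only, not the library's code: `old_embeddings`, `num_new_tokens`, the toy sizes, and the choice to divide by the number of rows are all assumptions made for the example.

```python
import torch
from torch.distributions import MultivariateNormal

torch.manual_seed(0)

# Hypothetical "old" embedding matrix: 1000 tokens, hidden size 4.
old_embeddings = torch.randn(1000, 4)
num_new_tokens = 3

# Mean vector of the old embeddings.
mean = old_embeddings.mean(dim=0)

# Sigma estimated from the centered embeddings (assumed form: X^T X / number of rows).
centered = old_embeddings - mean
sigma = centered.T @ centered / old_embeddings.shape[0]
sigma = (sigma + sigma.T) / 2  # remove any floating-point asymmetry

# With default argument validation, this constructor rejects a sigma
# that is not positive definite.
distribution = MultivariateNormal(mean, covariance_matrix=sigma)
new_rows = distribution.sample((num_new_tokens,))

# Append the sampled rows to the old embedding matrix.
resized = torch.cat([old_embeddings, new_rows], dim=0)
print(resized.shape)
```

With many more rows than hidden dimensions, as in this toy setup, the estimated sigma is positive definite, so sampling is expected to succeed.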
I noticed the check in step 3 ALWAYS fails, i.e. no matrix is considered positive definite.
The problem seems to be in these lines
since the eigenvalues calculated with torch.linalg.eigvals always have a complex dtype, even when their imaginary parts are zero, so torch.is_complex returns True for them. Hence, the main logic, i.e. constructing a multivariate distribution from the previous embeddings and sampling from it, might never work (at least in my experiments).
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
Here's an isolated example testing the lines I mentioned above:
This outputs False despite the matrix having two positive real eigenvalues: 6 and 1.
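A minimal sketch of the behaviour described above, assuming a hypothetical symmetric matrix `sigma` whose eigenvalues are 6 and 1. The variable names and the exact shape of the check are illustrative and not copied from the library.

```python
import torch

# Hypothetical symmetric matrix: trace 7, determinant 6, so eigenvalues 6 and 1.
sigma = torch.tensor([[5.0, 2.0], [2.0, 2.0]])

# eigvals returns a complex-dtype tensor even for a real symmetric input.
eigenvalues = torch.linalg.eigvals(sigma)

is_symmetric = bool((sigma == sigma.T).all())
has_complex_dtype = torch.is_complex(eigenvalues)

# A check that requires a non-complex result stops at `not has_complex_dtype`,
# so the positivity test on the right is never reached.
passes_check = is_symmetric and not has_complex_dtype and bool((eigenvalues.real > 0).all())

print(eigenvalues.dtype)
print(passes_check)
```

Here `passes_check` ends up `False` only because of the dtype test: the real parts of the eigenvalues are positive and their imaginary parts are zero.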
Expected behavior
The function should successfully generate a multivariate normal distribution whenever the calculated sigma is positive definite and symmetric.
I think the check might be replaced with something like:
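One way such a check could look is sketched below. It is not necessarily the replacement proposed in this issue: the helper name `is_symmetric_positive_definite` is hypothetical, and using `torch.linalg.eigvalsh` (which returns real eigenvalues for symmetric input) is an assumption about what a fix might use.

```python
import torch


def is_symmetric_positive_definite(sigma: torch.Tensor) -> bool:
    """Hypothetical helper: True if sigma is symmetric with strictly positive eigenvalues."""
    if not torch.allclose(sigma, sigma.T):
        return False
    # eigvalsh assumes a symmetric (Hermitian) input and returns real eigenvalues,
    # so no complex-dtype test is needed.
    eigenvalues = torch.linalg.eigvalsh(sigma)
    return bool((eigenvalues > 0).all())


# Eigenvalues 6 and 1: positive definite.
print(is_symmetric_positive_definite(torch.tensor([[5.0, 2.0], [2.0, 2.0]])))
# Eigenvalues 3 and -1: symmetric but not positive definite.
print(is_symmetric_positive_definite(torch.tensor([[1.0, 2.0], [2.0, 1.0]])))
```

Checking symmetry first matters here, because `eigvalsh` reads only one triangle of the matrix and would silently ignore any asymmetry.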