
When extending embeddings, multivariate distribution isn't correctly estimated even when the calculated sigma matrix is symmetric and positive definite #35075

Closed
MayStepanyan opened this issue Dec 4, 2024 · 7 comments

MayStepanyan commented Dec 4, 2024

System Info

  • transformers version: 4.37.1
  • Platform: Linux-5.15.0-91-generic-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.26.2
  • Safetensors version: 0.4.5
  • Accelerate version: not installed
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.4.1+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Who can help?

When resizing token embeddings for models like MobileBert, iBert, etc., resize_token_embeddings calls the underlying transformers.modeling_utils._init_added_embeddings_with_mean, which is supposed to initialize the new embedding weights from the old ones (a rough sketch follows the list below):

  1. calculate the mean vector of the old embedding vectors
  2. calculate a sigma matrix from this vector: vector * vector.T / vector_dim
  3. check whether it is positive definite, i.e. whether it can be used as a covariance matrix for a new distribution
  • if so, sample the new embeddings from the estimated distribution
  • otherwise, just initialize the new embeddings from the mean vector of the previous ones
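
Here is a rough, self-contained sketch of that flow (the tensor shapes and variable names are illustrative stand-ins, not the exact transformers implementation):

import torch

# illustrative stand-ins for the old embedding matrix and the number of added tokens
old_embeddings = torch.randn(1000, 64)   # (old_vocab_size, hidden_dim)
num_new_tokens = 4

# 1. mean vector of the old embedding vectors
mean_embedding = old_embeddings.mean(dim=0)

# 2. sigma matrix estimated from the old embeddings (symmetrized here to guard
#    against floating-point asymmetry from the matmul)
sigma = old_embeddings.T @ old_embeddings / old_embeddings.shape[1]
sigma = 0.5 * (sigma + sigma.T)

# 3. if sigma passes the positive-definiteness check, sample the new rows from the
#    estimated multivariate normal; otherwise fall back to copying the mean vector
eigenvalues = torch.linalg.eigvals(sigma)
is_covariance_psd = bool(
    (sigma == sigma.T).all() and not torch.is_complex(eigenvalues) and (eigenvalues > 0).all()
)
if is_covariance_psd:
    distribution = torch.distributions.MultivariateNormal(mean_embedding, covariance_matrix=sigma)
    new_embeddings = distribution.sample((num_new_tokens,))
else:
    # with the check as it stands, this branch always runs, which is the bug described below
    new_embeddings = mean_embedding.repeat(num_new_tokens, 1)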

I noticed that the check in step 3 ALWAYS fails, i.e. no matrix is ever considered positive definite.

The problem seems to be in these lines:

eigenvalues = torch.linalg.eigvals(covariance)
is_covariance_psd = bool(
    (covariance == covariance.T).all() and not torch.is_complex(eigenvalues) and (eigenvalues > 0).all()
)

since the eigenvalues calculated with torch.linalg.eigvals are returned as a complex tensor and torch.is_complex returns True for them. Hence, the main logic, i.e. constructing a multivariate distribution from the previous embeddings and sampling from it, might never run (at least it never did in my experiments).
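
A quick standalone check makes this visible: for any real-valued input, eigvals returns a complex dtype, so the torch.is_complex test fails regardless of the eigenvalues themselves.

import torch

eigenvalues = torch.linalg.eigvals(torch.eye(3))  # the identity matrix has purely real eigenvalues
print(eigenvalues.dtype)                          # torch.complex64
print(torch.is_complex(eigenvalues))              # True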

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Here's an isolated example testing the lines I mentioned above:

import torch

covariance = torch.tensor([[2., 1.], [1., 2.]])  # symmetric and positive definite
eigenvalues = torch.linalg.eigvals(covariance)
is_covariance_psd = bool(
    (covariance == covariance.T).all() and not torch.is_complex(eigenvalues) and (eigenvalues > 0).all()
)
print(is_covariance_psd)

This outputs False even though the matrix is symmetric and has two positive real eigenvalues, 3 and 1.

Expected behavior

The function should successfully generate a multivariate normal distribution whenever the calculated sigma is positive definite and symmetric.

I think the check might be replaced with something like:

from torch.distributions import constraints

is_psd = constraints.positive_definite.check(covariance).item()
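
For instance, applying this check to a small symmetric, positive definite matrix (the same one as in the reproduction above) gives the expected result:

import torch
from torch.distributions import constraints

covariance = torch.tensor([[2., 1.], [1., 2.]])  # symmetric and positive definite
print(constraints.positive_definite.check(covariance).item())  # True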

Rocketknight1 commented Dec 4, 2024

Hi @MayStepanyan, there's another related issue with the embeddings initialization here: #34570. It seems like we should probably refactor this block.

Would you be willing to open a PR? I think a combination of your suggestion plus the faster eigvalsh function (which applies here because covariance matrices are always symmetric) would be really nice to have. The eigvalsh function might also help avoid this issue: eigvals returns a complex tensor even when the eigenvalues are all real, whereas eigvalsh returns a float tensor, because symmetric matrices always have real eigenvalues.
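
To illustrate that dtype difference with a standalone check (not transformers code):

import torch

m = torch.tensor([[2., 1.], [1., 2.]])   # symmetric, eigenvalues 1 and 3
print(torch.linalg.eigvals(m).dtype)     # torch.complex64
print(torch.linalg.eigvalsh(m).dtype)    # torch.float32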

@MayStepanyan (Author)

Hi @Rocketknight1.

Of course, I’d like to work on the bug fix when I have some spare time.

As a side note, I don't think we need to replace eigvals with anything: the constraint check from torch.distributions.constraints seems to be enough, and we don't need to compute the eigenvalues explicitly at all.

@Rocketknight1 (Member)

@MayStepanyan you're totally right, good point! When skimming the code I missed that the eigenvalues were just used to check that condition.

@abuelnasr0 (Contributor)

Hi @MayStepanyan,
Thanks for this, you are right: the eigenvalues are always returned as complex.
I think your suggested solution is perfect. I have tried it locally and can confirm it works.
My only tip for when you open a PR: note that we check 1e-9 * covariance, not covariance, so the line should be is_psd = constraints.positive_definite.check(1e-9 * covariance).item()
Not noticing that in the first place was the reason behind PR #34037.

@MayStepanyan (Author)

Hi @abuelnasr0, thanks for the tip.

I wasn't planning to check the scaled matrix, since for a positive definite matrix A and a positive epsilon, epsilon * A is also positive definite. But now that you've mentioned it, I realize we might very occasionally hit numerical instabilities when multiplying the matrix by 1e-9, so it does make sense to check the condition on the scaled matrix.

By the way, what is the reasoning behind using 1e-9 * covariance? Is it just to make the final distribution less scattered, or is there some other reason?

@abuelnasr0 (Contributor)

In the original article, the author multiplied the covariance matrix by 1e-5: https://nlp.stanford.edu/~johnhew/vocab-expansion.html
The article does not mention the reason behind this, but I had thought of it the same way you did: #33325 (comment)

The reason for 1e-9 rather than 1e-5 is that some models have high covariance, so one or more outliers can be sampled and cause this test case to fail: https://github.com/abuelnasr0/transformers/blob/8e60a368221f15720fbd5a34b41c95eafd66b0d7/tests/test_modeling_common.py#L1855
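
To illustrate the effect of the scaling: shrinking the covariance only tightens the sampled vectors around the mean, which makes extreme outliers far less likely (a standalone toy example, not the transformers code):

import torch
from torch.distributions import MultivariateNormal

mean = torch.zeros(4)
covariance = torch.eye(4) * 10.0   # toy stand-in for a model with high covariance

wide = MultivariateNormal(mean, covariance_matrix=covariance)
tight = MultivariateNormal(mean, covariance_matrix=1e-9 * covariance)

print(wide.sample((1000,)).std())    # roughly sqrt(10) ≈ 3.2
print(tight.sample((1000,)).std())   # roughly sqrt(10 * 1e-9) ≈ 1e-4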


github-actions bot commented Jan 9, 2025

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
