ntxent fix #946
Conversation
…llows for 0 vector embeddings.
…ine similarity.
…to patch-1
Thanks again @GrantMcConachie!
Would such logic be potentially implementable?
Hi @vroulet! Yes, I think this is possible! I will work on it and let you know.
Hello again @vroulet, I tried a `jnp.where()`-based approach to calculate the cross entropy, rather than the `cosine_similarity` function with the epsilon. This gives the same cosine similarity matrix; however, the gradient resulted in NaNs. I also tried a `jnp.where()` variant that keeps the 0.0 epsilon value for the `cosine_similarity` calculation, and this also resulted in NaNs in the gradient. I am out of ideas for other ways to implement `jnp.where()`, so I believe the best way to go about this is to add the epsilon in the cosine similarity! Let me know if you have any more suggestions for implementing `jnp.where()` though!
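For illustration, a minimal sketch (not the exact code tried above) of why a single `jnp.where()` does not protect the gradient when an embedding is the 0 vector, while an epsilon in the norm does:

```python
import jax
import jax.numpy as jnp

def cosine_sim_where(x, y):
  # Naive masking: jnp.where fixes the forward value, but the inf/NaN produced
  # on the untaken branch still leaks into the gradient (0 * inf = NaN).
  norm = jnp.sqrt(jnp.sum(x**2)) * jnp.sqrt(jnp.sum(y**2))
  return jnp.where(norm == 0.0, 0.0, jnp.dot(x, y) / norm)

def cosine_sim_eps(x, y, eps=1e-12):
  # Adding a small epsilon keeps both the value and the gradient finite.
  norm = jnp.sqrt(jnp.sum(x**2) + eps) * jnp.sqrt(jnp.sum(y**2) + eps)
  return jnp.dot(x, y) / norm

x, y = jnp.zeros(3), jnp.ones(3)
print(jax.grad(cosine_sim_where)(x, y))  # [nan nan nan]
print(jax.grad(cosine_sim_eps)(x, y))    # finite (if large) values
```

(The classic workaround is a second `jnp.where` on the inputs of the division as well, but adding the epsilon, as this PR does, is simpler here.)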
Hello @GrantMcConachie, Thanks again for this contribution; this issue made me look at it again and it's well done :) Ah, and by the way, you may add a doctest in the loss if you are on it. Understanding what the proper shapes should be, etc., is not necessarily evident for the user, and this would help. (Look at the docstring of Adam for example; you'll see an Examples section where you can format some code that will appear nicely in the docs.)
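A hypothetical sketch of what such an Examples section could look like (the signature, shapes, and names below are assumptions for illustration, not the doctest that ended up in the PR):

```python
def ntxent(embeddings, labels, temperature=0.07):
  """Normalized temperature-scaled cross-entropy (NT-Xent) loss.

  Examples:
    >>> import jax.numpy as jnp
    >>> from optax import losses
    >>> # 3 embeddings of dimension 2; the first two share a label.
    >>> embeddings = jnp.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
    >>> labels = jnp.array([0, 0, 1])
    >>> loss = losses.ntxent(embeddings, labels)  # doctest: +SKIP
  """
  ...
```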
Hi @vroulet! You are definitely right. Adding this epsilon term to the cosine similarity only fixed the 0 vector embedding issue. For the case where all labels are the same, I think the loss should be 0: if there are no negative pairs (`diffs` is filled with 0s), the denominator and the numerator inside the log are the same. I was confused at first, but the equation from https://kevinmusgrave.github.io/pytorch-metric-learning/losses/#ntxentloss, where I took a lot of inspiration to build this function, is more general in that you don't need to have exactly 1 positive pair in your embeddings, and it gives the same evaluation: running that loss with all labels the same gives you 0. In conclusion, adding the epsilon term in the cosine similarity alleviates the 0 vector embedding problem, and the case in which all labels are the same should evaluate to 0 loss. Let me know if you agree! I will start working on the doctest soon!
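Concretely, the per-anchor loss on that page has (up to notation) the form

$$\ell_i = -\log \frac{\exp(s_{i,p}/\tau)}{\exp(s_{i,p}/\tau) + \sum_{n \in N(i)} \exp(s_{i,n}/\tau)}$$

where $s$ are cosine similarities, $\tau$ is the temperature, $p$ indexes the positive of anchor $i$, and $N(i)$ its negatives; when $N(i)$ is empty the fraction is 1 and $\ell_i = -\log 1 = 0$.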
Hi @vroulet! Just wanted to let you know I added a doctest! Let me know what you think.
Instead of a hard-coded 1e-12, could we perhaps replace it with the machine epsilon of the embeddings' dtype?
Hi @fabianp! Yes I can add this in. |
optax/losses/_self_supervised.py
@@ -55,7 +86,8 @@ def ntxent(
   # cosine similarity matrix
   xcs = (
       _regression.cosine_similarity(
-          embeddings[None, :, :], embeddings[:, None, :]
+          embeddings[None, :, :], embeddings[:, None, :],
+          epsilon=np.finfo(embeddings.dtype).eps
no need to import numpy, you can do the same with jnp instead of np
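For illustration, a minimal sketch of the suggested change (the variable name is just for the example):

```python
import jax.numpy as jnp

embeddings = jnp.ones((4, 8), dtype=jnp.float32)

# Machine epsilon taken from jax.numpy directly, so no `import numpy as np` is needed.
eps = jnp.finfo(embeddings.dtype).eps  # ~1.1920929e-07 for float32
```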
Got it! Will change shortly.
@fabianp is something in particular holding this PR up?
There were some internal errors and then I forgot about it. Taking a look into it now.
I added an epsilon value to the cosine similarity function to avoid the NaNs that were occurring when you had a label vector [0, 0, 0] or when one of your embeddings was the 0 vector.
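As a rough usage sketch of the two failure cases the fix targets (assuming the public entry point is `optax.losses.ntxent`; values are illustrative):

```python
import jax
import jax.numpy as jnp
from optax import losses

embeddings = jnp.array([[0.0, 0.0],   # a 0-vector embedding used to produce NaNs
                        [0.3, 0.4],
                        [0.5, 0.6]])
labels = jnp.array([0, 0, 0])         # all labels identical: no negative pairs

loss_fn = lambda e: losses.ntxent(e, labels)
print(loss_fn(embeddings))            # finite, and 0 when there are no negatives
print(jax.grad(loss_fn)(embeddings))  # no NaNs with the epsilon fix
```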