
Should the "factor" in "model.py" be removed during obtaining embedding? #4

Open
koyurion opened this issue Dec 8, 2022 · 1 comment


koyurion commented Dec 8, 2022

Hi, thank you for your contribution!

It seems that the "factor" in "model.py" is None at first, is set to a fixed value after the first batch of the first epoch, and is kept fixed for the rest of training. What is the benefit of this strategy?

Besides, should the "factor" in "model.py" be removed when obtaining embeddings, or should only the effect of each embedding's own norm be considered? Otherwise, the result would depend on the other data in the same batch unless we set batch_size = 1.

Thanks again!


hwwang55 commented Dec 8, 2022

Hi there, this is a good question!

  1. We multiply molecule embeddings by sqrt(dim) / mean(norm(first_batch)), which is self.factor. This means we first divide molecule embeddings by their average norm so that their lengths are approximately one (not exactly one, which I will explain later), then rescale their lengths to sqrt(dim). The reason is to keep the distribution of embedding entries stable w.r.t. the embedding dimension. For example, suppose you have a 2-d vector and a 1024-d vector; if you normalize both to unit length, the entries of the 1024-d vector will be much smaller than those of the 2-d vector. This is why we multiply by sqrt(dim), so that the entry distribution is not affected by dim.
  2. The second question is: why not scale each molecule embedding by sqrt(dim) / norm(this_embedding), i.e., why should the factor be fixed for all embeddings? Because if we scaled each embedding by its own norm, every embedding would end up with the same length sqrt(dim). That is not good because molecules have different sizes: we expect large molecules to have large embeddings and small molecules to have small embeddings, so that the embedding space is more physically meaningful. Note that a raw molecule embedding is the sum of all its atom embeddings, so the raw embedding of a large molecule is naturally large. If we normalize all embeddings with a single fixed factor, a large molecule still has a large embedding.
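The strategy above can be sketched as follows. This is a minimal NumPy illustration of the described behavior, not the actual code in model.py; the function name and signature are hypothetical. The key point is that the factor is computed once from the first batch and then reused, so the relative norms of different molecules are preserved:

```python
import numpy as np

def scale_embeddings(embeddings, factor=None):
    """Rescale raw molecule embeddings with a single shared factor.

    embeddings: array of shape (batch_size, dim); each row is a raw
    molecule embedding (the sum of its atom embeddings, so larger
    molecules naturally have larger norms).
    """
    dim = embeddings.shape[1]
    if factor is None:
        # Computed once, on the first batch, then kept fixed:
        # sqrt(dim) / mean(norm(first_batch)).
        factor = np.sqrt(dim) / np.linalg.norm(embeddings, axis=1).mean()
    # Multiplying every embedding by the same scalar preserves the
    # relative lengths between molecules.
    return embeddings * factor, factor
```

After the first batch, the mean embedding norm is exactly sqrt(dim); later batches reuse the stored factor, so their individual norms stay proportional to molecule size.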

Repository owner deleted a comment from koyurion Dec 8, 2022