Hi, thank you for your contribution!
It seems that the "factor" in "model.py" is None at first, is set to a fixed value after the first batch of the first epoch, and is then kept fixed for the rest of training. What is the benefit of this strategy?
Besides, should the "factor" in "model.py" be removed, or should only each embedding's own norm be considered when obtaining embeddings? As it stands, the factor also depends on the other data in the same batch unless we set batch_size = 1.
Thanks again!
We multiply molecule embeddings by sqrt(dim) / mean(norm(first_batch)), which is self.factor. In other words, we first divide molecule embeddings by their average norm so that their lengths are approximately one (not exactly one, which I will explain later), and then rescale them to length sqrt(dim). The reason is to keep the distribution of embedding entries stable with respect to the embedding dimension. For example, suppose you have a 2-d vector and a 1024-d vector: if you normalize both of them to unit length, the entries of the 1024-d vector will be much smaller than those of the 2-d vector. This is why we multiply by sqrt(dim), so that the entry distribution is not affected by dim.
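If it helps, here is a minimal sketch of how such a lazily initialized, fixed factor could be computed and applied. The class and attribute names here are hypothetical; the actual model.py may be organized differently.

```python
import torch

class EmbeddingRescaler(torch.nn.Module):
    """Sketch: factor = sqrt(dim) / mean(norm(first_batch)), computed once
    from the first batch seen and reused unchanged for all later batches."""

    def __init__(self):
        super().__init__()
        self.factor = None  # set lazily on the first batch, then kept fixed

    def forward(self, embeddings):  # embeddings: (batch_size, dim)
        if self.factor is None:
            dim = embeddings.shape[1]
            mean_norm = embeddings.norm(dim=1).mean()
            # sqrt(dim) / mean(norm) makes the *average* length sqrt(dim),
            # so individual entries keep a dim-independent scale.
            self.factor = (dim ** 0.5) / mean_norm.detach()
        return embeddings * self.factor
```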
The second question is: why not scale each molecule embedding by sqrt(dim) / norm(this_embedding), i.e., why should the factor be fixed across all embeddings? The reason is that if we scaled each embedding by sqrt(dim) / norm(this_embedding), every embedding would end up with exactly the same length sqrt(dim). This is undesirable because molecules have different sizes: we expect large molecules to have large embeddings and small molecules to have small embeddings, so that the embedding space is more physically meaningful. Note that a raw molecule embedding is the sum of all its atom embeddings, so the raw embedding of a large molecule will naturally be large. If we rescale all embeddings with one fixed factor, a large molecule still ends up with a large embedding.
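To see the difference numerically, here is a toy example (not code from the repository) contrasting per-embedding scaling with a shared fixed factor:

```python
import torch

dim = 8
torch.manual_seed(0)
small = torch.randn(dim)   # stands in for a small molecule's raw embedding
large = small * 5.0        # a "large molecule": same direction, 5x the norm

# Per-embedding scaling: both end up with identical length sqrt(dim),
# so the size information is erased.
per_emb = [v * (dim ** 0.5) / v.norm() for v in (small, large)]
print([round(v.norm().item(), 3) for v in per_emb])   # both ~= sqrt(8) = 2.828

# Shared fixed factor: relative magnitudes are preserved, so the large
# molecule still has the larger embedding after rescaling (1:5 ratio kept).
factor = (dim ** 0.5) / torch.stack([small.norm(), large.norm()]).mean()
fixed = [v * factor for v in (small, large)]
print([round(v.norm().item(), 3) for v in fixed])
```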