vector_norm and similarity value incorrect #522
Comments
Thanks! Will figure this out.
…e calculations into a helper function.
I think this is fixed in 1.0, but this bug makes me uneasy because I don't feel like I really understand what was wrong. I haven't had time to test 0.101.0 yet, but: you say the cosine was always half? I can't figure out why that should be... What I've come up with is that this calculation looks unreliable:

    for orth, lex_addr in self._by_orth.items():
        lex = <LexemeC*>lex_addr
        if lex.lower < vectors.size():
            lex.vector = vectors[lex.lower]
            for i in range(vec_len):
                lex.l2_norm += (lex.vector[i] * lex.vector[i])
            lex.l2_norm = math.sqrt(lex.l2_norm)
        else:
            lex.vector = EMPTY_VEC

The
Got it now. The previous default vectors were already normalized. This led to a value of […].

Later, I added the capability to load custom word vectors, which meant the L2 norm had to be calculated. However, I didn't initialise the value of […].

No tests checked the exact value returned by the similarity function; they only sanity-checked relative values. This has since been addressed.
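For anyone reading later, here is a minimal Python sketch of one way the halving could arise. It is not the actual spaCy code. It assumes the norm field already held 1.0 (because the default vectors were unit length) and that the sum of squares was then accumulated on top of that stale value. The names `l2_norm`, `cosine`, and `initial` are made up for illustration.

```python
import math

def l2_norm(vector, initial=0.0):
    # `initial` models whatever the norm field held before accumulation
    # (hypothetical parameter; 0.0 is the correct starting value).
    total = initial
    for x in vector:
        total += x * x
    return math.sqrt(total)

def cosine(a, b, norm_a, norm_b):
    # Plain cosine similarity from precomputed norms.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm_a * norm_b)

a = [0.6, 0.8]  # unit-length vector
b = [1.0, 0.0]  # unit-length vector

# Norms accumulated from zero give the true cosine.
correct = cosine(a, b, l2_norm(a), l2_norm(b))

# Assumed bug: accumulation starts from a stale norm of 1.0, so each norm
# becomes sqrt(1 + 1) = sqrt(2) and the product of the two norms doubles.
stale = cosine(a, b, l2_norm(a, initial=1.0), l2_norm(b, initial=1.0))

print(correct, stale)
```

Under these assumptions each norm is inflated by a factor of sqrt(2), so the similarity of two unit vectors comes out at half the true value. An exact-value check such as `math.isclose(stale, correct)` would have failed, while a test comparing only relative values would still pass.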
Somehow vector_norm is incorrectly calculated.
Then vector_norm is used in similarity, which as a result always returns half of the correct value.
It is OK if the use case is to rank similarity scores for synonyms. But the cosine similarity score itself is incorrect.
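A small sketch of why ranking is unaffected: dividing every score by the same positive constant does not change their order. The words and scores below are hypothetical values chosen for illustration, not output from any model.

```python
# Hypothetical similarity scores against some query word (illustrative only).
scores = {"car": 0.82, "truck": 0.74, "banana": 0.11}

# The same scores as the buggy computation would report them: all halved.
halved = {word: score / 2 for word, score in scores.items()}

def ranking(table):
    # Words ordered from most to least similar.
    return sorted(table, key=table.get, reverse=True)

print(ranking(scores))
print(ranking(halved))
```

Both calls print the same order, so ranking by score still works, but any code that compares a score against an absolute threshold would be off by a factor of two.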