consider adding soft-cosine distance #21
Comments
Thanks for the idea! I think I need your support here. I understand the definition, but I have no idea how to build a convenient API for it. A set of examples would be helpful.
I think it should simply allow plugging in any (distance) function which takes 2 values and returns a float.
A concrete case comes from NLP: a language-aware function.

```clojure
(defn word-dist [token-1 token-2]
  ...)

(soft-cosine ["I" "like" "fruits"] ["I" "like" "banana"] word-dist)
;; => .... > 0.6 (not sure about the concrete number)
```

In practice we would map all tokens to numbers first (this makes the vocabulary); see the sketch below.
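A minimal sketch of how that could look. Only the names `soft-cosine` and `word-dist` come from the example above; the implementation below is an assumption, not code that exists in this repo. It builds a shared vocabulary, turns both token sequences into bag-of-words count vectors, and evaluates the soft-cosine formula with a pluggable token similarity `sim`:

```clojure
(defn soft-cosine
  "Soft cosine between two token sequences, with a pluggable
  token-level similarity function sim (token, token -> float)."
  [tokens-a tokens-b sim]
  (let [vocab (vec (distinct (concat tokens-a tokens-b)))
        fa    (frequencies tokens-a)
        fb    (frequencies tokens-b)
        a     (mapv #(get fa % 0) vocab)   ; bag-of-words counts, doc A
        b     (mapv #(get fb % 0) vocab)   ; bag-of-words counts, doc B
        n     (count vocab)
        dot   (fn [x y]                    ; sum_ij s_ij * x_i * y_j
                (reduce + (for [i (range n), j (range n)]
                            (* (sim (vocab i) (vocab j)) (x i) (y j)))))]
    (/ (dot a b)
       (* (Math/sqrt (dot a a)) (Math/sqrt (dot b b))))))
```

With plain string equality as `sim` this degenerates to ordinary cosine over token counts:

```clojure
(soft-cosine ["I" "like" "fruits"] ["I" "like" "banana"]
             (fn [t1 t2] (if (= t1 t2) 1.0 0.0)))
;; => ~0.667; a language-aware sim that also relates "fruits" and
;; "banana" would score the pair higher
```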
I found here an old Java implementation which combines TF-IDF and soft-cosine; I would prefer to have these separated. The TF-IDF part we have already here: this gives me the 2 vectors above that I want to get the distance for. The "classical" way is to use simple cosine distance, but that is not able to deal with "similarity of tokens". Soft cosine should be better, hopefully.
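Keeping those concerns separated could then just mean a second entry point that takes the pre-computed (e.g. TF-IDF) weight vectors directly. Again only a sketch under the same assumptions; `soft-cosine-weighted` is a hypothetical name:

```clojure
(defn soft-cosine-weighted
  [vocab a b sim]
  ;; same formula, but over pre-weighted vectors instead of raw tokens
  (let [n   (count vocab)
        dot (fn [x y]
              (reduce + (for [i (range n), j (range n)]
                          (* (sim (vocab i) (vocab j)) (x i) (y j)))))]
    (/ (dot a b)
       (* (Math/sqrt (dot a a)) (Math/sqrt (dot b b))))))

;; e.g. with made-up TF-IDF weights over a shared vocabulary:
;; (soft-cosine-weighted ["I" "like" "fruits" "banana"]
;;                       [0.1 0.2 0.9 0.0]
;;                       [0.1 0.2 0.0 0.8]
;;                       word-dist)
```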
Another example would be to plug in text embeddings (word2vec). There is a Java implementation here, so I would plug in that concrete function (just doing the mapping token<->index to the vocabulary first).
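The glue for that could be as small as this (a sketch; `lookup` stands in for whatever token -> vector-of-doubles access the word2vec implementation provides, it is not a real API here):

```clojure
(defn vec-cosine
  "Plain cosine between two embedding vectors (seqs of doubles)."
  [u v]
  (let [dot  (reduce + (map * u v))
        norm #(Math/sqrt (reduce + (map * % %)))]
    (/ dot (* (norm u) (norm v)))))

(defn embedding-sim
  "Wraps a token->vector lookup into the pluggable s_ij function."
  [lookup]
  (fn [t1 t2] (vec-cosine (lookup t1) (lookup t2))))

;; (soft-cosine tokens-a tokens-b (embedding-sim my-word2vec-lookup))
```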
Useful for comparing TF-IDF text representations, instead of using plain cosine:
https://en.wikipedia.org/wiki/Cosine_similarity#Soft_cosine_measure
The similarity function s_ij should be pluggable (as an input to the function).
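For reference, the measure from the linked page, over vectors a, b and the pluggable feature similarity s_ij:

$$\operatorname{soft\_cosine}(a,b) = \frac{\sum_{i,j} s_{ij}\, a_i\, b_j}{\sqrt{\sum_{i,j} s_{ij}\, a_i\, a_j}\;\sqrt{\sum_{i,j} s_{ij}\, b_i\, b_j}}$$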