Add .ratio to embedbase sdk #95
Comments
Sure! From a technical point of view, this just requires a small tweak to the SDK plus a new endpoint in the embedbase instance that takes a list of datasets instead of a single query dataset.
@hotkartoffel let's say you have "Basil is a green plant that needs daily water..." in the green_plants dataset. Upon running
I assume you expect to receive distinct results (no duplicates)?
Q1. How do you tell from the embedding vectors whether two results are duplicates? (Exactly the same embedding vector, or simply the similarity score?) Q2. If the "green_plants" dataset contains multiple duplicates, how are they treated by the search algorithm?
Distinct results are always good, but I wonder how you decide what counts as distinct.
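To make the two options in Q1 concrete, here is a minimal sketch of filtering near-duplicates by cosine similarity. It is not how embedbase handles duplicates (the thread does not say); the `deduplicate` helper, the `(text, embedding)` result shape, and the `0.98` threshold are all assumptions for illustration. Setting the threshold to `1.0` approximates the "exactly the same embedding vector" case.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def deduplicate(results, threshold=0.98):
    """Keep the first result of each group of near-identical embeddings.

    `results` is a hypothetical list of (text, embedding) pairs, already
    sorted by relevance. A result is dropped when its embedding is at least
    `threshold` similar to one that has already been kept.
    """
    kept = []
    for text, embedding in results:
        is_duplicate = any(
            cosine_similarity(embedding, kept_embedding) >= threshold
            for _, kept_embedding in kept
        )
        if not is_duplicate:
            kept.append((text, embedding))
    return kept


# Toy 2-dimensional embeddings, purely illustrative.
sample_results = [
    ("Basil is a green plant", [1.0, 0.0]),
    ("Basil is a green plant", [1.0, 0.0]),
    ("Cacti need little water", [0.0, 1.0]),
]
distinct_texts = [text for text, _ in deduplicate(sample_results)]
```

The comparison is quadratic in the number of results, which is usually acceptable for the handful of hits a single search returns.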
It would be so helpful in my use case.
max_token values of 2100, 300, 300, and 300 are applied to the four datasets, respectively.
Originally posted by @ccomkhj in #71 (comment)
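As a rough illustration of how a `.ratio` option could produce per-dataset limits like the ones quoted above, here is a small sketch that splits a total token budget proportionally. The function name `split_token_budget`, the total of 3000 tokens, and the ratios 0.7/0.1/0.1/0.1 are assumptions chosen to reproduce those numbers; none of them are stated in the thread, and this is not the SDK's actual API.

```python
def split_token_budget(total_tokens, ratios):
    """Split `total_tokens` across datasets in proportion to `ratios`.

    Ratios are normalised by their sum, so they do not have to add up to 1.
    """
    if not ratios or any(r < 0 for r in ratios):
        raise ValueError("ratios must be a non-empty list of non-negative numbers")
    ratio_sum = sum(ratios)
    if ratio_sum <= 0:
        raise ValueError("at least one ratio must be positive")
    # Rounding can make the parts differ from the total by a token or two.
    return [round(total_tokens * r / ratio_sum) for r in ratios]


# Hypothetical inputs: a 3000-token budget spread over four datasets.
budgets = split_token_budget(3000, [0.7, 0.1, 0.1, 0.1])
```

Each resulting value would then be passed as the token limit for the search against the corresponding dataset.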