Add .ratio to embedbase sdk #95

benjaminshafii · 2023-05-11T13:23:46Z

It would be so helpful in my use-case

results = await client.dataset(recipe_id, farm_id, user_id, location_id).search(question, max_token=3000, ratio=[.7, .1, .1, .1])

With max_token=3000 and that ratio, token budgets of 2100, 300, 300, and 300 are applied to the four datasets, respectively.
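To make the arithmetic concrete, here is a minimal sketch of how a ratio could be turned into per-dataset token budgets. The helper name `split_token_budget` is hypothetical and not part of the embedbase SDK; it only illustrates the proposed behavior.

```python
from typing import List, Sequence


def split_token_budget(max_token: int, ratio: Sequence[float]) -> List[int]:
    """Split max_token across datasets in proportion to ratio.

    Hypothetical helper for illustration; not an embedbase SDK function.
    """
    if not ratio:
        raise ValueError("ratio must contain at least one weight")
    if any(r < 0 for r in ratio):
        raise ValueError("ratio weights must be non-negative")
    total = sum(ratio)
    if total <= 0:
        raise ValueError("ratio weights must sum to a positive number")
    # Normalise by the total so the weights need not sum to exactly 1,
    # and round to absorb floating-point noise such as 2100.0000000000005.
    return [round(max_token * r / total) for r in ratio]


budgets = split_token_budget(3000, [.7, .1, .1, .1])
print(budgets)
```

Because each share is rounded independently, the budgets may not always add up to exactly max_token for arbitrary ratios; a real implementation would need to decide where the remainder goes.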

Originally posted by @ccomkhj in #71 (comment)


louis030195 · 2023-05-14T15:49:34Z

Sure! From a technical point of view, it just requires a small tweak to the SDK plus a new endpoint in the embedbase instance that takes a list of datasets instead of a single dataset.

louis030195 · 2023-05-15T13:38:55Z

@hotkartoffel let's say you have

"Basil is a green plant that needs daily water..." in the green_plants dataset
and "Basil is a green plant that needs daily water..." in the general_plants dataset

upon running

results = await client.dataset("green_plants", "general_plants").search("How should I take care of my basil when the leaves turn yellow?")

I assume you expect to receive distinct results (no duplicates)?
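As a rough sketch of what "distinct results" could mean for exact duplicates, the snippet below merges results from several datasets and keeps one entry per identical text. The result shape (dicts with "data", "dataset", and "similarity" keys) and the function name `dedupe_results` are assumptions made for illustration, not the confirmed embedbase response format.

```python
from typing import Dict, List


def dedupe_results(results: List[dict]) -> List[dict]:
    """Keep the best-scoring result for each distinct text.

    Assumes each result is a dict with "data" (text) and "similarity"
    (float) keys; this shape is hypothetical.
    """
    best: Dict[str, dict] = {}
    for result in results:
        # Normalise whitespace and case so trivially different copies match.
        key = " ".join(result["data"].split()).lower()
        if key not in best or result["similarity"] > best[key]["similarity"]:
            best[key] = result
    # Return the survivors ordered from most to least similar.
    return sorted(best.values(), key=lambda r: r["similarity"], reverse=True)


merged = [
    {"dataset": "green_plants", "data": "Basil is a green plant that needs daily water...", "similarity": 0.91},
    {"dataset": "general_plants", "data": "Basil is a green plant that needs daily water...", "similarity": 0.89},
    {"dataset": "general_plants", "data": "Yellow leaves often indicate overwatering.", "similarity": 0.84},
]
distinct = dedupe_results(merged)
```

This only catches identical text; paraphrases with different wording would still appear as separate results.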

ccomkhj · 2023-05-16T11:39:39Z

Q1. How do you tell from the embedding vectors whether two results are duplicates? (Exactly the same embedding vector, or simply a similarity score?)

Q2. If the "green_plants" dataset contains multiple near-duplicates, how does the search algorithm treat them?
e.g. "basil is green plants" on page 32 of a PDF,
"basil is plants which are mostly green" on page 111 of the PDF.
... (This kind of generic sentence can repeat 10+ times.)

results = await client.dataset("green_plants").search("What is the color of basil and its taste?")
-> Could too much color info suppress the taste info?

Distinct results are always good, but I wonder how you decide what counts as distinct.
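To make the two options in Q1 concrete, here is a minimal sketch of the "similarity score" variant: a result is treated as a near-duplicate when its embedding's cosine similarity to an already kept result reaches a threshold. The 0.95 threshold and the function names are assumptions for illustration; this is not a description of how embedbase actually decides.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def drop_near_duplicates(embeddings: List[Sequence[float]], threshold: float = 0.95) -> List[int]:
    """Return indices of embeddings to keep, scanning in ranked order.

    An embedding is dropped when it is at least `threshold` similar to one
    that was already kept. The threshold value is an arbitrary assumption.
    """
    kept: List[int] = []
    for i, candidate in enumerate(embeddings):
        if all(cosine_similarity(candidate, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Toy 2-D vectors: the second is almost parallel to the first.
kept_indices = drop_near_duplicates([[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]])
```

With a threshold of 1.0 this reduces to the "exactly the same embedding vector" case (up to floating-point error); lower thresholds also collapse paraphrases like the page 32 and page 111 sentences, at the risk of dropping results that differ in a meaningful detail.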

louis030195 mentioned this issue May 15, 2023

Search over multiple datasets #23

Closed

louis030195 added a commit that referenced this issue May 15, 2023

feat:core:#95

2fd1fb5

louis030195 mentioned this issue May 15, 2023

feat:core:#95 #97

Closed

