-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add support for chromadb>=0.5.1 #435
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems fine.
I left some questions on things that I wasn't clear on, to make sure that you had considered them. But I think you probably have, and this is all fine.
for idx in range(num_results) | ||
] | ||
|
||
# That should be the default, but let's make extra sure here | ||
results = sorted(results, key=lambda r: r["distance"]) | ||
results = sorted(results, key=lambda r: r["distances"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: was this a bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I should have elaborated on this. I previously thought that it would be a good idea to turn the plural forms into singulars when going from dict of lists to list of dicts, i.e.
{"distances": [...], "ids": [...]}
to
[{"distance": ..., "id": ...}, ...]
From a language standpoint this makes sense given that each result in the new list now has a singular distance, ID, etc. However, this makes it harder to map the Chroma documentation to the results that we want to return and I think we should value this higher than correct language. Thus, I've removed the transforming of the plural into the singular form by cutting of the final charactly (L113, key[:-1]
-> L111, key
).
ragna/source_storages/_chroma.py
Outdated
key: [None] * num_results if value is None else value[0] # type: ignore[index] | ||
for key, value in result.items() | ||
} | ||
result = {key: result[key][0] for key in ["ids", *include]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Handling the None
case is no longer relevant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. Anything that could be returned by Chroma, but is not present in the include
list, results in None
values. Since we no longer iterate over the full dictionary, we no longer need to care about them.
chromadb>=0.5.1
changed the output ofcollection.query
. They added a new key-value pair to the result, which should be BC. However, since we before this PR iterated over the full dictionary and assumed that every key-value pair follows the same pattern, we are bitten by this.This PR circumvents this assumption by only selecting the keys we actually need. Meaning, unless Chroma introduces a BC breaking change to them, we should be future proof.