add support for chromadb>=0.5.1 #435

Merged
merged 2 commits into from
Jun 28, 2024

pmeier
Member

@pmeier pmeier commented Jun 26, 2024

chromadb>=0.5.1 changed the output of collection.query by adding a new key-value pair to the result, which should be backwards compatible (BC). However, before this PR we iterated over the full dictionary and assumed that every key-value pair follows the same pattern, so the new entry breaks us.

This PR drops that assumption by selecting only the keys we actually need. Unless Chroma introduces a BC-breaking change to those keys, we should be future-proof.
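To illustrate the failure mode, here is a minimal sketch that uses a mocked dictionary instead of a real collection.query call. The values and the "new_key" entry are assumptions made up for the example; the two helper functions are hypothetical wrappers around the comprehensions shown in the diff.

```python
def flatten_all(result, num_results):
    # Pre-PR pattern: assume every value is either None or a list
    # holding one entry per query.
    return {
        key: [None] * num_results if value is None else value[0]
        for key, value in result.items()
    }


def select_needed(result, include):
    # Pattern from this PR: only touch the keys we asked for.
    return {key: result[key][0] for key in ["ids", *include]}


include = ["distances", "documents"]

# Mocked result for a single query with two hits.
result = {
    "ids": [["a", "b"]],
    "distances": [[0.1, 0.4]],
    "documents": [["first", "second"]],
    "metadatas": None,  # not requested through include
    "new_key": ["distances", "documents"],  # hypothetical key with a flat list
}

old = flatten_all(result, num_results=2)
new = select_needed(result, include)

# The old pattern turns the flat list into a single string,
# which is not a per-result list like the other values.
print(old["new_key"])
print(sorted(new))
```

With the selection approach, any key outside of "ids" and the include list is never read, so its shape does not matter.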

@pmeier pmeier added the type: maintenance 🛠️ Day-to-day maintenance tasks label Jun 26, 2024
@pmeier pmeier mentioned this pull request Jun 26, 2024
Contributor

@nenb nenb left a comment

Seems fine.

I left some questions on things that I wasn't clear on, to make sure that you had considered them. But I think you probably have, and this is all fine.

for idx in range(num_results)
]

# That should be the default, but let's make extra sure here
- results = sorted(results, key=lambda r: r["distance"])
+ results = sorted(results, key=lambda r: r["distances"])
Contributor

Question: was this a bug?

Member Author

Sorry, I should have elaborated on this. I previously thought that it would be a good idea to turn the plural forms into singulars when going from dict of lists to list of dicts, i.e.

{"distances": [...], "ids": [...]}

to

[{"distance": ..., "id": ...}, ...]

From a language standpoint this makes sense, given that each result in the new list now has a single distance, ID, etc. However, it makes it harder to map the Chroma documentation to the results that we want to return, and I think we should value this higher than correct language. Thus, I've removed the transformation of the plural into the singular form, which was done by cutting off the final character (L113, key[:-1] -> L111, key).
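For illustration, the conversion from a dict of lists to a list of dicts with the plural keys kept could look like the sketch below. The sample values are made up; only the key names and the sort call are taken from the diff.

```python
# Already-unwrapped result for a single query (sample values are assumptions).
result = {"ids": ["a", "b", "c"], "distances": [0.4, 0.1, 0.3]}
num_results = len(result["ids"])

# Keep the Chroma key names as-is instead of stripping the trailing "s".
results = [
    {key: values[idx] for key, values in result.items()}
    for idx in range(num_results)
]

# That should be the default, but let's make extra sure here
results = sorted(results, key=lambda r: r["distances"])

print([r["ids"] for r in results])
```

Each entry then uses exactly the keys that appear in the Chroma documentation, e.g. r["distances"] holds the single distance of that result.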

- key: [None] * num_results if value is None else value[0] # type: ignore[index]
- for key, value in result.items()
- }
+ result = {key: result[key][0] for key in ["ids", *include]}
Contributor

Question: Handling the None case is no longer relevant?

Member Author

Correct. Anything that could be returned by Chroma, but is not present in the include list, results in None values. Since we no longer iterate over the full dictionary, we no longer need to care about them.
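As a rough sketch of this point, with a mocked result whose values are assumptions rather than real Chroma output: the keys outside the include list are None, and indexing them directly would fail, which is what the old None branch guarded against. Selecting by key never reaches them.

```python
include = ["distances"]

# Mocked result: keys that were not requested come back as None.
result = {
    "ids": [["a", "b"]],
    "distances": [[0.1, 0.4]],
    "documents": None,
    "metadatas": None,
}

# Only "ids" and the included keys are read, so no None handling is needed.
selected = {key: result[key][0] for key in ["ids", *include]}

# Reading a key that was not included would fail without a guard.
try:
    result["documents"][0]
    indexing_none_failed = False
except TypeError:
    indexing_none_failed = True
```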

ragna/source_storages/_chroma.py (conversation resolved)
@pmeier pmeier merged commit a0bf68c into main Jun 28, 2024
21 checks passed
@pmeier pmeier deleted the fix-chroma branch June 28, 2024 09:38
pmeier added a commit that referenced this pull request Jun 28, 2024
pmeier added a commit that referenced this pull request Jun 28, 2024
blakerosenthal pushed a commit that referenced this pull request Jul 17, 2024
Labels
type: maintenance 🛠️ Day-to-day maintenance tasks
2 participants