[Feature Request]: Smoother API for .delete() #3207
Comments
@mr-infty, you can use this:
import uuid
import chromadb
import numpy as np

# 500 random 384-dimensional embeddings as sample data
data = np.random.uniform(-1, 1, (500, 384))
client = chromadb.PersistentClient("delete_all")
collection = client.get_or_create_collection("test_collection")
ids = [str(uuid.uuid4()) for _ in range(data.shape[0])]
documents = [f"document {i}" for i in range(data.shape[0])]
collection.add(ids=ids, embeddings=data, documents=documents)
print("Collection count", collection.count())
# No entry carries this metadata key, so the $ne filter is expected to match everything
collection.delete(where={"__bastion_key__": {"$ne": 1}})
print("Collection count after delete", collection.count())
Works like a charm. However, note that because of how the HNSW index works, it is recommended to delete and recreate the collection instead, to avoid a caveat: the HNSW index grows without bound, because deleted embeddings are only flagged as deleted.
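The delete-and-recreate route recommended above could look roughly like the sketch below. `reset_collection` is a hypothetical helper name, not something Chroma provides; it only assumes a client object exposing `delete_collection(name)` and `create_collection(name, metadata=...)`. Settings from the old collection (metadata, a custom embedding function) are assumed not to carry over and would need to be passed again.

```python
def reset_collection(client, name, metadata=None):
    """Drop and recreate a collection so that its index starts out empty.

    Hypothetical helper: `client` is assumed to be a chromadb client.
    """
    # Removes the collection together with its index, including entries
    # that had only been flagged as deleted.
    client.delete_collection(name)
    # Recreate under the same name; pass the metadata again if it is needed.
    return client.create_collection(name, metadata=metadata)
```

Usage would be `collection = reset_collection(client, "test_collection")`. Any handle obtained before the call refers to the deleted collection and should be replaced by the returned one.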
Okay, I guess that It seems to me that providing the ability to have empty metadata and empty filters would streamline the API a lot.
We actually are somewhat opposed to allowing people to easily delete everything in their collection; it's too easy a footgun to trigger accidentally. Maybe we could do a safety override. I.e.
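The override the maintainers had in mind is not shown in this thread. As one possible shape for the idea, here is a user-side guard that refuses to wipe a collection unless the caller opts in explicitly. The name `delete_all` and the `confirm` flag are hypothetical, not a Chroma API; the sketch relies only on `collection.get(include=[])` and `collection.delete(ids=...)`.

```python
def delete_all(collection, *, confirm=False):
    """Delete every entry in `collection`, but only when explicitly confirmed.

    Hypothetical user-side guard, not part of Chroma.
    """
    if not confirm:
        raise ValueError("refusing to delete all entries; pass confirm=True")
    # include=[] asks for ids only, without documents or embeddings.
    ids = collection.get(include=[])["ids"]
    if ids:  # skip the call entirely when the collection is already empty
        collection.delete(ids=ids)
    return len(ids)
```

Making `confirm` keyword-only means a stray positional argument cannot switch the guard off by accident.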
@mr-infty, we have a similar mechanic to delete all with MySQL has something similar with Regarding empty params, it does not feel very ergonomic to me. Wouldn't it make sense for the absence of parameters to be treated as empty params, rather than forcing empty params? It introduces confusion, such as: is deleting nothing that matches the same as deleting all? Much like the example I've shown you above, it is ugly and confusing as hell (it does the job, though). Going down that 🐰 hole, you might as well make the argument for a completely separate method that conveys in unambiguous terms what it does, e.g.
@tazarov No, there is no confusion: the most obvious semantics of
@mr-infty, that’s an interesting point. Following that logic, wouldn’t it make sense to approach deletion like this?
collection.delete(ids=collection.get(include=[])["ids"])
This way, deletion is strictly tied to explicit selections (via get()), avoiding any ambiguity about whether “no selection” should be interpreted as a valid selection.
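The one-liner above hands every id to a single `delete()` call. For larger collections it may be safer to delete in chunks, since a client may limit how many ids one call accepts. A minimal sketch follows; `delete_selected_in_batches` is a hypothetical helper name and the default `batch_size` of 1000 is an arbitrary assumption, not a documented limit.

```python
def delete_selected_in_batches(collection, batch_size=1000):
    """Delete everything returned by get(), a fixed number of ids at a time.

    Hypothetical helper; `collection` is assumed to be a chromadb collection.
    """
    # The explicit selection: fetch ids only, no documents or embeddings.
    ids = collection.get(include=[])["ids"]
    for start in range(0, len(ids), batch_size):
        # Each call removes at most `batch_size` entries.
        collection.delete(ids=ids[start:start + batch_size])
    return len(ids)
```

The return value is the number of ids that were selected, which can be compared against `collection.count()` taken before the call as a sanity check.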
@tazarov Yes, that would be one possible way of doing it. However, what I had in mind was more like reifying the selection itself as some data structure, so that you could say
Describe the problem
At the moment, there is no convenient way to delete all entries in a collection (without deleting the collection itself). Even though .delete() accepts None as an argument to ids, there is no "wildcard filter" that could be given to where as an argument.
Describe the proposed solution
Either make .delete() delete all entries in the collection or make it possible to pass where={} as a wild-card filter matching all documents.
Alternatives considered
No response
Importance
would make my life easier
Additional Information
No response