Hello!
I'm wondering whether it is viable to send new points to the appropriate shards of an existing disk index, apply the streaming procedure within each shard, and stitch everything back together. Are there any logistical/performance penalties for this strategy?
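For concreteness, here is a minimal sketch of the routing step, assuming the shards were produced by clustering so that each shard has a representative centroid, and that a new point goes to the shard with the nearest centroid. `route_to_shards` is a hypothetical helper for illustration, not a DiskANN API; the real assignment rule should match whatever partitioning produced the shards.

```python
import numpy as np

def route_to_shards(new_points: np.ndarray, shard_centroids: np.ndarray) -> list[np.ndarray]:
    """Assign each new point to the shard whose centroid is closest in L2.

    Hypothetical helper: assumes centroid-based sharding (e.g. k-means).
    """
    # Squared L2 distances, shape (num_points, num_shards).
    d2 = ((new_points[:, None, :] - shard_centroids[None, :, :]) ** 2).sum(axis=-1)
    nearest = d2.argmin(axis=1)
    # Per-shard row indices into new_points.
    return [np.flatnonzero(nearest == s) for s in range(len(shard_centroids))]

# Example: route 10k new 768-d points across 5 shards.
rng = np.random.default_rng(0)
points = rng.standard_normal((10_000, 768), dtype=np.float32)
centroids = rng.standard_normal((5, 768), dtype=np.float32)
batches = route_to_shards(points, centroids)
print([len(b) for b in batches])
```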
I have ~80M 768d vectors and a build DRAM budget of 150 GB, so I wound up with 5 shards. Really, I am just looking for the most efficient way of handling bulk insertions/deletions (say several million vectors at a time) without having to rebuild, if possible.
@jaredcthomas I assume you meant 768-d floats. So the overall index size is about 80M × (3 KB vector + 100-degree graph × 4-byte ids) ≈ 250 GB?
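Spelling out that back-of-the-envelope estimate (assuming float32 vectors and roughly 100 out-neighbors stored as 4-byte ids per point):

```python
n, dim = 80_000_000, 768
vector_bytes = dim * 4       # float32 vector: 3,072 B ≈ 3 KB
graph_bytes = 100 * 4        # ~100-degree adjacency list of 4-byte ids
total_bytes = n * (vector_bytes + graph_bytes)
print(f"{total_bytes / 1e9:.0f} GB")   # -> 278 GB (~259 GiB)
```

That lands in the same ballpark as the rough ~250 GB figure above.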
Loading one shard (~50 GB) at a time into DRAM, applying the batch updates, and writing it back to SSD seems reasonable.
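A sketch of that per-shard loop, with the I/O and update calls left as explicitly hypothetical callables (DiskANN does not expose this exact interface); the point is that peak DRAM stays at roughly one shard:

```python
from typing import Callable, Mapping, Sequence
import numpy as np

def batch_update_shards(
    shard_paths: Sequence[str],
    inserts: Mapping[str, np.ndarray],   # shard path -> new vectors routed to it
    deletes: Mapping[str, np.ndarray],   # shard path -> point ids to remove
    load_shard: Callable,                # hypothetical: read one shard into DRAM
    save_shard: Callable,                # hypothetical: write updated shard to SSD
) -> None:
    """Process shards sequentially so peak DRAM is ~one shard (~50 GB here)."""
    for path in shard_paths:
        index = load_shard(path)
        # Hypothetical in-memory streaming-update methods.
        index.delete(deletes.get(path, np.empty(0, dtype=np.int64)))
        index.insert(inserts.get(path, np.empty((0, 768), dtype=np.float32)))
        save_shard(index, path)          # replace the shard on SSD
```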
We have a long-overdue (and out-of-date) PR#11 that can merge batch updates to disk, as in the FreshDiskANN paper (https://arxiv.org/abs/2105.09613). We would appreciate help with that PR. In any case, we plan to get to it in the next few months.