The FreshDiskANN paper outlines the StreamingMerge procedure. Combing through the codebase (main @ f8ef303), I can't find a single entry point that lets a caller use the FreshDiskANN API contract without being aware of all the index types.
test_streaming_scenario.cpp outlines how to build an in-memory index that supports inserts and deletes.
build_stitched_index.cpp outlines how to merge indices.
search_disk_index.cpp demonstrates how to run a search across an index that is stored on disk.
Given a client that provides a memory budget and no starting list of vectors, my reading of the paper would indicate the following needs to be done in a wrapping class:
1. create an empty, streaming-enabled, in-memory index that holds writes - this is outlined in test_streaming_scenario.cpp and is the only sink for insertions.
2. create an empty, SSD-resident index, which is demonstrated by build_disk_index.cpp... this index would not have a true build phase as there is nothing to add.
3. once the mutable index in [1] is full, merge [1] and [2] using the routine outlined in merge_shards within disk_utils.h - during the merge process, we would have already created a new mutable in-memory index for any in-flight writes + deletes.
4. separately, maintain a list of deletions that is used for filtering within all live indices.
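To make the four steps above concrete, here is a minimal sketch of what such a wrapping class could look like. Everything in it is hypothetical: `FreshIndexFacade`, `Store`, `rw_capacity` and the method names are illustration only and are not part of the DiskANN codebase. Each "index" is a plain map searched by brute force, the SSD-resident index is simulated in memory, the merge runs synchronously instead of in the background, and ids are assumed never to be reused.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical names throughout; none of this is DiskANN API.
using Id = std::uint32_t;
using Vec = std::vector<float>;
using Store = std::unordered_map<Id, Vec>;  // toy stand-in for a graph index

class FreshIndexFacade {
public:
    // rw_capacity stands in for the caller's memory budget, counted in points.
    explicit FreshIndexFacade(std::size_t rw_capacity) : rw_capacity_(rw_capacity) {
        assert(rw_capacity > 0);
    }

    // Step 1: the mutable in-memory index is the only sink for insertions.
    void insert(Id id, Vec v) {
        rw_[id] = std::move(v);
        if (rw_.size() >= rw_capacity_) merge();  // step 3 trigger
    }

    // Step 4: deletes are only recorded; they filter every search.
    void remove(Id id) { deleted_.insert(id); }

    // Query all live indices, skip deleted ids, return the k nearest ids.
    std::vector<Id> search(const Vec& q, std::size_t k) const {
        std::vector<std::pair<float, Id>> hits;
        for (const Store* s : {&rw_, &long_term_})
            for (const auto& kv : *s)
                if (!deleted_.count(kv.first))
                    hits.emplace_back(dist2(q, kv.second), kv.first);
        std::sort(hits.begin(), hits.end());
        if (hits.size() > k) hits.resize(k);
        std::vector<Id> ids;
        for (const auto& h : hits) ids.push_back(h.second);
        return ids;
    }

    std::size_t rw_size() const { return rw_.size(); }
    std::size_t long_term_size() const { return long_term_.size(); }

private:
    // Step 3: freeze the full mutable index, start a fresh one for new
    // writes, then fold the frozen snapshot into the long-term index.
    void merge() {
        Store ro;       // frozen, read-only snapshot
        ro.swap(rw_);   // rw_ is now empty and keeps accepting inserts
        for (auto& kv : ro)
            if (!deleted_.count(kv.first))
                long_term_[kv.first] = std::move(kv.second);
        for (Id id : deleted_) long_term_.erase(id);
        deleted_.clear();  // every recorded delete has now been applied
    }

    // Squared L2 distance between two vectors of equal dimension.
    static float dist2(const Vec& a, const Vec& b) {
        assert(a.size() == b.size());
        float s = 0.0f;
        for (std::size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
        return s;
    }

    std::size_t rw_capacity_;
    Store rw_;         // step 1: mutable in-memory index
    Store long_term_;  // step 2: stand-in for the SSD-resident index, starts empty
    std::unordered_set<Id> deleted_;  // step 4: delete list
};
```

A caller would only construct `FreshIndexFacade idx(budget)` and call `insert`, `remove` and `search`. In a real implementation the merge would presumably run in the background, so the frozen snapshot would have to stay searchable until the new long-term index is swapped in.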
I would be happy to submit a patch that unifies the above so that a caller can just create an Index and not have to worry about the RO-TempIndex, RW-TempIndex and SSD-resident index; however, I would first like to confirm that my read of the current codebase is correct and that there is no single entry point for this.
@infrawhispers You are right, there is no single entry point for this yet. Also, merge_shards is not the merge from the FreshDiskANN paper; it is the method described in the original DiskANN paper. The procedure that merges an in-memory index into an SSD index and creates a new SSD index is not yet in main. There is an outdated version in #11, which needs to be redone against the latest main. Once that is done, we can attempt a single entry point. You are most welcome to contribute any of these.
Hi @harsha-simhadri, I'm also interested in the FreshDiskANN implementation. Is there a roadmap for it?
By the way, what is the difference between #11 and the current code behind apps/test_insert_deletes_consolidate?