I'm not sure if I'm missing something in the documentation, but how do we periodically reindex new embeddings? Having a diskann index makes "INSERT" incredibly slow.
This is the behavior that I observe:
Initial Index
CREATE INDEX ix_chunk_embedding ON chunk
USING diskann (embedding)
WHERE (indexed = true)
(The indexed = true condition is for easily bringing rows into and out of the index.)
Fresh Index
SELECT pg_size_pretty(pg_relation_size('ix_chunk_embedding')) AS index_size;
index_size
------------
 24 kB
(1 row)
Add rows to index
UPDATE chunk
SET indexed = true
WHERE id IN (... 500 ids ...);
This takes 4.1 seconds. The only reason I'm using indexed = true is so that appending rows to the database doesn't take forever.
Check Index size again
SELECT pg_size_pretty(pg_relation_size('ix_chunk_embedding')) AS index_size;
index_size
------------
 4024 kB
(1 row)
Reindex
REINDEX INDEX ix_chunk_embedding;
This runs almost instantly (<100ms?)
SELECT pg_size_pretty(pg_relation_size('ix_chunk_embedding')) AS index_size;
index_size
------------
 352 kB
(1 row)
And it's over 10x smaller.
My issue is that incremental updates are both incredibly slow and not space efficient. Is there a way to disable incremental indexing altogether? I don't mind if rows that haven't been indexed yet don't show up in query results, because I can just do SET indexed = true and then follow up with a REINDEX CONCURRENTLY. Right now, I have no efficient way to do this.
In theory, there should be ways to make updates more efficient (MSFT advertises 1000 QPS and sub-ms latency for inserts: https://youtu.be/BnYNdSIKibQ?t=352), but even then, a setting such as SET LOCAL diskann_suppress_indexing = on; to disable indexing on INSERT/UPDATE would still be useful. Drop + recreate doesn't work because queries stop working in the meantime.
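For reference, the batch workflow described above can be sketched with standard PostgreSQL commands only. The ids are hypothetical placeholders for a real batch, and whether the diskann index access method supports REINDEX ... CONCURRENTLY is an assumption here, not something confirmed in this thread.

```sql
-- Sketch of the desired batch workflow; standard PostgreSQL only.
-- The ids below are hypothetical placeholders for a real batch
-- (about 500 ids in the example above).

-- 1. Bring the new rows under the partial index's predicate.
--    With the index in place, this step still triggers incremental indexing,
--    which is the slow part reported above.
UPDATE chunk
SET indexed = true
WHERE id IN (101, 102, 103);

-- 2. Rebuild the index without blocking reads or writes on the table.
--    Requires PostgreSQL 12+ and must run outside a transaction block.
--    Assumption: the diskann access method supports concurrent reindexing.
REINDEX INDEX CONCURRENTLY ix_chunk_embedding;

-- 3. Check the index size after the rebuild.
SELECT pg_size_pretty(pg_relation_size('ix_chunk_embedding')) AS index_size;
```

Step 1 is what a switch to suppress incremental indexing would make cheap; steps 2 and 3 already behave as shown in the measurements above.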