I built a project on top of the postgres-pgvector template, but the database queries were very slow. The database had 2.8 million rows with 512-dimensional embedding vectors, and queries took about 12 seconds. (With an ivfflat index they took about 100 ms.)
After investigating the issue, it turns out the queries don't use the vector index, because they order by 1 - cosineDistance, while the index is built for the cosine distance only.
The fix is quite simple: look for the smallest cosineDistance instead of the largest 1 - cosineDistance.
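To make the difference concrete, here is a minimal sketch of the two query shapes as plain SQL strings. It is not the template's actual code: the `SearchSpec` type, the function names, and the table/column names in the usage note are hypothetical. The only pgvector detail it relies on is that `<=>` is the cosine distance operator.

```typescript
// Hypothetical description of a nearest-neighbour search.
type SearchSpec = {
  table: string;  // table holding the embeddings (assumed name)
  column: string; // vector column (assumed name)
  limit: number;  // number of rows to return
};

// Before: ranks by similarity = 1 - distance, largest first.
// The ORDER BY expression is not the bare distance operator,
// which is the shape the issue reports as not using the index.
function buildSimilarityQuery(s: SearchSpec): string {
  return [
    `SELECT *, 1 - (${s.column} <=> $1) AS similarity`,
    `FROM ${s.table}`,
    `ORDER BY 1 - (${s.column} <=> $1) DESC`,
    `LIMIT ${s.limit}`,
  ].join("\n");
}

// After: ranks by the raw cosine distance, smallest first.
// Same row order as above, but ORDER BY is the distance itself.
function buildDistanceQuery(s: SearchSpec): string {
  return [
    `SELECT *, 1 - (${s.column} <=> $1) AS similarity`,
    `FROM ${s.table}`,
    `ORDER BY ${s.column} <=> $1`,
    `LIMIT ${s.limit}`,
  ].join("\n");
}
```

For example, `buildDistanceQuery({ table: "pokemons", column: "embedding", limit: 8 })` yields a query that still returns the similarity as a column but sorts on the distance, so the ranking is unchanged.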
Note:
Use Postgres EXPLAIN to see the query plan.
On this small dataset of 150 pokemons, neither query uses the index. To test whether the index would be used on a big dataset, first run SET SESSION enable_seqscan=false;
Exported SQL queries: before, after. Use them with SET SESSION enable_seqscan=false; EXPLAIN {query} in psql/pgAdmin.
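The check described in the note can be sketched as a small helper. The function names are hypothetical, and `planUsesIndex` is only a rough text match on the EXPLAIN output, assuming the default text format where index access shows up as a line containing "Index Scan" or "Index Only Scan".

```typescript
// Builds the statements to run, in order, in a single psql/pgAdmin session.
function explainStatements(query: string): string[] {
  // Drop surrounding whitespace and trailing semicolons so EXPLAIN wraps cleanly.
  const trimmed = query.trim().replace(/;+$/, "");
  return ["SET SESSION enable_seqscan=false;", `EXPLAIN ${trimmed};`];
}

// Rough heuristic: true if the textual plan mentions any kind of index scan.
function planUsesIndex(planText: string): boolean {
  return /Index (Only )?Scan/.test(planText);
}
```

Running the two statements for both the before and the after query, and comparing the plans, shows which one the planner can serve from the vector index.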
I'll do a PR for this repo later.
Reproduction repo is here: https://github.com/martinloretzzz/nextjs-drizzle-pgvector