Skip to content
This repository has been archived by the owner on Jul 16, 2024. It is now read-only.

Support more index types besides ivfflat #224

Merged
merged 21 commits into from
Nov 23, 2023
Merged

Support more index types besides ivfflat #224

merged 21 commits into from
Nov 23, 2023

Conversation

xuebinsu
Copy link
Contributor

@xuebinsu xuebinsu commented Nov 2, 2023

Previously, we only support indexing embeddings using the ivfflat
access method in pgvector.

Recently, a new access method hnsw has been added to pgvector.
hnsw is believed to be more performant and accurate than ivfflat.
To allow for more flexibility, we add a new parameter method to
allow user to choose which access method to use when creating index.

Also, a new parameter embedding_dimension is added to support more
models, since dimension is required for pgvector to create index.

A new test case for embeddings is added in tests/test_embedding.py.

To support set allow_system_table_mods = on;, Postgres is upgraded
from 12 to 13 on CI.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Copy link
Collaborator

@beeender beeender left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have some tests for embedding? Is there any blockers for testing it on CI?

greenplumpython/experimental/embedding.py Show resolved Hide resolved
@xuebinsu
Copy link
Contributor Author

Can we have some tests for embedding? Is there any blockers for testing it on CI?

I've proposed PR #225 for installing pgvector on CI. After merging that PR, we can add test cases for embedding.

Comment on lines +119 to +125
import sentence_transformers # type: ignore reportMissingImports

model = sentence_transformers.SentenceTransformer(model_name) # type: ignore reportUnknownVariableType
embedding_dimension: int = model[1].word_embedding_dimension # From models.Pooling
except:
raise NotImplementedError(
"Model '{model_name}' doesn't provide embedding dimension information"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe import error here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think import error is for modules, not models.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense

)
if method is not None:
assert method == "hnsw" or method == "ivfflat"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since there may be more method , assert method in ["ivfflat", "hnsw"] may be bettor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! That's easier to read and write indeed. Changed.

@xuebinsu
Copy link
Contributor Author

A new test case for embeddings is added in tests/test_embedding.py.

To support set allow_system_table_mods = on;, Postgres is upgraded
from 12 to 13 on CI.

Copy link
Contributor

@yihong0618 yihong0618 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@xuebinsu xuebinsu merged commit 4f52aae into main Nov 23, 2023
6 checks passed
@xuebinsu xuebinsu deleted the feat_hnsw branch November 23, 2023 07:34
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants