Add pgvectorscale extension #762

Merged: 7 commits, Sep 19, 2024

vitabaks (Owner) commented on Sep 18, 2024

Add Timescale pgvectorscale extension.

pgvectorscale builds on pgvector with higher performance embedding search and cost-efficient storage for AI applications.

Variables

  • enable_pgvectorscale: set to true to install the pgvectorscale and pgvector extensions
  • pgvectorscale_version: version of pgvectorscale to install (default: latest)

Compatible with Debian 12 and Ubuntu 22.04 and 24.04 (only deb packages are available), for PostgreSQL 13 through 17.

Deploy Timescale HA Cluster with pgvectorscale

To deploy a PostgreSQL High-Availability Cluster with the pgvectorscale extension, add the enable_pgvectorscale variable:

ansible-playbook deploy_pgcluster.yml  -e "enable_timescale=true" -e "enable_pgvectorscale=true"

Note

The enable_timescale variable is optional. This example installs the pgvectorscale, pgvector, and timescaledb extensions.
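
The same variables can also be kept in an extra-vars file instead of repeated -e flags. This is a minimal sketch: the file name pgvectorscale_vars.yml is hypothetical, and the command is only printed, not run, so it is safe to paste. The -e @file form is standard ansible-playbook syntax.

```shell
# Hypothetical extra-vars file; any path works with `-e @<file>`.
cat > pgvectorscale_vars.yml <<'EOF'
enable_timescale: true           # optional; also installs timescaledb
enable_pgvectorscale: true       # installs pgvectorscale and pgvector
pgvectorscale_version: "latest"  # default per the PR description
EOF

# Print the command instead of running it.
echo "ansible-playbook deploy_pgcluster.yml -e @pgvectorscale_vars.yml"
```

The format of a pinned pgvectorscale_version value is not shown in this PR, so the sketch keeps the default.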

vitabaks added the enhancement and new feature labels on Sep 18, 2024
vitabaks self-assigned this on Sep 18, 2024
vitabaks (Owner, Author) commented on Sep 19, 2024

Test (Deploy Timescale HA Cluster with pgvectorscale)

ansible-playbook deploy_pgcluster.yml  -e "enable_timescale=true" -e "enable_pgvectorscale=true"

Ansible log:

PLAY [Deploy PostgreSQL HA Cluster (based on "Patroni")] ***********************
...
TASK [add-repository : Add TimescaleDB repository] *****************************
changed: [10.172.0.20]
changed: [10.172.0.21]
changed: [10.172.0.22]
...
TASK [packages : Install TimescaleDB package] **********************************
changed: [10.172.0.20] => (item=timescaledb-2-postgresql-16)
changed: [10.172.0.21] => (item=timescaledb-2-postgresql-16)
changed: [10.172.0.22] => (item=timescaledb-2-postgresql-16)

TASK [packages : Install pgvector package] *************************************
changed: [10.172.0.21]
changed: [10.172.0.22]
changed: [10.172.0.20]

TASK [packages : Looking up the latest version of pgvectorscale] ***************
ok: [10.172.0.22]
ok: [10.172.0.21]
ok: [10.172.0.20]

TASK [packages : Download pgvectorscale archive] *******************************
changed: [10.172.0.21]
changed: [10.172.0.22]
changed: [10.172.0.20]

TASK [packages : Extract pgvectorscale package] ********************************
changed: [10.172.0.20]
changed: [10.172.0.22]
changed: [10.172.0.21]

TASK [packages : Install pgvectorscale v0.3.0 package] *************************
changed: [10.172.0.22]
changed: [10.172.0.21]
changed: [10.172.0.20]
...

Create extensions

Note

Extensions can be created automatically if you define them in the postgresql_extensions variable.
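
A possible way to have the playbook create the extensions, sketched under assumptions: the file name is hypothetical, and the entry format (ext plus db) is assumed from the project's default variable files and is not shown in this PR, so check it against your checkout. The vector extension is listed first because vectorscale requires it.

```shell
# Hypothetical extra-vars file; the name is illustrative only.
cat > pgvectorscale_extensions.yml <<'EOF'
enable_pgvectorscale: true
# Assumed entry format (ext + db); verify against the project's defaults.
postgresql_extensions:
  - { ext: "vector", db: "postgres" }       # required by vectorscale
  - { ext: "vectorscale", db: "postgres" }
EOF

# Print the command instead of running it.
echo "ansible-playbook deploy_pgcluster.yml -e @pgvectorscale_extensions.yml"
```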

postgres=# \dx
                 List of installed extensions
  Name   | Version |   Schema   |         Description          
---------+---------+------------+------------------------------
 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language
(1 row)

postgres=# show shared_preload_libraries ;
          shared_preload_libraries           
---------------------------------------------
 pg_stat_statements,auto_explain,timescaledb
(1 row)

postgres=# CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;
NOTICE:  installing required extension "vector"
CREATE EXTENSION
postgres=# \dx
                               List of installed extensions
    Name     | Version |   Schema   |                     Description                      
-------------+---------+------------+------------------------------------------------------
 plpgsql     | 1.0     | pg_catalog | PL/pgSQL procedural language
 vector      | 0.7.4   | public     | vector data type and ivfflat and hnsw access methods
 vectorscale | 0.3.0   | public     | pgvectorscale:  Advanced indexing for vector data
(3 rows)

Check vectorscale

postgres=# CREATE TABLE IF NOT EXISTS document_embedding  (
    id BIGINT PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    metadata JSONB,
    contents TEXT,
    embedding VECTOR(1536)
)
postgres-# ;
CREATE TABLE
postgres=# CREATE INDEX document_embedding_idx ON document_embedding
USING diskann (embedding);
NOTICE:  Starting index build. num_neighbors=-1 search_list_size=100, max_alpha=1.2, storage_layout=SbqCompression
WARNING:  Indexed 0 tuples
CREATE INDEX
postgres=# \d+ document_embedding
                                                    Table "public.document_embedding"
  Column   |     Type     | Collation | Nullable |             Default              | Storage  | Compression | Stats target | Description 
-----------+--------------+-----------+----------+----------------------------------+----------+-------------+--------------+-------------
 id        | bigint       |           | not null | generated by default as identity | plain    |             |              | 
 metadata  | jsonb        |           |          |                                  | extended |             |              | 
 contents  | text         |           |          |                                  | extended |             |              | 
 embedding | vector(1536) |           |          |                                  | external |             |              | 
Indexes:
    "document_embedding_pkey" PRIMARY KEY, btree (id)
    "document_embedding_idx" diskann (embedding)
Access method: heap

passed (the "Indexed 0 tuples" warning is expected because the table is still empty)
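
Once rows are loaded, a nearest-neighbor query against the table above might look like the sketch below. It assumes the diskann index uses cosine distance (the <=> operator from pgvector) by default, as the pgvectorscale README describes. The file name knn_query.sql, the psql variable query_vec, and the constant placeholder vector are all hypothetical; a real query vector would come from an embedding model.

```shell
# Placeholder 1536-dimensional vector in pgvector text format: [0.1,0.1,...]
query_vec=$(awk 'BEGIN {
  printf "["
  for (i = 1; i <= 1536; i++) printf "%s0.1", (i > 1 ? "," : "")
  print "]"
}')

# :'query_vec' is psql variable interpolation as a quoted literal.
cat > knn_query.sql <<'EOF'
SELECT id, contents, embedding <=> :'query_vec'::vector AS cosine_distance
FROM document_embedding
ORDER BY embedding <=> :'query_vec'::vector
LIMIT 10;
EOF

# Print the command instead of running it; it needs a reachable database.
echo 'psql -d postgres -v query_vec="$query_vec" -f knn_query.sql'
```

Ordering by the distance expression with a LIMIT is the query shape that lets the planner use a vector index.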

vitabaks merged commit 4619568 into master on Sep 19, 2024
15 checks passed
vitabaks deleted the pgvectorscale branch on September 19, 2024 15:38
vitabaks restored the pgvectorscale branch on September 19, 2024 15:45
vitabaks deleted the pgvectorscale branch on September 19, 2024 17:08