Commit

docs: development docs
Signed-off-by: usamoi <usamoi@outlook.com>
usamoi committed Jan 4, 2024
1 parent eb44c26 commit 41fcf5d
Showing 6 changed files with 171 additions and 115 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -41,6 +41,7 @@ More details at [./docs/comparison-pgvector.md](./docs/comparison-pgvector.md)
- [Searching](./docs/searching.md)
- [Comparison with pgvector](./docs/comparison-pgvector.md)
- [Why not a specialty vector database?](./docs/comparison-with-specialized-vectordb.md)
- [Development](./docs/development.md)

For users, we recommend trying pgvecto.rs with our pre-built Docker image by running

2 changes: 1 addition & 1 deletion crates/service/src/prelude/error.rs
@@ -65,7 +65,7 @@ ADVICE: Check if dimensions and scalar type of the vector is matched with the in
#[error("\
IPC connection is closed unexpectedly.
ADVICE: This error is caused by a background worker error. \
Please check the full PostgreSQL log to get more information.\
Please check the full PostgreSQL log to get more information. Please read `https://github.com/tensorchord/pgvecto.rs/blob/main/docs/configuration.md`.\
")]
Ipc,
#[error("\
14 changes: 14 additions & 0 deletions docs/configuration.md
@@ -0,0 +1,14 @@
# Configuration

## Logging

By default, not all pgvecto.rs logs are captured. pgvecto.rs starts a background worker process for indexing, and that process prints its logs to standard error. To capture them, set `logging_collector` to `on`. For details, see the [PostgreSQL documentation on the logging collector](https://www.postgresql.org/docs/current/runtime-config-logging.html#GUC-LOGGING-COLLECTOR).

You can set `logging_collector` to `on` with the following command:

```sh
psql -U postgres -c 'ALTER SYSTEM SET logging_collector = on;'
# Restart the PostgreSQL cluster for the change to take effect.
sudo systemctl restart postgresql.service # for pgvecto.rs running with systemd
docker restart pgvecto-rs-demo # for pgvecto.rs running in docker
```
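
Once the collector is enabled, PostgreSQL writes log files to `log_directory`. That setting can be an absolute path or a path relative to the data directory. The sketch below shows one way to find those files. The `resolve_log_dir` helper is hypothetical and not part of pgvecto.rs.

```sh
# Hypothetical helper: resolve PostgreSQL's log_directory setting.
# $1 = data_directory, $2 = log_directory (absolute, or relative to $1).
resolve_log_dir() {
  case "$2" in
    /*) printf '%s\n' "$2" ;;             # absolute path: use as-is
    *)  printf '%s/%s\n' "${1%/}" "$2" ;; # relative path: prefix the data directory
  esac
}

# Usage against a running server (not executed here):
#   data=$(psql -U postgres -tAqX -c 'SHOW data_directory;')
#   logs=$(psql -U postgres -tAqX -c 'SHOW log_directory;')
#   ls "$(resolve_log_dir "$data" "$logs")"
```
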
145 changes: 145 additions & 0 deletions docs/development.md
@@ -0,0 +1,145 @@
# Development

## Environment

You can set up a development environment by following these steps:

1. Install the base dependencies.

```sh
sudo apt install -y \
build-essential \
libpq-dev \
libssl-dev \
pkg-config \
gcc \
libreadline-dev \
flex \
bison \
libxml2-dev \
libxslt-dev \
libxml2-utils \
xsltproc \
zlib1g-dev \
ccache \
clang \
git
```

2. Install Rust. The following command installs Rustup, the Rust toolchain installer, for your user. Do not install `rustc` with your system package manager.

```sh
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```

3. Install PostgreSQL and its headers. The commands below assume PostgreSQL 15. Replace `15` with any other major version you need.

```sh
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" >> /etc/apt/sources.list.d/pgdg.list'
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo apt-get update
sudo apt-get -y install libpq-dev postgresql-15 postgresql-server-dev-15
```

4. Install clang-16. We do not support other versions of clang.

```sh
sudo sh -c 'echo "deb http://apt.llvm.org/$(lsb_release -cs)/ llvm-toolchain-$(lsb_release -cs)-16 main" >> /etc/apt/sources.list'
wget --quiet -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
sudo apt-get update
sudo apt-get -y install clang-16
```

5. Clone the repository. Run the remaining commands from the repository root.

```sh
git clone https://github.com/tensorchord/pgvecto.rs.git # or `git clone git@github.com:tensorchord/pgvecto.rs.git`
cd pgvecto.rs
```

6. Install cargo-pgrx.

```sh
cargo install cargo-pgrx@$(grep 'pgrx = {' Cargo.toml | cut -d '"' -f 2)
cargo pgrx init --pg15=/usr/lib/postgresql/15/bin/pg_config
```
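
It can be worth checking the toolchain before building. The sketch below is a minimal pre-flight check. Its helper names (`missing_tools`, `clang_major`, `pgrx_version`) are our own and are not part of pgvecto.rs.

`pgrx_version` wraps the same `grep`/`cut` pipeline as step 6: it prints the text between the first pair of double quotes on the `pgrx = {` line of `Cargo.toml`. The sample line below is hypothetical, including the version `=0.11.2`.

```sh
# Print each named command that is not on PATH (empty output: all found).
# Library-only packages such as libssl-dev need `dpkg -s` instead.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Read `clang --version` output on stdin and print its major version.
clang_major() {
  sed -n 's/.*clang version \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Same pipeline as step 6: first double-quoted value on the `pgrx = {` line.
pgrx_version() {
  grep 'pgrx = {' "$1" | cut -d '"' -f 2
}

# Demonstrate pgrx_version on a hypothetical Cargo.toml fragment.
sample=$(mktemp)
cat > "$sample" <<'EOF'
[dependencies]
pgrx = { version = "=0.11.2", default-features = false }
EOF
pgrx_version "$sample"   # prints =0.11.2
rm -f "$sample"

# Usage from the repository root (not executed here):
#   missing_tools gcc flex bison pkg-config xsltproc ccache clang git
#   [ "$(clang-16 --version | clang_major)" = 16 ] || echo 'clang 16 not found' >&2
#   pgrx_version Cargo.toml
```
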

Now you can install pgvecto.rs with the following command:

```sh
cargo pgrx install --sudo --release
```

To make pgvecto.rs work, configure PostgreSQL so that `shared_preload_libraries` includes `vectors.so`. The command below replaces any existing value, so if you already preload other libraries, list them as well.

```sh
psql -U postgres -c 'ALTER SYSTEM SET shared_preload_libraries = "vectors.so"'
# Restart the PostgreSQL cluster for the change to take effect.
sudo systemctl restart postgresql.service # for pgvecto.rs running with systemd
```
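
After the restart, `SHOW shared_preload_libraries;` reports the active value. The sketch below is a minimal check. It uses a hypothetical `preloads_vectors` helper that ignores spaces around list entries and looks only for the literal entry `vectors.so`.

```sh
# Hypothetical helper: succeed if the comma-separated list in $1
# contains vectors.so (spaces around entries are ignored).
preloads_vectors() {
  case ",$(printf '%s' "$1" | tr -d ' ')," in
    *,vectors.so,*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage against a running server (not executed here):
#   preloads_vectors "$(psql -U postgres -tAqX -c 'SHOW shared_preload_libraries;')" \
#     || echo 'vectors.so is not preloaded; was PostgreSQL restarted?' >&2
```
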

Now you can connect to the database and enable the extension.

```sql
DROP EXTENSION IF EXISTS vectors;
CREATE EXTENSION vectors;
```

### Cross Compilation

Assuming you are building for an aarch64 target on an x86_64 host, follow these steps:

1. Install cross compilation toolchain.

```sh
sudo apt install crossbuild-essential-arm64
```

2. Download the PostgreSQL header files for the target architecture.

```sh
apt download postgresql-server-dev-15:arm64
```

3. Set the correct linker and sysroot for Rust by adding the following section to the end of `~/.cargo/config.toml`.

```toml
[target.aarch64-unknown-linux-gnu]
linker = "aarch64-linux-gnu-gcc"

[env]
BINDGEN_EXTRA_CLANG_ARGS_aarch64_unknown_linux_gnu = "-isystem /usr/aarch64-linux-gnu/include/ -ccc-gcc-name aarch64-linux-gnu-gcc"
```
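
The `[env]` key follows bindgen's convention for a target-specific variable, `BINDGEN_EXTRA_CLANG_ARGS_<target>`. In that name, the dashes of the Rust target triple are replaced by underscores. The sketch below derives the name for another target and uses a hypothetical helper. The `rustup target add` line is an assumption about your setup: the target's standard library must be installed in some way.

```sh
# Hypothetical helper: print the target-specific bindgen variable name
# for a Rust target triple ($1), replacing dashes with underscores.
bindgen_clang_args_var() {
  printf 'BINDGEN_EXTRA_CLANG_ARGS_%s\n' "$(printf '%s' "$1" | tr '-' '_')"
}

# Usage (not executed here):
#   rustup target add aarch64-unknown-linux-gnu
#   bindgen_clang_args_var aarch64-unknown-linux-gnu
```
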

## Debug

Debug information is included in the compiled binary even in release mode, so you can always use `gdb` for debugging.

In a debug build, a backtrace is printed when a thread in the background worker process panics, but not when a session process raises an error. In a release build, backtraces are never printed. However, if you set the environment variable `RUST_BACKTRACE` to `1`, all backtraces are printed. We recommend debugging a release build with `RUST_BACKTRACE=1 cargo pgrx run --release`.

## Pull Request

### Version

pgvecto.rs stores its data in the `pg_vectors` directory under the PostgreSQL data directory. To avoid unnecessarily rebuilding indexes during an upgrade, we record a version number for persistent data. If you modify the structure of persistent data, bump `VERSION` (for a breaking change) or `SOFT_VERSION` (if a newer version can still read the old data).

The version number is saved in these two files:

1. `/crates/service/src/worker/metadata.rs` (if the structure of persistent data you modified is outside an index).
2. `/crates/service/src/instance/metadata.rs` (if the structure of persistent data you modified is inside an index).
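
The rule above can be read as a compatibility check made when existing data is opened. The sketch below rests on our own assumptions and is not the logic in `metadata.rs`. It assumes data is reusable only if its `VERSION` equals the binary's and its `SOFT_VERSION` is not newer than the binary's.

```sh
# Hypothetical sketch: decide whether persisted data can be reused.
# $1/$2 = VERSION/SOFT_VERSION recorded with the data,
# $3/$4 = VERSION/SOFT_VERSION compiled into the running binary.
data_compatible() {
  [ "$1" -eq "$3" ] || return 1   # breaking change: indexes must be rebuilt
  [ "$2" -le "$4" ]               # a newer binary may read older soft versions
}
```
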

## Release

These steps are needed for a release:

1. Choose a new version number. In the steps below, the new version is `99.99.99` and the previous version is `98.98.98`.
2. Push the following changes to the `main` branch:
   * Update the latest version number in `/README.md` and `/docs/installation.md` to `99.99.99`.
   * Use `cargo pgrx schema` to generate the schema script and save it as `/sql/vectors--99.99.99.sql`.
   * Write a schema update script and save it as `/sql/vectors--98.98.98--99.99.99.sql`.
3. Manually trigger `Release` CI.
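
The two SQL files in step 2 follow PostgreSQL's naming convention for extension scripts. A full install script is named `<extension>--<version>.sql`, and an update script is named `<extension>--<old>--<new>.sql`. The sketch below prints both paths, relative to the repository root, for a pair of versions. The `release_sql_files` helper is hypothetical.

```sh
# Hypothetical helper: $1 = former version, $2 = new version.
release_sql_files() {
  printf 'sql/vectors--%s.sql\n' "$2"          # full install script
  printf 'sql/vectors--%s--%s.sql\n' "$1" "$2" # update script
}

release_sql_files 98.98.98 99.99.99
```
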

These steps are needed for a prerelease:

1. Choose a new version number, for example `99.99.99-alpha`.
2. Manually trigger `Release` CI with checkbox `prerelease` checked.
119 changes: 7 additions & 112 deletions docs/installation.md
@@ -17,130 +17,25 @@ CREATE EXTENSION vectors;

For full performance, mount a volume at the PostgreSQL data directory by adding an option such as `-v $PWD/pgdata:/var/lib/postgresql/data`.

You can configure PostgreSQL by the reference of the parent image in https://hub.docker.com/_/postgres/.
You can configure PostgreSQL by following [the documentation of the parent image](https://hub.docker.com/_/postgres/).

## Install from source

Install the base dependencies.

```sh
sudo apt install -y \
build-essential \
libpq-dev \
libssl-dev \
pkg-config \
gcc \
libreadline-dev \
flex \
bison \
libxml2-dev \
libxslt-dev \
libxml2-utils \
xsltproc \
zlib1g-dev \
ccache \
clang \
git
```

Install Rust. The following command installs Rustup, the Rust toolchain installer, for your user. Do not install `rustc` with your system package manager.

```sh
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```

Install PostgreSQL and its headers. The commands below assume PostgreSQL 15. Replace `15` with any other major version you need.

```sh
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" >> /etc/apt/sources.list.d/pgdg.list'
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo apt-get update
sudo apt-get -y install libpq-dev postgresql-15 postgresql-server-dev-15
```

Install clang-16. We do not support other versions of clang.

```sh
sudo sh -c 'echo "deb http://apt.llvm.org/$(lsb_release -cs)/ llvm-toolchain-$(lsb_release -cs)-16 main" >> /etc/apt/sources.list'
wget --quiet -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
sudo apt-get update
sudo apt-get -y install clang-16
```

Clone the repository. Run the following commands from the cloned repository directory.

```sh
git clone https://github.com/tensorchord/pgvecto.rs.git
cd pgvecto.rs
```

Install cargo-pgrx.

```sh
cargo install cargo-pgrx@$(grep 'pgrx = {' Cargo.toml | cut -d '"' -f 2)
cargo pgrx init --pg15=/usr/lib/postgresql/15/bin/pg_config
```

Install pgvecto.rs.

```sh
cargo pgrx install --sudo --release
```

Configure your PostgreSQL by modifying the `shared_preload_libraries` to include `vectors.so`.

```sh
psql -U postgres -c 'ALTER SYSTEM SET shared_preload_libraries = "vectors.so"'
```

You need to restart the PostgreSQL cluster.

```sh
sudo systemctl restart postgresql.service
```

Connect to the database and enable the extension.

```sql
DROP EXTENSION IF EXISTS vectors;
CREATE EXTENSION vectors;
```

### Cross compilation

Assuming you are building for an aarch64 target on an x86_64 host, you need to set the correct linker and sysroot for Rust.

```sh
sudo apt install crossbuild-essential-arm64
```

Add the following section to the end of `~/.cargo/config.toml`.

```toml
[target.aarch64-unknown-linux-gnu]
linker = "aarch64-linux-gnu-gcc"

[env]
BINDGEN_EXTRA_CLANG_ARGS_aarch64_unknown_linux_gnu = "-isystem /usr/aarch64-linux-gnu/include/ -ccc-gcc-name aarch64-linux-gnu-gcc"
```
Please read [Development](./development.md).

## Install from release

Download the deb package in the release page, and type `sudo apt install vectors-pg15-*.deb` to install the deb package.
1. Download the deb package from the release page and install it with `sudo apt install ./vectors-pg15-*.deb`.

Configure your PostgreSQL by modifying the `shared_preload_libraries` to include `vectors.so`.
2. Configure PostgreSQL so that `shared_preload_libraries` includes `vectors.so`.

```sh
psql -U postgres -c 'ALTER SYSTEM SET shared_preload_libraries = "vectors.so"'
# Restart the PostgreSQL cluster for the change to take effect.
sudo systemctl restart postgresql.service # for pgvecto.rs running with systemd
```

You need to restart the PostgreSQL cluster.

```sh
sudo systemctl restart postgresql.service
```

Connect to the database and enable the extension.
3. Connect to the database and enable the extension.

```sql
DROP EXTENSION IF EXISTS vectors;
5 changes: 3 additions & 2 deletions docs/upgrade.md
@@ -10,12 +10,13 @@ You can delete the folder with this command:

```shell
rm -rf "$(psql -U postgres -tAqX -c $'SELECT CONCAT(CURRENT_SETTING(\'data_directory\'), \'/pg_vectors\');')"
# Restart the PostgreSQL cluster for the change to take effect.
sudo systemctl restart postgresql.service # for pgvecto.rs running with systemd
docker restart pgvecto-rs-demo # for pgvecto.rs running in docker
```

If you are using Docker, you can also simply delete the `pg_vectors` folder under the volume directory.

You need to restart PostgreSQL.

* Reindex.

You can list all indexes that need to be reindexed with this command: