docs: development docs
Signed-off-by: usamoi <usamoi@outlook.com>
usamoi committed Jan 4, 2024
1 parent eb44c26 commit fddce6f
Showing 9 changed files with 205 additions and 128 deletions.
9 changes: 1 addition & 8 deletions README.md
@@ -41,6 +41,7 @@ More details at [./docs/comparison-pgvector.md](./docs/comparison-pgvector.md)
- [Searching](./docs/searching.md)
- [Comparison with pgvector](./docs/comparison-pgvector.md)
- [Why not a specialty vector database?](./docs/comparison-with-specialized-vectordb.md)
- [Development](./docs/development.md)

For users, we recommend you to try pgvecto.rs using our pre-built docker image, by running

@@ -52,14 +53,6 @@ docker run \
-d tensorchord/pgvecto-rs:pg16-v0.1.13
```

## Development with envd

For developers, you could use [envd](https://github.com/tensorchord/envd) to set up the development environment with one command. It will create a docker container and install all the dependencies for you.

```sh
pip install envd
envd up
```
## Contributing

We need your help! Please check out the [issues](https://github.com/tensorchord/pgvecto.rs/issues).
35 changes: 17 additions & 18 deletions build.envd
@@ -6,24 +6,23 @@ def build():
shell("zsh")
install.apt_packages(
name=[
"lsb-release",
"gnupg",
"tzdata",
"build-essential",
"libpq-dev",
"libssl-dev",
"pkg-config",
"gcc",
"libreadline-dev",
"flex",
"bison",
"libxml2-dev",
"libxslt-dev",
"libxml2-utils",
"xsltproc",
"zlib1g-dev",
"ccache",
"clang",
'bison',
'build-essential',
'ccache',
'flex',
'gcc',
'git',
'gnupg',
'libreadline-dev',
'libssl-dev',
'libxml2-dev',
'libxml2-utils',
'libxslt-dev',
'lsb-release',
'pkg-config',
'tzdata',
'xsltproc',
'zlib1g-dev'
]
)
runtime.environ(extra_path=["/home/envd/.cargo/bin"])
2 changes: 1 addition & 1 deletion crates/service/src/prelude/error.rs
@@ -65,7 +65,7 @@ ADVICE: Check if dimensions and scalar type of the vector is matched with the in
#[error("\
IPC connection is closed unexpected.
ADVICE: The error is raisen by background worker errors. \
Please check the full PostgreSQL log to get more information.\
Please check the full PostgreSQL log to get more information. Please read `https://github.com/tensorchord/pgvecto.rs/blob/main/docs/configuration.md`.\
")]
Ipc,
#[error("\
14 changes: 14 additions & 0 deletions docs/configuration.md
@@ -0,0 +1,14 @@
# Configuration

## Logging

By default, not all pgvecto.rs logs are captured. pgvecto.rs starts a background worker process for indexing, and that process prints its logs to standard error. To capture them, set `logging_collector` to `on`. See the [PostgreSQL documentation on the logging collector](https://www.postgresql.org/docs/current/runtime-config-logging.html#GUC-LOGGING-COLLECTOR) for more information.

You can set `logging_collector` to `on` with the following command:

```sh
psql -U postgres -c 'ALTER SYSTEM SET logging_collector = on;'
# You need to restart the PostgreSQL cluster for the change to take effect.
sudo systemctl restart postgresql.service # for pgvecto.rs running with systemd
docker restart pgvecto-rs-demo # for pgvecto.rs running in docker
```
141 changes: 141 additions & 0 deletions docs/development.md
@@ -0,0 +1,141 @@
# Development

## Environment

You can set up a development environment with `envd`. It creates a Docker container and installs all the dependencies for you.

```sh
pip install envd
git clone https://github.com/tensorchord/pgvecto.rs.git # or `git clone git@github.com:tensorchord/pgvecto.rs.git`
cd pgvecto.rs
envd up
```

Alternatively, you can set up a development environment manually by following these steps:

1. Install the base dependencies.

```sh
sudo apt install -y \
bison \
build-essential \
ccache \
flex \
gcc \
git \
gnupg \
libreadline-dev \
libssl-dev \
libxml2-dev \
libxml2-utils \
libxslt-dev \
lsb-release \
pkg-config \
tzdata \
xsltproc \
zlib1g-dev
```

2. Clone the Repository.

```sh
git clone https://github.com/tensorchord/pgvecto.rs.git # or `git clone git@github.com:tensorchord/pgvecto.rs.git`
cd pgvecto.rs
```

3. Install PostgreSQL and its headers. The commands below assume PostgreSQL 15. Feel free to replace `15` with any other major version number you need.

```sh
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" >> /etc/apt/sources.list.d/pgdg.list'
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo apt-get update
sudo apt-get install -y --no-install-recommends libpq-dev postgresql-15 postgresql-server-dev-15
```

4. Install clang-16. We do not support other versions of clang.

```sh
sudo sh -c 'echo "deb http://apt.llvm.org/$(lsb_release -cs)/ llvm-toolchain-$(lsb_release -cs)-16 main" >> /etc/apt/sources.list'
wget --quiet -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
sudo apt-get update
sudo apt-get install -y --no-install-recommends clang-16
```

5. Install Rust. The following command installs Rustup, the Rust toolchain installer, for your user. Do not install rustc using a package manager.

```sh
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```

6. Install cargo-pgrx.

```sh
cargo install cargo-pgrx@$(grep 'pgrx = {' Cargo.toml | cut -d '"' -f 2)
cargo pgrx init --pg15=/usr/lib/postgresql/15/bin/pg_config
```

7. The following commands are helpful if you run into permission issues.

```sh
sudo chmod 777 /usr/share/postgresql/15/extension/
sudo chmod 777 /usr/lib/postgresql/15/lib/
```
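
The `grep ... | cut ...` pipeline in step 6 reads the pinned pgrx version out of `Cargo.toml` so that `cargo-pgrx` matches the `pgrx` crate. A minimal sketch of how it behaves follows. The `Cargo.toml` fragment and the version `0.11.2` are hypothetical sample data, not the repository's real contents.

```sh
# Hypothetical Cargo.toml fragment; the real file pins its own version.
tmpdir=$(mktemp -d)
cat > "$tmpdir/Cargo.toml" <<'EOF'
[dependencies]
pgrx = { version = "0.11.2", default-features = false }
serde = "1.0"
EOF

# grep keeps the line that declares pgrx as an inline table.
# cut splits that line on double quotes and prints the second field,
# which is the text between the first pair of quotes.
pgrx_version=$(grep 'pgrx = {' "$tmpdir/Cargo.toml" | cut -d '"' -f 2)
echo "cargo install cargo-pgrx@$pgrx_version"

rm -rf "$tmpdir"
```

The pipeline relies on the version being the first quoted string on that line, so it would need adjusting if the dependency were declared in a different form.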

### Cross Compilation

Assuming you are building for an aarch64 target on an x86_64 host, you can follow these steps:

1. Install the cross-compilation toolchain.

```sh
sudo apt install crossbuild-essential-arm64
```

2. Get the PostgreSQL header files for the target architecture.

```sh
apt download postgresql-server-dev-15:arm64
```

3. Set the right linker and sysroot for Rust by adding the following section to the end of `~/.cargo/config.toml`.

```toml
[target.aarch64-unknown-linux-gnu]
linker = "aarch64-linux-gnu-gcc"

[env]
BINDGEN_EXTRA_CLANG_ARGS_aarch64_unknown_linux_gnu = "-isystem /usr/aarch64-linux-gnu/include/ -ccc-gcc-name aarch64-linux-gnu-gcc"
```
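
The environment variable in the `[env]` section carries the target triple as a suffix, with hyphens replaced by underscores. The sketch below derives that name for a given triple, which may help when adapting the configuration to another target. The variable names `target` and `var_name` are illustrative only.

```sh
# Target triple used in the configuration above.
target="aarch64-unknown-linux-gnu"

# Replace every hyphen with an underscore to form the suffix.
suffix=$(printf '%s' "$target" | tr '-' '_')
var_name="BINDGEN_EXTRA_CLANG_ARGS_${suffix}"

echo "$var_name"
```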

## Debug

Debug information is included in the compiled binary even in release mode, so you can always use `gdb` for debugging.

In a debug build, a backtrace is printed when a thread in the background worker process panics, but not when a session process raises an error. In a release build, no backtrace is printed by default. If you set the environment variable `RUST_BACKTRACE` to `1`, all backtraces are printed in either build. To debug a release build, we recommend running `RUST_BACKTRACE=1 cargo pgrx run --release`.
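
The backtrace rules above can be summarized as a small decision table. The function below is only a model of the prose, not code from pgvecto.rs, and the name `backtrace_printed` and its arguments are hypothetical.

```sh
# backtrace_printed BUILD PROCESS RUST_BACKTRACE
#   BUILD:          debug | release
#   PROCESS:        worker (background worker) | session
#   RUST_BACKTRACE: value of the environment variable, or empty
backtrace_printed() {
    build=$1
    process=$2
    rust_backtrace=$3

    # RUST_BACKTRACE=1 enables backtraces everywhere.
    if [ "$rust_backtrace" = "1" ]; then
        echo yes
        return
    fi
    # Otherwise only a debug build prints them, and only for worker panics.
    if [ "$build" = "debug" ] && [ "$process" = "worker" ]; then
        echo yes
        return
    fi
    echo no
}

backtrace_printed debug worker ""
backtrace_printed debug session ""
backtrace_printed release worker ""
backtrace_printed release session 1
```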

## Pull Request

### Version

pgvecto.rs stores its data in the `pg_vectors` directory under the PostgreSQL data directory. To avoid unnecessary index rebuilds on upgrade, we record a version number for the persistent data. If you modify the structure of the persistent data, you need to bump `VERSION` (if it is a breaking change) or `SOFT_VERSION` (if a newer version can still read old data).

The version numbers are stored in these two files:

1. `/crates/service/src/worker/metadata.rs` (if the structure of persistent data you modified is outside an index).
2. `/crates/service/src/instance/metadata.rs` (if the structure of persistent data you modified is inside an index).
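
One way to picture the difference between the two numbers is the check a binary could run when it opens existing data. This is a sketch under assumptions, not the logic in `metadata.rs`: the function `check_metadata` and the values `3` and `2` are hypothetical, and it assumes that data with a higher `SOFT_VERSION` than the binary cannot be read.

```sh
# Hypothetical version numbers compiled into the running binary.
VERSION=3
SOFT_VERSION=2

# check_metadata STORED_VERSION STORED_SOFT_VERSION
check_metadata() {
    if [ "$1" -ne "$VERSION" ]; then
        # VERSION differs: a breaking change, old data is unusable.
        echo "rebuild"
    elif [ "$2" -gt "$SOFT_VERSION" ]; then
        # Data was written by a newer binary than this one (assumed unreadable).
        echo "rebuild"
    else
        # Same VERSION and data is not newer: readable as-is.
        echo "ok"
    fi
}

check_metadata 3 1   # older soft version, same VERSION
check_metadata 2 2   # VERSION mismatch
check_metadata 3 3   # data newer than the binary
```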

## Release

These steps are needed for a release:

1. Get a new version number. Let's say it's `99.99.99` and its former version number is `98.98.98`.
2. Push these changes to `main` branch.
* Modify the latest version number in `/README.md` and `/docs/installation.md` to `99.99.99`.
* Use `cargo pgrx schema` to generate a schema script and upload it to `/sql/vectors--99.99.99.sql`.
* Write a schema update script and upload it to `/sql/vectors--98.98.98--99.99.99.sql`.
3. Manually trigger `Release` CI.
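
The two schema script names in step 2 follow from the old and new version numbers. A small sketch using the example versions from the steps above shows the pattern. The variable names are illustrative, and the paths are relative to the repository root.

```sh
# Example version numbers from the release steps.
old="98.98.98"
new="99.99.99"

# Full schema script for a fresh install of the new version.
install_script="sql/vectors--${new}.sql"
# Update script that migrates from the former version to the new one.
upgrade_script="sql/vectors--${old}--${new}.sql"

echo "$install_script"
echo "$upgrade_script"
```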

These steps are needed for a prerelease:

1. Get a new version number. Let's say it's `99.99.99-alpha`.
2. Manually trigger `Release` CI with checkbox `prerelease` checked.
2 changes: 1 addition & 1 deletion docs/get-started.md
@@ -16,7 +16,7 @@ You could create a table with the following SQL.

CREATE TABLE items (
id bigserial PRIMARY KEY,
embedding vector(3) NOT NULL -- 3 dimensions
embedding vector(3) NOT NULL -- 3 dimensions
);
```

109 changes: 13 additions & 96 deletions docs/installation.md
@@ -17,130 +17,47 @@ CREATE EXTENSION vectors;

To achieve full performance, please mount the volume to pg data directory by adding the option like `-v $PWD/pgdata:/var/lib/postgresql/data`

You can configure PostgreSQL by the reference of the parent image in https://hub.docker.com/_/postgres/.
You can configure PostgreSQL by [the reference of the parent image](https://hub.docker.com/_/postgres/).

## Install from source

Install base dependency.
1. Please read [Development](./development.md) to set up a development environment.

```sh
sudo apt install -y \
build-essential \
libpq-dev \
libssl-dev \
pkg-config \
gcc \
libreadline-dev \
flex \
bison \
libxml2-dev \
libxslt-dev \
libxml2-utils \
xsltproc \
zlib1g-dev \
ccache \
clang \
git
```

Install Rust. The following command will install Rustup, the Rust toolchain installer for your user. Do not install rustc using package manager.

```sh
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```

Install PostgreSQL and its headers. We assume you may install PostgreSQL 15. Feel free to replace `15` to any other major version number you need.

```sh
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" >> /etc/apt/sources.list.d/pgdg.list'
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo apt-get update
sudo apt-get -y install libpq-dev postgresql-15 postgresql-server-dev-15
```

Install clang-16. We do not support other versions of clang.

```sh
sudo sh -c 'echo "deb http://apt.llvm.org/$(lsb_release -cs)/ llvm-toolchain-$(lsb_release -cs)-16 main" >> /etc/apt/sources.list'
wget --quiet -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
sudo apt-get update
sudo apt-get -y install clang-16
```

Clone the Repository. Note the following commands are executed in the cloned repository directory.

```sh
git clone https://github.com/tensorchord/pgvecto.rs.git
cd pgvecto.rs
```

Install cargo-pgrx.

```sh
cargo install cargo-pgrx@$(grep 'pgrx = {' Cargo.toml | cut -d '"' -f 2)
cargo pgrx init --pg15=/usr/lib/postgresql/15/bin/pg_config
```

Install pgvecto.rs.
2. Install pgvecto.rs with the following command:

```sh
cargo pgrx install --sudo --release
```

Configure your PostgreSQL by modifying the `shared_preload_libraries` to include `vectors.so`.
3. Configure your PostgreSQL by modifying the `shared_preload_libraries` to include `vectors.so`.

```sh
psql -U postgres -c 'ALTER SYSTEM SET shared_preload_libraries = "vectors.so"'
# You need to restart the PostgreSQL cluster for the change to take effect.
sudo systemctl restart postgresql.service # for pgvecto.rs running with systemd
service postgresql restart # for pgvecto.rs running in envd
```

You need restart the PostgreSQL cluster.

```sh
sudo systemctl restart postgresql.service
```

Connect to the database and enable the extension.
4. Connect to the database and enable the extension.

```sql
DROP EXTENSION IF EXISTS vectors;
CREATE EXTENSION vectors;
```

### Cross compilation

Assuming that you build target for aarch64 in a x86_64 host environment, you need to set right linker and sysroot for Rust.

```sh
sudo apt install crossbuild-essential-arm64
```

Add the following section to the end of `~/.cargo/config.toml`.

```toml
[target.aarch64-unknown-linux-gnu]
linker = "aarch64-linux-gnu-gcc"

[env]
BINDGEN_EXTRA_CLANG_ARGS_aarch64_unknown_linux_gnu = "-isystem /usr/aarch64-linux-gnu/include/ -ccc-gcc-name aarch64-linux-gnu-gcc"
```

## Install from release

Download the deb package in the release page, and type `sudo apt install vectors-pg15-*.deb` to install the deb package.
1. Download the deb package from the release page, and run `sudo apt install vectors-pg15-*.deb` to install it.

Configure your PostgreSQL by modifying the `shared_preload_libraries` to include `vectors.so`.
2. Configure your PostgreSQL by modifying the `shared_preload_libraries` to include `vectors.so`.

```sh
psql -U postgres -c 'ALTER SYSTEM SET shared_preload_libraries = "vectors.so"'
# You need to restart the PostgreSQL cluster for the change to take effect.
sudo systemctl restart postgresql.service # for pgvecto.rs running with systemd
```

You need restart the PostgreSQL cluster.

```sh
sudo systemctl restart postgresql.service
```

Connect to the database and enable the extension.
3. Connect to the database and enable the extension.

```sql
DROP EXTENSION IF EXISTS vectors;
5 changes: 3 additions & 2 deletions docs/upgrade.md
@@ -10,12 +10,13 @@ You can delete the folder with this command:

```shell
rm -rf $(psql -U postgres -tAqX -c $'SELECT CONCAT(CURRENT_SETTING(\'data_directory\'), \'/pg_vectors\');')
# You need to restart the PostgreSQL cluster for the change to take effect.
sudo systemctl restart postgresql.service # for pgvecto.rs running with systemd
docker restart pgvecto-rs-demo # for pgvecto.rs running in docker
```

If you are using Docker, you can just delete `pg_vectors` folder under the volume directory too.

You need to restart PostgreSQL.

* Reindex.

You can list all indexes that needed to be reindexed with this command: