feat: add Voyage AI vectorizer integration #256
Merged
Commits (7):

- a4df434 feat: add Voyage AI vectorizer integration (JamesGuthrie)
- 27ea0be chore: address review feedback (JamesGuthrie)
- c36c8de fix: add ApiKeyMixin to VoyageAI (smoya)
- 5113516 fix: pass api key to voyageai API (smoya)
- 0a845a6 chore: address review feedback (JamesGuthrie)
- 5fe98ae Merge remote-tracking branch 'origin/main' into jg/voyageai-vectorizer (JamesGuthrie)
- 30d4566 chore: regenerate test outputs (JamesGuthrie)
I basically copy-pasted this from
@@ -0,0 +1,182 @@
# Use pgai with Voyage AI

This page shows you how to:

- [Configure pgai for Voyage AI](#configure-pgai-for-voyage-ai)
- [Add AI functionality to your database](#usage)
- [Follow advanced AI examples](#advanced-examples)

## Configure pgai for Voyage AI

Most pgai functions require a [Voyage AI API key](https://docs.voyageai.com/docs/api-key-and-installation#authentication-with-api-keys).

- [Handle API keys using pgai from psql](#handle-api-keys-using-pgai-from-psql)
- [Handle API keys using pgai from python](#handle-api-keys-using-pgai-from-python)

### Handle API keys using pgai from psql

The API key is an [optional parameter to pgai functions](https://www.postgresql.org/docs/current/sql-syntax-calling-funcs.html).
You can either:

* [Run AI queries by passing your API key implicitly as a session parameter](#run-ai-queries-by-passing-your-api-key-implicitly-as-a-session-parameter)
* [Run AI queries by passing your API key explicitly as a function argument](#run-ai-queries-by-passing-your-api-key-explicitly-as-a-function-argument)

#### Run AI queries by passing your API key implicitly as a session parameter

To use a [session level parameter when connecting to your database with psql](https://www.postgresql.org/docs/current/config-setting.html#CONFIG-SETTING-SHELL)
to run your AI queries:

1. Set your Voyage AI key as an environment variable in your shell:
   ```bash
   export VOYAGE_API_KEY="this-is-my-super-secret-api-key-dont-tell"
   ```
1. Use the session level parameter when you connect to your database:

   ```bash
   PGOPTIONS="-c ai.voyage_api_key=$VOYAGE_API_KEY" psql -d "postgres://<username>:<password>@<host>:<port>/<database-name>"
   ```

1. Run your AI query:

   `ai.voyage_api_key` is set for the duration of your psql session, so you do not need to specify it for pgai functions.

   ```sql
   SELECT * FROM ai.voyageai_embed('voyage-3-lite', 'sample text to embed');
   ```

#### Run AI queries by passing your API key explicitly as a function argument

1. Set your Voyage AI key as an environment variable in your shell:
   ```bash
   export VOYAGE_API_KEY="this-is-my-super-secret-api-key-dont-tell"
   ```

2. Connect to your database and set your API key as a [psql variable](https://www.postgresql.org/docs/current/app-psql.html#APP-PSQL-VARIABLES):

   ```bash
   psql -d "postgres://<username>:<password>@<host>:<port>/<database-name>" -v voyage_api_key=$VOYAGE_API_KEY
   ```
   Your API key is now available as a psql variable named `voyage_api_key` in your psql session.

   You can also log into the database, then set `voyage_api_key` using the `\getenv` [metacommand](https://www.postgresql.org/docs/current/app-psql.html#APP-PSQL-META-COMMAND-GETENV):

   ```sql
   \getenv voyage_api_key VOYAGE_API_KEY
   ```

3. Pass your API key to your parameterized query:
   ```sql
   SELECT *
   FROM ai.voyageai_embed('voyage-3-lite', 'sample text to embed', api_key=>$1)
   \bind :voyage_api_key
   \g
   ```

   Use [\bind](https://www.postgresql.org/docs/current/app-psql.html#APP-PSQL-META-COMMAND-BIND) to pass the value of `voyage_api_key` to the parameterized query.

   The `\bind` metacommand is available in psql version 16+.

4. Once you have used `\getenv` to load the environment variable into a psql variable,
   you can optionally set it as a session-level parameter, which pgai functions then use without an explicit `api_key` argument.
   ```sql
   SELECT set_config('ai.voyage_api_key', $1, false) IS NOT NULL
   \bind :voyage_api_key
   \g
   ```

   ```sql
   SELECT * FROM ai.voyageai_embed('voyage-3-lite', 'sample text to embed');
   ```

### Handle API keys using pgai from python

1. In your Python environment, include the dotenv and postgres driver packages:

   ```bash
   pip install python-dotenv
   pip install psycopg2-binary
   ```

1. Set your Voyage AI key in a .env file or as an environment variable:
   ```bash
   VOYAGE_API_KEY="this-is-my-super-secret-api-key-dont-tell"
   DB_URL="your connection string"
   ```

1. Pass your API key as a parameter to your queries:

   ```python
   import os

   import psycopg2
   from dotenv import load_dotenv

   # Load VOYAGE_API_KEY and DB_URL from a .env file, if one exists
   load_dotenv()

   VOYAGE_API_KEY = os.environ["VOYAGE_API_KEY"]
   DB_URL = os.environ["DB_URL"]

   with psycopg2.connect(DB_URL) as conn:
       with conn.cursor() as cur:
           # Pass the API key as a query parameter; don't use string manipulation
           cur.execute("SELECT * FROM ai.voyageai_embed('voyage-3-lite', 'sample text to embed', api_key=>%s)", (VOYAGE_API_KEY,))
           records = cur.fetchall()
   ```

   Do not use string manipulation to embed the key as a literal in the SQL query.

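The rows fetched above hold the embedding in whatever form the driver hands back. As a minimal sketch, assuming no pgvector adapter is registered with psycopg2 so that the `vector` value arrives as its text form (for example `'[0.1,-0.2]'`), a small helper can turn it into Python floats. The name `parse_vector` is hypothetical and not part of pgai:

```python
def parse_vector(text: str) -> list[float]:
    """Parse a pgvector text literal such as '[0.1,-0.2]' into floats.

    Assumption: the driver returned the vector as a string because no
    pgvector type adapter is registered on the connection.
    """
    stripped = text.strip()
    if not (stripped.startswith("[") and stripped.endswith("]")):
        raise ValueError(f"not a vector literal: {text!r}")
    body = stripped[1:-1].strip()
    # An empty body means a zero-length literal; otherwise split on commas
    return [float(part) for part in body.split(",")] if body else []
```

With the query from the previous step, this could be used as `embedding = parse_vector(records[0][0])`, provided the first column is indeed returned as text.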
## Usage

This section shows you how to use AI directly from your database using SQL.

- [Embed](#embed): generate [embeddings](https://platform.openai.com/docs/guides/embeddings) using a
  specified model.

### Embed

Generate [embeddings](https://platform.openai.com/docs/guides/embeddings) using a specified model.

- Request an embedding using a specific model:

  ```sql
  SELECT ai.voyageai_embed
  ( 'voyage-3-lite'
  , 'the purple elephant sits on a red mushroom'
  );
  ```

  The data returned looks like:

  ```text
  voyageai_embed
  --------------------------------------------------------
  [0.005978798,-0.020522336,...-0.0022857306,-0.023699166]
  (1 row)
  ```

- Pass an array of text inputs:

  ```sql
  SELECT ai.voyageai_embed
  ( 'voyage-3-lite'
  , array['Timescale is Postgres made Powerful', 'the purple elephant sits on a red mushroom']
  );
  ```

- Specify the input type:

  The Voyage AI API lets you set `input_type` to `"document"` or `"query"`, or
  leave it unset. Setting this value correctly should improve retrieval
  quality:

  ```sql
  SELECT ai.voyageai_embed
  ( 'voyage-3-lite'
  , 'A query'
  , input_type => 'query'
  );
  ```

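The `input_type` option can also be passed from Python as a bound parameter rather than spliced into the SQL text. The sketch below is illustrative only: `build_embed_query` and `ALLOWED_INPUT_TYPES` are hypothetical names, and the allowed values are taken from the description above. It only builds the statement and parameter tuple, so it runs without a database:

```python
from typing import Optional

# Values described above: "document", "query", or unset (None)
ALLOWED_INPUT_TYPES = {None, "document", "query"}


def build_embed_query(
    model: str, text: str, input_type: Optional[str] = None
) -> tuple[str, tuple]:
    """Return a parameterized SQL statement and its parameters."""
    if input_type not in ALLOWED_INPUT_TYPES:
        raise ValueError(f"unsupported input_type: {input_type!r}")
    # Every value is bound as a parameter; nothing is formatted into the SQL
    sql = "SELECT ai.voyageai_embed(%s, %s, input_type => %s)"
    return sql, (model, text, input_type)
```

With a psycopg2 cursor, the result could be executed as `cur.execute(*build_embed_query("voyage-3-lite", "A query", "query"))`. Documents stored for retrieval would use `"document"` instead.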
@@ -0,0 +1,22 @@
import voyageai
from typing import Optional, Generator

DEFAULT_KEY_NAME = "VOYAGE_API_KEY"


def embed(
    model: str,
    input: list[str],
    api_key: str,
    input_type: Optional[str] = None,
    truncation: Optional[bool] = None,
) -> Generator[tuple[int, list[float]], None, None]:
    client = voyageai.Client(api_key=api_key)
    # Forward truncation only when the caller set it, so the client default applies otherwise
    args = {}
    if truncation is not None:
        args["truncation"] = truncation
    response = client.embed(input, model=model, input_type=input_type, **args)

    if not hasattr(response, "embeddings"):
        return
    # Yield (position in the input list, embedding) pairs
    for idx, obj in enumerate(response.embeddings):
        yield idx, obj
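The `embed` function above sends the whole input list in a single request and yields `(index, embedding)` pairs. For long lists, a caller might want to split the input into batches while keeping the indices relative to the original list. The following is a sketch, not part of this PR: `embed_in_batches` is a hypothetical helper, `embed_fn` stands in for any callable with the same yield shape, and the default batch size of 128 is an assumed placeholder that should be checked against the provider's documented limits.

```python
from typing import Callable, Iterable, Iterator

# Any callable that takes a batch of texts and yields (index, embedding) pairs
EmbedFn = Callable[[list[str]], Iterable[tuple[int, list[float]]]]


def embed_in_batches(
    texts: list[str], embed_fn: EmbedFn, batch_size: int = 128
) -> Iterator[tuple[int, list[float]]]:
    """Call embed_fn on slices of texts and re-base the yielded indices."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        for idx, vec in embed_fn(batch):
            # idx is relative to the batch; offset it to the full list
            yield start + idx, vec
```

Wrapping the real function could look like `embed_in_batches(texts, lambda b: embed(model, b, api_key))`, where `model` and `api_key` come from the caller.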
@@ -3,4 +3,5 @@ tiktoken==0.7.0
 ollama==0.2.1
 anthropic==0.29.0
 cohere==5.5.8
-backoff==2.2.1
+backoff==2.2.1
+voyageai==0.3.1
@@ -13,3 +13,4 @@ install_requires =
 anthropic==0.29.0
 cohere==5.5.8
 backoff==2.2.1
+voyageai==0.3.1
@@ -0,0 +1,48 @@
-------------------------------------------------------------------------------
-- voyageai_embed
-- generate an embedding from a text value
-- https://docs.voyageai.com/reference/embeddings-api
create or replace function ai.voyageai_embed
( model text
, input_text text
, input_type text default null
, api_key text default null
, api_key_name text default null
) returns @extschema:vector@.vector
as $python$
    #ADD-PYTHON-LIB-DIR
    import ai.voyageai
    import ai.secrets
    api_key_resolved = ai.secrets.get_secret(plpy, api_key, api_key_name, ai.voyageai.DEFAULT_KEY_NAME, SD)
    -- input_type is forwarded so that the declared parameter takes effect
    for tup in ai.voyageai.embed(model, [input_text], api_key=api_key_resolved, input_type=input_type):
        return tup[1]
$python$
language plpython3u immutable parallel safe security invoker
set search_path to pg_catalog, pg_temp
;

-------------------------------------------------------------------------------
-- voyageai_embed
-- generate embeddings from an array of text values
-- https://docs.voyageai.com/reference/embeddings-api
create or replace function ai.voyageai_embed
( model text
, input_texts text[]
, api_key text default null
, api_key_name text default null
, input_type text default null
) returns table
( "index" int
, embedding @extschema:vector@.vector
)
as $python$
    #ADD-PYTHON-LIB-DIR
    import ai.voyageai
    import ai.secrets
    api_key_resolved = ai.secrets.get_secret(plpy, api_key, api_key_name, ai.voyageai.DEFAULT_KEY_NAME, SD)
    for tup in ai.voyageai.embed(model, input_texts, api_key=api_key_resolved, input_type=input_type):
        yield tup
$python$
language plpython3u immutable parallel safe security invoker
set search_path to pg_catalog, pg_temp
;
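Both functions resolve the key through `ai.secrets.get_secret`, whose implementation is not shown in this diff. As a simplified illustration of the behaviour the docs describe (an explicit `api_key` argument wins, otherwise the `ai.voyage_api_key` session parameter is used), the sketch below models one plausible lookup order. Everything in it is an assumption: `resolve_api_key` is a hypothetical name, the `ai.<lowercased key name>` setting convention is inferred from the docs, and the environment fallback is a guess, not confirmed pgai behaviour.

```python
from typing import Mapping, Optional


def resolve_api_key(
    api_key: Optional[str],
    api_key_name: Optional[str],
    default_name: str,
    settings: Mapping[str, str],
    environ: Mapping[str, str],
) -> str:
    """Illustrative lookup order only; not the ai.secrets.get_secret code."""
    # 1. An explicitly passed key always wins
    if api_key is not None:
        return api_key
    name = api_key_name or default_name
    # 2. Assumed convention: session setting "ai.<lowercased key name>"
    value = settings.get(f"ai.{name.lower()}")
    # 3. Assumed fallback: an environment variable with the key name
    if not value:
        value = environ.get(name)
    if not value:
        raise ValueError(f"no API key found for {name}")
    return value
```

Here `settings` and `environ` are plain mappings standing in for the session configuration and the server process environment.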
I believe we should not set `truncate` to `false` in all of our examples unless we explicitly want to show the behaviour when it is set to `false`. Otherwise, we might confuse users, who 99.9% of the time will want this to be `true` by default.

done