-
Notifications
You must be signed in to change notification settings - Fork 164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OpenAI: Migrate to async client and enhance API support #219
Open
Tostino
wants to merge
68
commits into
timescale:main
Choose a base branch
from
Tostino:fix_openai_redux
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from 1 commit
Commits
Show all changes
68 commits
Select commit
Hold shift + click to select a range
411edc6
feat(openai): Migrate to async client and enhance API support
Tostino 9484652
Remove underscore prefix from all arguments.
Tostino ce17af4
Revert accidental changes to ai--0.4.0.sql
Tostino 9c7c824
Whitespace fixes...darn IDE.
Tostino c8aacd7
Remove client create/destroy functions from the public interface. Add…
Tostino 0146781
chore(main): release pgai 0.3.0 (#277)
github-actions[bot] 77f3863
chore: release extension 0.6.0 (#285)
JamesGuthrie 35e2fc8
chore: increment extension version to 0.6.1-dev
jgpruitt 027b3f4
feat: allow superusers to create vectorizers on any table
jgpruitt bd83165
fix: handle empty `PG_BIN` (#286)
JamesGuthrie 9f3b646
chore: increment extension lib version
jgpruitt 1e7f5a5
feat: allow superusers to call _vectorizer_create_dependencies
jgpruitt aac3d83
fix: host networking is not supported on macos
jgpruitt ee86d35
fix: schema qualify type definitions, casts, and operators
jgpruitt 74db2ba
docs: voyage ai quickstart (#276)
Askir aabbc68
docs: fix psql command and pstgres data volume in quickstarts (#293)
Askir 19119e5
docs: fix formatting of table for voyageai (#294)
Askir 113cc48
chore: include latest tag on releases (#295)
adolsalamanca 7b2995b
feat: allow vectorizers to be granted to public
jgpruitt b22a77a
chore: configure latest tag in docker metadata step (#296)
JamesGuthrie 1e226e8
docs: guide users to clone the latest release for installation
jgpruitt c3a2221
docs: use named volumes in quickstart guides (#300)
JamesGuthrie 38c7b50
docs: fix RAG SQL example in readme
jgpruitt 20714ff
chore: reorganize readme
cevian 4098316
chore: readme fixes from PR comments
cevian 2b41c9a
chore: more changes to readme
cevian dbac246
feat: pull missing ollama models (#301)
JamesGuthrie 5441f2d
chore: fix broken pgai build by pinning hatchling (#308)
JamesGuthrie 3f9736a
chore: support uv in extension install, use for dev (#309)
JamesGuthrie 7d4c8ee
feat: add a semantic catalog for db objs and example sql
jgpruitt df399f8
feat: add ai.find_relevant_sql semantic catalog function
jgpruitt 18ee335
feat: add ai.find_relevant_obj() functions
jgpruitt 9edacfb
ci: pgspot chokes on valid code. disabling for now
jgpruitt 8984cf9
fix: ignore ollama.base_url in test
jgpruitt f17388e
feat: only find relevant db objs that user has privs to
jgpruitt cf71602
feat: allow find_relevant_obj to be restricted to a given obj type
jgpruitt 4fd12a6
feat: return dist and add max_dist filter to semantic_catalog funcs
jgpruitt 405a208
chore: clean up event triggers to only update columns strictly required
jgpruitt 56ecf53
chore: add foreign key constraints to semantic catalog on vectorizer
jgpruitt 32b0eb7
chore: reorder arguments for semantic catalog functions
jgpruitt 77df293
chore: support multiple objtype filters in find_relevant_obj()
jgpruitt 85c9244
feat: add vectorizer_embed convenience function
jgpruitt 6578edc
chore: make an immutable version of vectorizer_embed
jgpruitt 5c17d89
chore: rename semantic_catalog.name to semantic_catalog.catalog_name
jgpruitt 1f6d1a8
fix: exclude python system packages for versioned extension (#310)
JamesGuthrie 7b4a916
fix: deprecation warning on re.split
MasterOdin 1d42906
chore: remove pip caching
JamesGuthrie 3c2a6a5
docs: remove openai mention from quickstart, fix opclasses in hnsw in…
Askir 295c0f0
feat: construct a prompt for text-to-sql using relevant desc
jgpruitt 17e3d28
feat: add a text_to_sql function
jgpruitt 77673ee
chore: split embedders in individual files (#315)
smoya b7573b7
feat: load api keys from db in self hosted vectorizer (#311)
Askir 423e8fd
test: spawn the ollama container through testcontainers (#318)
smoya ad81577
docs: add simple embedding evaluation script for pgai Vectorizer (#312)
jackyliang 1e44ba2
docs: add Voyage AI evaluation code (#321)
jackyliang 4f530f2
fix: openai wrapping text-to-sql query in markdown (#324)
MasterOdin 39fcf93
docs: clarify 'destination' argument (#262)
smoya 0230509
feat: add sqlalchemy vectorizer_relationship (#265)
Askir a4298c3
Code for evaluating open source embedding models (#305)
ihis-11 8c5baf7
test: use psycopg over psycopg2 for sqlalchemy tests (#326)
MasterOdin 89039b2
chore: register postgres_params custom pytest.mark (#327)
MasterOdin 6d89b42
feat(openai): Migrate to async client and enhance API support
Tostino c368831
Remove underscore prefix from all arguments.
Tostino 76fe0cd
Revert accidental changes to ai--0.4.0.sql
Tostino 88f32b8
Whitespace fixes...darn IDE.
Tostino b8772f6
Remove client create/destroy functions from the public interface. Add…
Tostino ab3e9b3
Merge remote-tracking branch 'origin/fix_openai_redux' into fix_opena…
Tostino 6236f03
Fix breakage with secrets.
Tostino File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,5 @@ | ||
openai==1.44.0 | ||
tiktoken==0.7.0 | ||
openai==1.51.2 | ||
tiktoken==0.8.0 | ||
ollama==0.2.1 | ||
anthropic==0.29.0 | ||
cohere==5.5.8 | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have any number showing that creating the client is expensive (and thus worth it to store in GD)?. Does this allow connection reuse or something? And if it's the latter then how/when do connections get closed? Is there a keepalive timeout.
Storing the client in GD seems like a good amount of complexity and I'd like to find out what we are gaining/loosing for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, benchmarks here: #116 (comment)
Also note, that there is a known issue with the 2nd (and 3rd...etc) call to the client for the api has some extra 40ms delay that doesn't happen when I have this code running outside of a pl/python environment (noted in the thread above). I really should have mentioned that directly in the PR. will edit to mention that fact that it needs to be identified. Once that is fixed, the benchmark numbers should look much better.
Even with the above issue, this is still much faster, and lower CPU than the original implementation where we recreate the client.
Note specifically the CPU reduction. Recreating the client is heavy on CPU, I know this from past projects but the benchmarks also bare this out.
I believe the connection is closed after the request is complete, and the client becomes ready for the next call. If the request is cancelled early, we attempt to kill things gracefully.