OpenAI: Migrate to async client and enhance API support #219
base: main
Conversation
Major changes:
- Migrate to the async OpenAI client to support query cancellation and timeouts
- Add client caching using the global dictionary (GD) to improve performance
- Migrate to raw responses to minimize type conversions and improve performance
- Add comprehensive support for all OpenAI API parameters
- Add support for client create/destroy methods

Implementation details:
- Replace the sync OpenAI client with AsyncOpenAI for better control flow
- Implement client caching in GD to reuse connections (a rough sketch follows this list)
- Add query cancellation support using asyncio
- Remove the list_models and embed implementations from openai.py to consolidate API handling
- Move functionality directly into the SQL functions for consistency
- Return raw API responses to minimize conversions
- Add complete OpenAI API parameter support across all functions
- Standardize parameter naming with a leading underscore
- Update the OpenAI and tiktoken package versions

Package updates:
- openai: 1.44.0 -> 1.51.2
- tiktoken: 0.7.0 -> 0.8.0

Breaking changes:
- Functions now return raw JSON responses instead of parsed objects
- Functions are marked parallel unsafe due to HTTP API constraints
- Parameter names are now prefixed with an underscore for consistency
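A rough sketch of that GD caching pattern, with an illustrative cache key and a simplified signature (the actual signature appears in the diff further down), not the PR's exact code:

```python
import openai

def get_or_create_client(GD, api_key, base_url=None):
    # GD persists for the lifetime of the session, so the client (and its
    # underlying HTTP connection pool) can be reused across function calls
    # instead of being rebuilt on every query.
    cache_key = f"openai_client:{base_url or 'default'}"
    client = GD.get(cache_key)
    if client is None:
        client = openai.AsyncOpenAI(api_key=api_key, base_url=base_url)
        GD[cache_key] = client
    return client
```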
Thank you so much for this PR! I did a preliminary check and have a bunch of questions; I just want to understand the motivation/reasoning behind some decisions. Also, we'll have to decide on the json vs. vector return types for some of these functions. I think you are right that we'll need both sets of functions. Let me ask some of my colleagues about the naming conventions we want to use here.
projects/extension/ai/openai.py (Outdated)

```python
    return openai.AsyncOpenAI(**client_kwargs)

def get_or_create_client(plpy, GD: Dict[str, Any], api_key: str = None, api_key_name: str = None, base_url: str = None) -> Any:
```
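For context on the api_key_name parameter in that signature, one plausible resolution path inside PL/Python is reading a session setting via plpy; this helper is hypothetical, not the PR's code:

```python
def resolve_api_key(plpy, api_key=None, api_key_name=None):
    # Hypothetical helper: prefer an explicit key; otherwise look up a
    # session setting named by api_key_name with current_setting(..., true),
    # which returns NULL instead of erroring when the setting is absent.
    if api_key is not None:
        return api_key
    if api_key_name is not None:
        plan = plpy.prepare(
            "select pg_catalog.current_setting($1, true) as value", ["text"]
        )
        value = plpy.execute(plan, [api_key_name])[0]["value"]
        if value is not None:
            return value
    plpy.error("no API key provided")
```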
Do you have any numbers showing that creating the client is expensive (and thus worth storing in GD)? Does this allow connection reuse or something? And if it's the latter, how/when do connections get closed? Is there a keepalive timeout?
Storing the client in GD adds a fair amount of complexity, and I'd like to find out what we are gaining/losing for it.
Yup, benchmarks here: #116 (comment)
Also note that there is a known issue: the 2nd (and 3rd, etc.) call to the client's API has an extra ~40ms delay that doesn't happen when I run this code outside of a PL/Python environment (noted in the thread above). I really should have mentioned that directly in the PR; I will edit it to note that this still needs to be identified. Once that is fixed, the benchmark numbers should look much better.
Even with the above issue, this is still much faster, and uses less CPU, than the original implementation where we recreate the client.
Note specifically the CPU reduction. Recreating the client is CPU-heavy; I knew this from past projects, and the benchmarks bear it out.
I believe the connection is closed after the request completes, and the client becomes ready for the next call. If the request is cancelled early, we attempt to shut things down gracefully.
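A minimal sketch of that cancellation behavior, assuming the synchronous PL/Python entry point drives the async client with its own event loop (the exact hookup to Postgres query cancellation is simplified here; an interrupt surfaces as an exception from the blocking call):

```python
import asyncio

def run_async(coro):
    # Drive an AsyncOpenAI request to completion from synchronous
    # PL/Python code. If the query is interrupted mid-request, cancel
    # the in-flight task so the HTTP call is torn down gracefully.
    loop = asyncio.new_event_loop()
    task = loop.create_task(coro)
    try:
        return loop.run_until_complete(task)
    except BaseException:
        task.cancel()
        # Run the loop once more so the cancellation can propagate
        # through the task before we re-raise.
        loop.run_until_complete(asyncio.gather(task, return_exceptions=True))
        raise
    finally:
        # A fresh loop per call matches the behavior described above,
        # where the connection is done once the request completes.
        loop.close()
```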
@cevian Alright, so when removing the _underscore prefix we will need to make changes to the … Let me know what to go with.
@Tostino I believe we used …
Fix naming conflicts with Postgres reserved words. Revert parallel safety changes (functions went from unsafe -> safe). Revert function volatility changes.
@cevian Alright, all changes made. I also noticed that when I rebased I had accidentally committed changes to the ai--0.4.0.sql file, so I reverted that. This still has the performance problem we need to dig into before it's merged, but at least the other issues can be discussed and fixed in the meantime.
@Tostino I am still not convinced we need …
I'm good with that solution; it seems to solve the problem statement I was originally trying to solve. Will get it done tonight.
… client_extra_args parameter to all relevant functions that interact with the client (other than the `_simple` function, which I think needs rethinking).
Well... some kid went and ripped out my neighborhood's internet interconnection wiring last night, so I was slightly delayed. I tested to make sure the client_extra_args were being passed through properly, and they seem to be, based on my initial "kick the tires" tests.
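For reference, the pass-through could look something like this; I'm assuming the extra args arrive from SQL as a jsonb payload, which PL/Python hands over as text (an assumption, not confirmed from the diff):

```python
import json
import openai

def make_client(api_key, base_url=None, client_extra_args=None):
    # Build the constructor kwargs, then splat any caller-supplied extras
    # (e.g. timeout, max_retries, default_headers) on top.
    kwargs = {"api_key": api_key}
    if base_url is not None:
        kwargs["base_url"] = base_url
    if client_extra_args:
        kwargs.update(json.loads(client_extra_args))  # jsonb arrives as text
    return openai.AsyncOpenAI(**kwargs)
```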
I should have a little time this week to try to figure out that 2nd-call delay issue (or at least attempt it; I'm not a Python dev, so I'm not used to the profiling tools in this space). @cevian Is there anything else you see that needs attention at this point?