OpenAI: Migrate to async client and enhance API support #219

Open: wants to merge 5 commits into main
Conversation

@Tostino Tostino commented Nov 13, 2024

Major changes:

  • Migrate to async OpenAI client to support query cancellation and timeouts
  • Add client caching using global dictionary (GD) to improve performance
  • Migrate to using raw responses to minimize type conversions and improve performance
  • Add comprehensive support for all OpenAI API parameters
  • Add support for client create/destroy methods

Implementation details:

  • Replace sync OpenAI client with AsyncOpenAI for better control flow
  • Implement client caching in GD to reuse connections
  • Add query cancellation support using asyncio
  • Remove list_models and embed function implementations from openai.py to consolidate API handling
  • Move functionality directly into the SQL functions for consistency
  • Return raw API responses to minimize conversions
  • Add complete OpenAI API parameter support across all functions
  • Standardize parameter naming with leading underscore
  • Update OpenAI and tiktoken package versions

Package updates:

  • openai: 1.44.0 -> 1.51.2
  • tiktoken: 0.7.0 -> 0.8.0

Breaking changes:

  • Functions now return raw JSON responses instead of parsed objects
  • Functions marked as parallel unsafe due to HTTP API constraints
  • Parameter names now prefixed with underscore to reduce naming conflicts

Known issues:

  • Inconsistent performance inside the plpython environment.
    • The first call to the endpoint is quick (3ms), and then every call after it is very delayed (40ms). Need to figure out what is happening here. I cannot reproduce outside of plpython.
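The query-cancellation mechanism listed above can be sketched with the standard library alone. Here fake_api_call is a stand-in for the actual AsyncOpenAI request, and the timeout plays the role of a statement timeout or user interrupt; this is an illustrative sketch, not the PR's literal code:

```python
import asyncio

async def fake_api_call(delay: float) -> str:
    # Stand-in for an AsyncOpenAI request; sleeps instead of doing HTTP.
    await asyncio.sleep(delay)
    return "response"

def run_with_cancel(coro, timeout: float):
    # Run the coroutine to completion, cancelling it (as a statement
    # timeout or user interrupt would) if it exceeds the deadline.
    async def runner():
        try:
            return await asyncio.wait_for(coro, timeout)
        except asyncio.TimeoutError:
            # wait_for has already cancelled the underlying task here.
            return None
    return asyncio.run(runner())

assert run_with_cancel(fake_api_call(0.01), timeout=1.0) == "response"
assert run_with_cancel(fake_api_call(1.0), timeout=0.05) is None
```

The key property is that cancellation propagates into the in-flight awaitable, so a long HTTP request can be torn down instead of blocking the backend.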

@Tostino Tostino requested a review from a team as a code owner November 13, 2024 16:54

@cevian cevian left a comment


Thank you so much for this PR! I did a preliminary check and have a bunch of questions. I just want to understand the motivation/reasoning behind some decisions. Also we'll have to decide on the json vs vector return of some of these functions. I think you are right we'll need both sets of functions. Let me ask some of my colleagues about the naming conventions we want to use here.

Review thread on projects/extension/sql/idempotent/001-openai.sql (outdated, resolved)

return openai.AsyncOpenAI(**client_kwargs)

def get_or_create_client(plpy, GD: Dict[str, Any], api_key: str = None, api_key_name: str = None, base_url: str = None) -> Any:
Collaborator


Do you have any numbers showing that creating the client is expensive (and thus worth storing in GD)? Does this allow connection reuse or something? And if it's the latter, how and when do connections get closed? Is there a keepalive timeout?

Storing the client in GD adds a good amount of complexity, and I'd like to find out what we are gaining/losing for it.


@Tostino Tostino Nov 14, 2024


Yup, benchmarks here: #116 (comment)

Also note that there is a known issue where the 2nd (and 3rd, etc.) call to the client has some extra 40ms delay that doesn't happen when I have this code running outside of a PL/Python environment (noted in the thread above). I really should have mentioned that directly in the PR; I will edit it to note that this still needs to be identified. Once that is fixed, the benchmark numbers should look much better.

Even with the above issue, this is still much faster, and lower CPU than the original implementation where we recreate the client.

Note specifically the CPU reduction. Recreating the client is heavy on CPU; I know this from past projects, but the benchmarks also bear this out.

I believe the connection is closed after the request is complete, and the client becomes ready for the next call. If the request is cancelled early, we attempt to kill things gracefully.
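The caching pattern under discussion is roughly the following (a minimal sketch: make_client stands in for constructing openai.AsyncOpenAI, and GD plays the role of the PL/Python global dictionary; the cache-key layout is an assumption, not the PR's exact scheme):

```python
from typing import Any, Dict, Optional

def make_client(api_key: str, base_url: Optional[str]) -> Dict[str, Any]:
    # Stand-in for openai.AsyncOpenAI(api_key=..., base_url=...); returns
    # a plain dict so the sketch runs without the openai package.
    return {"api_key": api_key, "base_url": base_url}

def get_or_create_client(GD: Dict[str, Any], api_key: str,
                         base_url: Optional[str] = None) -> Any:
    # Key the cache on everything that affects client construction so two
    # different configurations never share a client instance.
    cache_key = f"openai_client:{api_key}:{base_url}"
    if cache_key not in GD:
        GD[cache_key] = make_client(api_key, base_url)
    return GD[cache_key]

GD: Dict[str, Any] = {}
first = get_or_create_client(GD, "sk-test")
second = get_or_create_client(GD, "sk-test")
assert first is second  # the second call reuses the cached client
```

Because GD persists across function calls within a session, the expensive client construction happens once per distinct configuration rather than once per query.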

Four more review threads on projects/extension/sql/idempotent/001-openai.sql (outdated, resolved)

Tostino commented Nov 14, 2024

@cevian Alright, so if we remove the _underscore prefix, we will need to change the user argument so that it differs from the OpenAI API / python client; it is an invalid argument name, along with text, which you already changed to text_input.

Let me know what to go with.

@cevian

cevian commented Nov 14, 2024

@Tostino I believe we used openai_user before. Let's stick with that.

Fix naming conflicts with Postgres reserved words.

Reverted parallel safety changes. Went from unsafe -> safe for functions.

Reverted function volatility changes.

Tostino commented Nov 14, 2024

@cevian Alright, all changes made. Also noticed that when I rebased I had accidentally committed changes to the ai--0.4.0.sql file, so I reverted that.

This still has the performance problem we need to dig into before it's merged, but at least any of the other issues can be discussed and fixed in the meantime.

@cevian

cevian commented Nov 14, 2024

@Tostino I am still not convinced we need ai.openai_client_create() as a public function. Can we just pass any client options we need as a jsonb to the other functions via a client_extra_args parameter?
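For illustration, merging such a parameter into the client constructor kwargs might look like the sketch below. The name client_extra_args and the merge order (explicit arguments win over the extras blob) are assumptions drawn from this thread, not a settled API:

```python
import json
from typing import Any, Dict, Optional

def build_client_kwargs(api_key: str,
                        client_extra_args: Optional[str] = None) -> Dict[str, Any]:
    # client_extra_args would arrive from SQL as a jsonb value rendered as
    # text; explicit arguments override anything in the extras blob.
    kwargs: Dict[str, Any] = {}
    if client_extra_args:
        kwargs.update(json.loads(client_extra_args))
    kwargs["api_key"] = api_key
    return kwargs

assert build_client_kwargs("sk-test", '{"timeout": 30, "max_retries": 2}') == \
    {"timeout": 30, "max_retries": 2, "api_key": "sk-test"}
```

This keeps the public SQL surface to a single jsonb parameter instead of a separate create/destroy client API.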

@Tostino

Tostino commented Nov 14, 2024

I'm good with that solution. Seems to solve the problem statement I was originally trying to solve. Will get it done tonight.

… client_extra_args parameter to all relevant functions that interact with the client (other than the `_simple` function that I think needs a rethinking).
@Tostino

Tostino commented Nov 15, 2024

Well... some kid ripped out my neighborhood's internet interconnection wiring last night, so this was slightly delayed.

Tested to make sure the client_extra_args were being passed through properly, and they seem to be, based on my initial "kick the tires" tests.

@Tostino

Tostino commented Nov 19, 2024

I should have a little time this week to try to figure out that 2nd-run issue (or at least attempt to; I'm not a Python dev, so I'm not used to the profiling tools in this space).

@cevian Is there anything else you see that needs attention at this point?
