Add async support #146
Conversation
* Fix some syntax errors
* overload output type depending on stream literal (openai#142)
* Bump to v22
* [numpy] change version (openai#143)
* [numpy] change version
* update comments
* no version for numpy
* Fix timeouts (openai#137)
* Fix timeouts
* Rename to request_timeout and add to readme
* Dev/hallacy/request timeout takes tuples (openai#144)
* Add tuple typing for request_timeout
* imports
* [api_requestor] Log request_id with response (openai#145)
* Only import wandb as needed (openai#146)

Co-authored-by: Felipe Petroski Such <felipe@openai.com>
Co-authored-by: Henrique Oliveira Pinto <hponde@openai.com>
Co-authored-by: Rachel Lim <rachel@openai.com>
Love this addition, hope it gets merged. Wanted to add that I seem to have been getting an error:

* This is due to a lack of an async __del__ method

Thanks for catching that @MaxTretikov! This should be resolved in the latest commit.
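The underlying issue here is that Python never awaits `__del__`, so an aiohttp-style session cannot be cleaned up from a destructor and must be closed explicitly. A minimal sketch of the pattern (the `AsyncClient` class is an illustrative stand-in, not the library's actual code):

```python
import asyncio

class AsyncClient:
    """Illustrative stand-in for a client holding an async session."""

    def __init__(self):
        self.closed = False

    async def aclose(self):
        # Coroutines cannot run from __del__, so cleanup must be explicit.
        self.closed = True

    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc):
        await self.aclose()

async def main():
    async with AsyncClient() as client:
        assert not client.closed  # session still open inside the block
    return client.closed          # closed on exit

print(asyncio.run(main()))  # True
```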
Thank you so much @Andrew-Chen-Wang, this was quite the effort!
I've left a few comments, mostly on the APIRequestor since it's the most interesting part imo.
An obvious issue when looking at this patch is that it introduces a ton of duplication, but that's usually what happens when you want to provide both a sync and an async library without requiring callers to use async_to_sync or the like. Another option I've seen others take is to offer entirely different classes for the aio version of the library, so that the current sync ones don't need to change, but we'd basically end up with even more duplication...
Maybe there's a future world where this library becomes entirely async (it's mostly IO bound anyway) and callers who want sync then need to run the task to completion in an event loop on a thread of their own. I'm not sure, but for now I personally feel like this approach is totally fine.
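The "sync callers run the task to completion themselves" idea can be sketched like this (function names are illustrative, not the library's API):

```python
import asyncio

async def fetch_completion(prompt: str) -> str:
    # Placeholder for an IO-bound API call.
    await asyncio.sleep(0)
    return f"echo: {prompt}"

def fetch_completion_sync(prompt: str) -> str:
    # A sync caller drives the coroutine to completion on its own event loop;
    # the library itself stays fully async.
    return asyncio.run(fetch_completion(prompt))

print(fetch_completion_sync("hi"))  # echo: hi
```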
openai/api_requestor.py
Outdated

data, content_type = requests.models.RequestEncodingMixin._encode_files(
    files, data
)
aiohttp has MultipartWriter for this use case. Not as trivial as just passing a dictionary of files as with requests, but I think that'd be the proper way to do this.
I think the preparation of the data will be fairly different between requests and aiohttp, causing more duplication of code. I'm planning on essentially copying _encode_files (albeit the actual appending of files is different). Is that alright?
Also, feel free to add my fork as a remote and commit freely. I've enabled maintainers to commit to my fork.
Makes sense. I took a quick look at MultipartWriter and the implementation of _encode_files, and while it should work for us here, that's a fair amount of somewhat brittle code that could be easy to mess up. Maybe adding a comment/TODO above this line explaining what is going on is good enough for now.
openai/__init__.py
Outdated
aiosession: ContextVar[Optional["ClientSession"]] = ContextVar( | ||
"aiohttp-session", default=None | ||
) |
At first I thought this was just a global, but after learning more about Context and ContextVar I realized it enables the thing I was afraid this would prevent: using different sessions. You make a copy of the current context, set the new session on it, and run the code through the new context; the openai module then picks up the new session without impacting other code, which still gets the original one.
This might be worth putting as an example in the readme, since it was not clear to me that this was possible before reading more about ContextVar.
This is already addressed in the README, I believe in the async section. Please add suggestions. I've also added a comment in the source code itself.
That's probably fine, I agree. I think it's mostly a matter of knowing how ContextVar works, which I expect people dealing with async code to do (and that I didn't initially :))
@@ -21,9 +21,10 @@
     "openpyxl>=3.0.7",  # Needed for CLI fine-tuning data preparation tool xlsx format
     "numpy",
     "typing_extensions",  # Needed for type hints for mypy
+    "aiohttp",  # Needed for async support
Another option could be to move this to extras_require under an async feature, and then we would raise an exception in api_requestor.py if we can't import aiohttp. Although it's also fine to always require it tbh.
To be fair, it would be better, but typing would be a pain for mypy users. In redis-py, we install async-timeout even if you only need the sync Redis.
Yeah, fair enough.
Thanks for your time and the review @ddeville!
Alright, there might be a couple of small pieces to update but I think this is a very good (huge) first step.
Let's do this!
Would love for @hallacy to take a look before we merge though.
Love it! Let's merge it.
Thanks @ddeville and @hallacy!! In the future, to possibly prevent further duplication of code, I highly recommend two resources:
Thanks for the links @Andrew-Chen-Wang, this is super interesting. From a quick glance, it looks like it works more or less like this:

And then you would use it like that (so basically

Am I reading this right? The only issue I can think of is that the methods on
You got it right! To answer the question on typing: in redis-py, we make a type Union with Awaitable and the result type. However, the reason I make these suggestions is that the example usage you've got is quite unwieldy, hence a preference for the current implementation in this PR. A better implementation would be to have something similar to the Redis class, like ApiRequestor but named OpenAI, that could call commands like OpenAI().Completion(...). But that's essentially a complete rewrite that I'm not quite up for at the moment. Would prefer to have the PR merged, as it seems like a complete rewrite would be necessary (looking at the stripe sdk right now, which this lib is forked from; for reference: stripe/stripe-python#715 (comment)).
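The redis-py-style Union return type mentioned above can be sketched as follows (the class and flag are toy names; it also shows why callers find this unwieldy — they must know which variant they got):

```python
import asyncio
from typing import Awaitable, Union

class Completion:
    """Toy class whose create() is sync or async depending on the client."""

    def __init__(self, use_async: bool):
        self.use_async = use_async

    def create(self, prompt: str) -> Union[str, Awaitable[str]]:
        async def _acreate() -> str:
            await asyncio.sleep(0)  # placeholder for real IO
            return prompt.upper()
        if self.use_async:
            return _acreate()   # caller must await / run this
        return prompt.upper()   # caller gets the value directly

print(Completion(use_async=False).create("hi"))              # HI
print(asyncio.run(Completion(use_async=True).create("hi")))  # HI
```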
When is the release of this?
@@ -84,17 +140,33 @@ def download(
     if typed_api_type in (ApiType.AZURE, ApiType.AZURE_AD):
         base = cls.class_url()
-        url = "/%s%s/%s/content?api-version=%s" % (
+        url = "/%s%s/%s?api-version=%s" % (
             cls.azure_api_prefix,
not sure why the "content" part was removed?
* Add async support
* Fix aiohttp requests
* Fix some syntax errors
* Close aiohttp session properly
* This is due to a lack of an async __del__ method
* Fix code per review
* Fix async tests and some mypy errors
* Run black
* Add todo for multipart form generation
* Fix more mypy
* Fix exception type
* Don't yield twice

Co-authored-by: Damien Deville <damien@openai.com>
Fixes #98

Adds asyncio support for the API. Does not support the CLI (it doesn't make sense there).

I haven't tested file upload. aiohttp does not include the files parameter like requests, so I'm using requests' private utility function in the ApiRequestor.

Usage (notice the a prefix, used in the CPython lib, Django, and other libs):

Installation of aiohttp can include speedups that aren't included in setup.py. aiohttp is a required package, but it doesn't have to be. We'd just need to remove the typing.