Issue: OpenAI Bad Gateway results in Error in on_retry: asyncio.run() cannot be called from a running event loop (coroutine 'AsyncRunManager.on_retry' was never awaited) inside openai.acompletion_with_retry #8462

Closed
maspotts opened this issue Jul 29, 2023 · 14 comments
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature

Comments

@maspotts
Contributor

Issue you'd like to raise.

I just saw a novel error. It appears to be triggered by a failed OpenAI API call (inside an asynchronous block), which causes asyncio.run() to be called inside another asyncio.run(). Error pasted below. Is this my (user) error? Or possibly a problem with the acompletion_with_retry() implementation?

2023-07-29 05:53:14,838 INFO     message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=None request_id=None response_code=502
2023-07-29 05:53:14,838 INFO     error_code=502 error_message='Bad gateway.' error_param=None error_type=cf_bad_gateway message='OpenAI API error received' stream_error=False
2023-07-29 05:53:14,839 WARNING  Retrying langchain.chat_models.openai.acompletion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised APIError: Bad gateway. {"error":{"code":502,"message":"Bad gateway.","param":null,"type":"cf_bad_gateway"}} 502 {'error': {'code': 502, 'message': 'Bad gateway.', 'param': None, 'type': 'cf_bad_gateway'}} <CIMultiDictProxy('Date': 'Sat, 29 Jul 2023 05:53:14 GMT', 'Content-Type': 'application/json', 'Content-Length': '84', 'Connection': 'keep-alive', 'X-Frame-Options': 'SAMEORIGIN', 'Referrer-Policy': 'same-origin', 'Cache-Control': 'private, max-age=0, no-store, no-cache, must-revalidate, post-check=0, pre-check=0', 'Expires': 'Thu, 01 Jan 1970 00:00:01 GMT', 'Server': 'cloudflare', 'CF-RAY': '7ee3120dab9f1084-ORD', 'alt-svc': 'h3=":443"; ma=86400')>.
2023-07-29 05:53:14,839 ERROR    Error in on_retry: asyncio.run() cannot be called from a running event loop
/usr/local/python-modules/tenacity/__init__.py:338: RuntimeWarning: coroutine 'AsyncRunManager.on_retry' was never awaited
  self.before_sleep(retry_state)
RuntimeWarning: Enable tracemalloc to get the object allocation traceback

Suggestion:

No response

@dosubot dosubot bot added the 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature label Jul 29, 2023
@walkward

walkward commented Aug 2, 2023

@hinthornw Looks like this error was likely introduced in #8053. Any ideas?

@ryanstout

@maspotts I'm seeing something similar. What type of code are you running that causes this? Thanks.

@kylrth
Contributor

kylrth commented Aug 6, 2023

The problem is that create_base_retry_decorator tries to asyncio.run something in the before_sleep callback, which breaks things when this is all happening inside an agenerate call.
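The failure mode described above can be reproduced with the standard library alone. The following is a minimal sketch, not LangChain's actual code: `on_retry`, `before_sleep`, and `agenerate_like` are hypothetical stand-ins for the async callback, the synchronous retry hook, and the surrounding async call.

```python
import asyncio
from typing import Tuple


async def on_retry() -> None:
    """Hypothetical stand-in for an async callback such as AsyncRunManager.on_retry."""
    await asyncio.sleep(0)


def before_sleep() -> str:
    """Synchronous hook that tries to drive the coroutine with asyncio.run()."""
    coro = on_retry()
    try:
        asyncio.run(coro)
        return "ok"
    except RuntimeError as exc:
        # Close the coroutine so this demo does not also emit
        # the "was never awaited" RuntimeWarning.
        coro.close()
        return f"Error in on_retry: {exc}"


async def agenerate_like() -> str:
    # An event loop is already running here, so the nested asyncio.run() fails.
    return before_sleep()


def reproduce() -> Tuple[str, str]:
    outside = before_sleep()                # no loop running: succeeds
    inside = asyncio.run(agenerate_like())  # loop already running: fails
    return outside, inside
```

Calling `reproduce()` shows that the same hook works when no loop is running and fails once it is invoked from inside one, which matches the log line in the original report.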

I'm not familiar enough with the retry decorator design to fix this myself, but it seems like acompletion_with_retry (async) needs an async version of _create_retry_decorator. 🤷
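As a rough illustration of what an async-aware retry path could look like, here is a hand-rolled sketch that awaits the callback directly instead of calling asyncio.run(). It does not use tenacity and is not the fix that was eventually merged; `aretry`, `TransientError`, and `demo_retry` are hypothetical names introduced only for this example.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple


class TransientError(Exception):
    """Hypothetical stand-in for a retryable error such as a 502."""


async def aretry(
    fn: Callable[[], Awaitable[str]],
    on_retry: Callable[[int, Exception], Awaitable[None]],
    attempts: int = 3,
    base_delay: float = 0.01,
) -> str:
    """Retry an async callable, awaiting an async callback before each sleep."""
    for attempt in range(1, attempts + 1):
        try:
            return await fn()
        except TransientError as exc:
            if attempt == attempts:
                raise
            # Await the async callback directly; no nested asyncio.run().
            await on_retry(attempt, exc)
            # Exponential backoff: base_delay, 2 * base_delay, ...
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be >= 1")


def demo_retry() -> Tuple[str, List[int]]:
    calls = 0
    seen: List[int] = []

    async def flaky() -> str:
        # Fails on the first two calls, then succeeds.
        nonlocal calls
        calls += 1
        if calls < 3:
            raise TransientError("Bad gateway.")
        return "ok"

    async def on_retry(attempt: int, exc: Exception) -> None:
        seen.append(attempt)

    result = asyncio.run(aretry(flaky, on_retry))
    return result, seen
```

Because the whole retry loop is itself a coroutine, the callback runs on the loop that is already active, so the nested-loop error cannot occur.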

@ryanstout

@kylrth Thanks for the info. Yeah, I think I'm seeing something similar. What kind of code are you running that causes it? For me it's doing a MapReduce.

@bent-verbiage

+1 on the issue. I got it on a chain.arun() where OpenAI returned a 502.

```
Error in on_retry: asyncio.run() cannot be called from a running event loop

Retrying langchain.chat_models.openai.acompletion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised APIError: HTTP code 502 from API (<html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>cloudflare</center>
</body>
</html>
).
```

@kylrth
Contributor

kylrth commented Aug 8, 2023

I see it with just ChatOpenAI.agenerate when OpenAI returns a 502.

@NikitaSemenovAiforia

NikitaSemenovAiforia commented Aug 14, 2023

Caught this warning in pytest recently. I don't know whether it's caused by a 502 or something else.
[Image: Screenshot from 2023-08-14 17-44-04]

@ShantanuNair
Contributor

@hwchase17 Any chance someone's looking at this? It's a source of high billables potentially.

@ShantanuNair
Contributor

@maspotts I'm looking into this too. Do you see the same issue on the latest version? @kylrth Can you expand a bit more on why the asyncio.run is problematic here? I am running into this issue with agenerate, and I notice my retries don't run after 4/8/10 seconds as they should; they run after about 6-7 minutes. I'm wondering whether fixing this bug would also fix my issue of retries after 502s failing. Maybe I can have a go at tackling this.

@ShantanuNair
Contributor

> @kylrth Thanks for the info. Yeah, I think I'm seeing something similar. What kind of code are you running that causes it? For me it's doing a MapReduce.

Me too. A MapReduce chain via AnalyzeDocumentChain.

@kylrth
Contributor

kylrth commented Sep 13, 2023

I think this issue is closed by #8659. @hinthornw ?

Could some of you experiencing the original error please test on v0.0.252 or later?

@ShantanuNair
Contributor

@kylrth Hah, I was just reading through the same PR. It looks like the way tenacity's retry decorator works takes care of the async/sync switch. Can you verify that on recent langchain, retries do indeed run after X (2/4/8) seconds? For me it hangs for 6-7 minutes between retries, even though it prints that it is retrying in X seconds.

@ShantanuNair
Contributor

So important notes regarding this issue from my investigation:

  1. If you are using async calls and receive a 502 Bad Gateway, the request only times out after the whole 600 s. See "Align request_timeout Behavior in Async and Non-Async APIs" (openai/openai-python#387); this needs to be fixed in openai-python. We need read timeouts, not a total timeout.
  2. After stalling your chain for the entire 10 minutes, you WILL BE BILLED even though nothing was generated. It also destroys the user experience by forcing one part of the chain to wait the full 10 minutes.
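Until the timeout behavior is fixed upstream, one possible caller-side workaround is to cap each attempt with asyncio.wait_for. This is only a sketch under stated assumptions: it enforces a short total timeout per attempt, which is not the same thing as a true read timeout, and `call_with_attempt_timeout`, `stalls_once`, and `demo_timeout` are hypothetical names, not part of langchain or openai-python.

```python
import asyncio
from typing import Awaitable, Callable, Tuple


async def call_with_attempt_timeout(
    make_call: Callable[[], Awaitable[str]],
    per_attempt_timeout: float,
    attempts: int = 3,
) -> str:
    """Cancel and retry any attempt that exceeds per_attempt_timeout seconds."""
    for attempt in range(1, attempts + 1):
        try:
            # wait_for cancels the inner call when the timeout expires.
            return await asyncio.wait_for(make_call(), timeout=per_attempt_timeout)
        except asyncio.TimeoutError:
            if attempt == attempts:
                raise
    raise ValueError("attempts must be >= 1")


def demo_timeout() -> Tuple[str, int]:
    calls = 0

    async def stalls_once() -> str:
        # The first call simulates the stalled 600 s request; the second succeeds.
        nonlocal calls
        calls += 1
        if calls == 1:
            await asyncio.sleep(600)
        return "ok"

    result = asyncio.run(
        call_with_attempt_timeout(stalls_once, per_attempt_timeout=0.05)
    )
    return result, calls
```

In this demo the stalled first attempt is abandoned after 0.05 s instead of 600 s, and the second attempt returns normally. Whether an abandoned request is still billed is a server-side matter that this sketch cannot address.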

@nfcampos
Collaborator

@hinthornw This was fixed in #8659, right? If so, let's close it.
