aiohttp hangs when uploading a file during streaming download #10169
Comments
Does it happen when using the context manager? I think I saw something similar recently that looked like some kind of race condition that wasn't an issue when using the context manager.
Do you mean the request context manager, like so?

```python
import asyncio

import aiohttp


async def run():
    async with aiohttp.ClientSession(timeout=aiohttp.ClientTimeout(10)) as session:
        async with session.get("http://localhost:4566/bucket/ae4bbfcdc47f450aa8557abefeba4a5ct") as response:
            i = 0
            async for chunk in response.content.iter_chunked(1024):
                i += 1
                print(f"chunk number {i}")
                if i >= 900:
                    print("Streamed, time to upload")
                    # It hangs awaiting this
                    async with session.put("http://localhost:4566/bucket/output/some_file", data=b"") as response:
                        pass
                    print("Uploaded")


asyncio.run(run())
```

It still happens when using it like this.
Right, then that's not related to this issue.
OK, I've been doing some more digging and adding debug messages here and there, and it seems the inconsistency is related to connection reuse. If the download finishes quickly, the connection is released to the pool early; the upload then tries to reuse this connection and hangs as described above. If the download is slower, the upload triggers before the connection is released to the pool, so the upload uses a new connection and returns it to the pool. Most of the time this means the upload on the next iteration picks up the connection that was just released, and the process continues normally.

I think the problem is that when the connection is released to the pool (on download EOF) while the iteration is still ongoing, the connection is somehow unusable until the iteration ends. So if a request takes it out of the pool and blocks the iteration, you get a deadlock: the iteration is waiting for the upload to complete, and the upload is waiting for the iteration to complete.
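To make that circular wait concrete, here is a minimal, self-contained asyncio sketch (plain events, no aiohttp) of the same deadlock shape:

```python
import asyncio


async def main() -> None:
    iteration_done = asyncio.Event()  # "download iteration has finished"
    upload_done = asyncio.Event()     # "upload request has finished"

    async def iterate() -> None:
        # The download loop blocks on the upload before it can finish...
        await upload_done.wait()
        iteration_done.set()

    async def upload() -> None:
        # ...while the upload (stuck on the paused pooled connection)
        # effectively waits for the iteration to finish.
        await iteration_done.wait()
        upload_done.set()

    # Neither event is ever set: a circular wait, so this times out.
    await asyncio.wait_for(asyncio.gather(iterate(), upload()), timeout=2)


asyncio.run(main())  # raises TimeoutError
```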
Hmm, I don't think there should be any relation between them that could cause that. Will have to add some debugging in to figure out exactly what code is reached and what condition is being waited on. Another possibility is that the server is misbehaving and not handling connection reuse correctly.
Well, I think I found something, but I am lacking context on how all of this works, so this explanation might not be entirely accurate. It seems the connection can be returned to the pool with the protocol still paused. In that case, the new request gets a connection in a paused state and is unable to read from it until the protocol is resumed. I did a tentative fix that works for my example, by unpausing the protocol when receiving an EOF.

This seems to work OK, but I'm not sure it's the right approach. Another alternative could be to unpause the protocol directly in the pool code, either when returning a connection to the pool or when taking one out.
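To illustrate the first idea, here is a hedged sketch that monkey-patches the body stream's EOF handler to resume a paused protocol. The internal names used (`aiohttp.streams.StreamReader`, its `_protocol` attribute, and `resume_reading()`) are aiohttp internals and an assumption on my part; this is not the actual patch (see the PR linked below).

```python
from aiohttp.streams import StreamReader

_orig_feed_eof = StreamReader.feed_eof


def _feed_eof_and_resume(self: StreamReader) -> None:
    _orig_feed_eof(self)
    # The body is complete: if flow control had paused the underlying
    # protocol, resume it so the pooled connection stays readable.
    # resume_reading() is a no-op when reading was never paused.
    self._protocol.resume_reading()


StreamReader.feed_eof = _feed_eof_and_resume
```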
Excellent sleuthing. Either of those sounds like it might be a good solution. The first thing we need, though, is a regression test. Hopefully with that knowledge we can create a complete test to add to the test suite now.
These are the best bugs, thanks for taking the time to help track this one down!
Nice catch!
Thank you everyone! Indeed, this is one of the fun bugs to track down. At least, I had some fun with it :) Anyway, I prepared a PR with the proposed fix and a couple of tests: #10171 |
Describe the bug
I've got some code that downloads a large archive file from S3 using aiobotocore, extracts it, and uploads each file to another key in the same bucket, all using iterators to work on small chunks of data. We noticed that the process sometimes got stuck (maybe ~30% of the time) while doing so. No timeout or exceptions were raised, but the process never finished.
This was originally reported as an aiobotocore issue, but while debugging it we managed to reduce it to this minimal reproducing example using only aiohttp. In this example a timeout is raised, whereas in the original it wasn't (or it might have been swallowed by our code).
To Reproduce
Run a localstack server as follows:
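A typical invocation, assuming the stock localstack image and its default edge port (the reproduction URLs above point at localhost:4566), with the bucket and a large source object created beforehand:

```shell
# Assumed command; any localstack setup exposing S3 on port 4566 works
docker run --rm -p 4566:4566 localstack/localstack
```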
Then run a streaming download with concurrent uploads:
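A minimal sketch of such a script, reconstructed from the snippets quoted in the comments above; the endpoint, bucket, key, and chunk threshold are assumptions taken from those snippets:

```python
import asyncio

import aiohttp

# Assumed values, taken from the snippets quoted in the comments above
DOWNLOAD_URL = "http://localhost:4566/bucket/ae4bbfcdc47f450aa8557abefeba4a5ct"
UPLOAD_URL = "http://localhost:4566/bucket/output/some_file"


async def run() -> None:
    timeout = aiohttp.ClientTimeout(total=300)  # 5 minutes, as in the report
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.get(DOWNLOAD_URL) as response:
            i = 0
            async for chunk in response.content.iter_chunked(1024):
                i += 1
                if i >= 900:
                    # Upload while the download is still being iterated;
                    # this is the await that can hang on a paused pooled connection.
                    async with session.put(UPLOAD_URL, data=b""):
                        pass


asyncio.run(run())
```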
Sometimes this will get stuck and eventually time out after 5 minutes (or whatever the timeout is set to).
Expected behavior
The script finishes successfully.
Logs/tracebacks
Python Version
3.12.7
aiohttp Version
multidict Version
4.7.6
propcache Version
0.2.1
yarl Version
1.18.3
OS
Tested in Arch Linux and Ubuntu 24.04
Related component
Client
Additional context
A bit more context is available in the original aiobotocore issue.