
Received "Response payload is not completed" when reading response #4581

Closed
pablogsal opened this issue Feb 17, 2020 · 83 comments
Labels: bug, needs-info (issue is lacking sufficient information and will be closed if not provided)

Comments

@pablogsal

pablogsal commented Feb 17, 2020

🐞 Describe the bug

A ClientPayloadError: Response payload is not completed exception is raised when reading the response of a GET request to the GitHub REST API (the specific API should not be especially relevant). The response looks correct, and the same request made with curl succeeds every time.

💡 To Reproduce

Make many requests against an endpoint of the GitHub API; the error occurs intermittently.
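A minimal sketch of that kind of load (endpoint URL and request counts are illustrative, not from the original report; aiohttp is imported lazily so the module can be inspected without it installed):

```python
import asyncio


async def hammer(url: str, total: int = 500, concurrency: int = 20) -> None:
    """Repeatedly GET `url`; under load the error surfaces intermittently."""
    import aiohttp  # deferred import: keeps the sketch importable without aiohttp

    sem = asyncio.Semaphore(concurrency)

    async def one(session) -> None:
        async with sem:
            try:
                async with session.get(url) as resp:
                    await resp.read()  # ClientPayloadError can surface here
            except aiohttp.ClientPayloadError as exc:
                print("reproduced:", exc)

    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(one(session) for _ in range(total)))


if __name__ == "__main__":
    # Hypothetical endpoint; any frequently polled API should do.
    asyncio.run(hammer("https://api.github.com/repos/aio-libs/aiohttp/labels"))
```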

💡 Expected behavior
The response is correctly parsed.

📋 Logs/tracebacks

  File "/opt/lib/python3.8/site-packages/gidgethub/aiohttp.py", line 20, in _request
    return response.status, response.headers, await response.read()
  File "/opt/lib/python3.8/site-packages/aiohttp/client_reqrep.py", line 973, in read
    self._body = await self.content.read()
  File "/opt/lib/python3.8/site-packages/aiohttp/streams.py", line 358, in read
    block = await self.readany()
  File "/opt/lib/python3.8/site-packages/aiohttp/streams.py", line 380, in readany
    await self._wait('readany')
  File "/opt/lib/python3.8/site-packages/aiohttp/streams.py", line 296, in _wait
    await waiter
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed

Some contextual information at the time of the exception (aiohttp/streams.py, line 358, in read):

blocks =  [b'[\n  {\n    "id": 2941521,\n    "node_id": "MDU6TGFiZWwyOTQxNTIx",\n    "url": "https://REDACTED_GITHUB_URL/repos/testorg/test-debian/labels/skip-issue",\n    "name": "skip-issue",\n    "color": "000000",\n    "default": false\n  }\n]\n']

n=-1

As you can see, the blocks contain the whole payload (the full JSON), but aiohttp still complains that the payload is not completed.

📋 Your version of Python

$ python --version
Python 3.8.1

📋 Your version of the aiohttp/yarl/multidict distributions

$ python -m pip show aiohttp
Name: aiohttp
Version: 3.6.2
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author: Nikolay Kim
Author-email: fafhrd91@gmail.com
License: Apache 2
Location: .../versions/3.8-dev/envs/medusa/lib/python3.8/site-packages
Requires: async-timeout, attrs, chardet, multidict, yarl
Required-by: raven-aiohttp, pytest-aiohttp, aiojobs
$ python -m pip show multidict
Name: multidict
Version: 4.7.3
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: .../versions/3.8-dev/envs/medusa/lib/python3.8/site-packages
Requires:
Required-by: yarl, aiohttp
$ python -m pip show yarl
Name: yarl
Version: 1.4.2
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl/
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: /Users/pgalindo3/.pyenv/versions/3.8-dev/envs/medusa/lib/python3.8/site-packages
Requires: idna, multidict
Required-by: aiohttp

📋 Additional context

I suspect this has to do with this comment:

aiohttp/aiohttp/streams.py

Lines 352 to 356 in 1740c99

# TODO: should be `if` instead of `while`
# because waiter maybe triggered on chunk end,
# without feeding any data
while not self._buffer and not self._eof:
    await self._wait('read')

as this is where the failure occurs, and the situation seems to be exactly the one described in the TODO.

@pablogsal pablogsal added the bug label Feb 17, 2020
@pablogsal
Author

CC: @socketpair @asvetlov as the TODO was added in 5c4cb82

@webknjaz
Member

I'm curious if it's somehow related to #4549.

@WH-2099
Contributor

WH-2099 commented Jul 20, 2020

I'm curious if it's somehow related to #4549.

I encountered these two problems at the same time. #4549 #4581
Python: 3.8.4
aiohttp: 3.6.2

I think the two are related. The common trigger seems to be frequent requests to a single domain name.

@WH-2099
Contributor

WH-2099 commented Jul 21, 2020

CC: @socketpair @asvetlov as the TODO was added in 5c4cb82

Changing the while to an if does not seem to help.

jogu added a commit to openid-certification/conformance-suite that referenced this issue Jun 8, 2021
Use python's asyncio / await / async to run modules within one test plan
in parallel if they don't have an alias, and to run plans in parallel
where they use no / different aliases.

This should speed up the CIBA tests and OIDC test modules where we use
dcr and hence can proceed without any alias. Unfortunately the CIBA
tests had to have parallelism disabled because Authlete's CIBA simulated
authentication device doesn't seem to cope with multiple parallel
requests.

Reduces the time to run oidcc-basic-certification-test-plan from 313 seconds
to 63 seconds on my local machine - most of which comes from the test
that sleeps for 30 seconds before reusing an auth code.

There was some hassle with python modules/node, I tried switching away
from alpine for the reasons given here:

https://pythonspeed.com/articles/alpine-docker-python/

but ended up switching back because the version of node (10) in Debian
buster is ancient and doesn't support some of the crypto Fillip's
client needs.

We add a retrying http client, as it seems either the parallelisation
or something about the asyncio http client ends up giving weird http
errors - I think we're running into the same bug as here:

aio-libs/aiohttp#4581

Switching to https://www.python-httpx.org might be an option to avoid
that, but that's still in beta.

part of #783
@tildedave

tildedave commented Jun 23, 2021

Hi there, I've hit this error too. My use case is a reverse proxy in front of gunicorn. The other reports of this error on this repo appear to involve poorly behaved HTTP servers, which I think is a separate issue.

My code streams the proxy body back to upstream, using code roughly like:

response = web.StreamResponse(headers=response_headers, status=status)
await response.prepare(request)  # required before writing the body
async for chunk in gunicorn_response.content.iter_chunked(64 * 1024):
    await response.write(chunk)
await response.write_eof()
return response

What seems to be happening is that gunicorn returns a Connection: close header and then there is a race condition between aiohttp reading the body and the connection being closed. If I get the data out of aiohttp in time, it works, but sometimes this stack trace is triggered.

Investigation:

  • Calling await gunicorn_response.content.wait_eof() prior to calling iter_chunked reduces the incidence of the error, but it still happens occasionally.
  • Setting the Connection: keep-alive header in the reverse proxy's request to gunicorn resolves this issue. This is an acceptable workaround for me.
  • Changing the while to an if in the source code in conjunction with calling wait_eof() prior to iterating over the body resolves the issue.
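The race described above (the server closing the connection before the client has consumed the full payload) can be imitated locally with nothing but the stdlib: a toy server that advertises more body bytes than it actually sends before closing. All names here are illustrative; this is a sketch of the failure shape, not aiohttp's internals.

```python
import asyncio


async def short_server(reader, writer):
    await reader.readline()  # request line; remaining headers ignored for brevity
    # Advertise 10 body bytes but send only 5, then close the socket: the same
    # shape as a server dropping the connection mid-payload.
    writer.write(
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Length: 10\r\n"
        b"Connection: close\r\n"
        b"\r\n"
        b"hello"
    )
    await writer.drain()
    writer.close()


async def fetch_truncated(port):
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
    await writer.drain()
    raw = await reader.read()  # read until the server closes the connection
    writer.close()
    head, _, body = raw.partition(b"\r\n\r\n")
    headers = dict(line.split(b": ", 1) for line in head.split(b"\r\n")[1:])
    return int(headers[b"Content-Length"]), len(body)


async def main():
    server = await asyncio.start_server(short_server, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    try:
        return await fetch_truncated(port)
    finally:
        server.close()
        await server.wait_closed()
```

A strict client comparing the declared Content-Length (10) against the received body (5 bytes) must report a truncated payload; aiohttp surfaces exactly this as ClientPayloadError.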

@asvetlov asvetlov self-assigned this Dec 14, 2021
@thehesiod
Contributor

We have this happening between an aiohttp 3.8.1 client and server in production; trying to reproduce.

@ghost

ghost commented Feb 10, 2022

I've just experienced the same issue via s3fs, downloading a few hundred files from S3.

Python: 3.7.12
aiohttp: 3.8.1
yarl: 1.7.2
multidict: 5.2.0

Traceback (most recent call last):
  File "dataset_io.py", line 336, in download
    filesystem.get(self._location, self._download_location, recursive=True)
  File "/.venv/lib/python3.7/site-packages/fsspec/asyn.py", line 91, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/.venv/lib/python3.7/site-packages/fsspec/asyn.py", line 71, in sync
    raise return_result
  File "/.venv/lib/python3.7/site-packages/fsspec/asyn.py", line 25, in _runner
    result[0] = await coro
  File "/.venv/lib/python3.7/site-packages/fsspec/asyn.py", line 529, in _get
    coros, batch_size=batch_size, callback=callback
  File "/.venv/lib/python3.7/site-packages/fsspec/asyn.py", line 249, in _run_coros_in_chunks
    await asyncio.gather(*chunk, return_exceptions=return_exceptions),
  File "/root/.pyenv/versions/3.7.12/lib/python3.7/asyncio/tasks.py", line 414, in wait_for
    return await fut
  File "/.venv/lib/python3.7/site-packages/s3fs/core.py", line 1002, in _get_file
    chunk = await body.read(2 ** 16)
  File "/.venv/lib/python3.7/site-packages/aiobotocore/response.py", line 53, in read
    chunk = await self.__wrapped__.read(amt if amt is not None else -1)
  File "/.venv/lib/python3.7/site-packages/aiohttp/streams.py", line 385, in read
    await self._wait("read")
  File "/.venv/lib/python3.7/site-packages/aiohttp/streams.py", line 304, in _wait
    await waiter
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed

Sorry I don't have much more information here, other than the fact that this is occurring on Python 3.7, whereas #4549 has been described as occurring on Python 3.8 only.

@a-gn

a-gn commented May 12, 2022

I got this error on Python 3.9 with aiohttp 3.8.1 on Windows. I haven't been able to reproduce.

My server failed around the same time as the error. The backoff library re-ran the request, and I got the ClientPayloadError. Is this expected? Shouldn't losing the server raise ConnectionError instead?

@aralroca

I have the same issue with Python 3.10


@junbaibai0719

I am also getting this error when using iter_chunked to download files from S3.

martinpitt added a commit to cockpit-project/console.dot that referenced this issue Sep 19, 2022
Instead, pass through the upstream Connection header, like we do with
Upgrade: already.

This helps to avoid aio-libs/aiohttp#4581
martinpitt added a commit to cockpit-project/console.dot that referenced this issue Sep 19, 2022
This is conceptually broken -- the browser should decide when it wants
to upgrade a connection to websocket, and we should just pass that
through (like we do with `Upgrade:` already). This is also what the real
3scale does, so more faithfully reproduces its behaviour.

This also helps to avoid aio-libs/aiohttp#4581
@bdraco
Member

bdraco commented Apr 2, 2024

It took about 45 minutes to reproduce on my system:

  + Exception Group Traceback (most recent call last):
  |   File "/Users/bdraco/aiohttp/4581_client.py", line 45, in <module>
  |     asyncio.run(main())
  |   File "/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 194, in run
  |     return runner.run(main)
  |            ^^^^^^^^^^^^^^^^
  |   File "/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 118, in run
  |     return self._loop.run_until_complete(task)
  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |   File "/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 685, in run_until_complete
  |     return future.result()
  |            ^^^^^^^^^^^^^^^
  |   File "/Users/bdraco/aiohttp/4581_client.py", line 39, in main
  |     async with asyncio.TaskGroup() as g:
  |   File "/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/taskgroups.py", line 145, in __aexit__
  |     raise me from None
  | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/Users/bdraco/aiohttp/aiohttp/client_proto.py", line 94, in connection_lost
    |     uncompleted = self._parser.feed_eof()
    |                   ^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/Users/bdraco/aiohttp/aiohttp/http_parser.py", line 284, in feed_eof
    |     self._payload_parser.feed_eof()
    |   File "/Users/bdraco/aiohttp/aiohttp/http_parser.py", line 799, in feed_eof
    |     raise TransferEncodingError(
    | aiohttp.http_exceptions.TransferEncodingError: 400, message:
  Not enough data for satisfy transfer length header.
    | 
    | The above exception was the direct cause of the following exception:
    | 
    | Traceback (most recent call last):
    |   File "/Users/bdraco/aiohttp/4581_client.py", line 16, in do_stuff
    |     async for chunk in resp.content.iter_chunked(8192):
    |   File "/Users/bdraco/aiohttp/aiohttp/streams.py", line 50, in __anext__
    |     rv = await self.read_func()
    |          ^^^^^^^^^^^^^^^^^^^^^^
    |   File "/Users/bdraco/aiohttp/aiohttp/streams.py", line 357, in read
    |     raise self._exception
    | aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data for satisfy transfer length header.'>
    +------------------------------------

@webknjaz
Member

webknjaz commented Apr 2, 2024

@bdraco it was inconsistent for me too. I'm mostly curious what you think of what I wrote in #4581 (comment). What I suggested should work in theory but I'm worried it's only addressing the consequence and not the root cause.

@bdraco
Member

bdraco commented Apr 2, 2024

I'm poking at it a bit to see if I can find the root cause, but given the long test cycle it may take a while, and I'm time-limited for the next few days.

@bdraco
Member

bdraco commented Apr 2, 2024

Looks like sometimes I get:

  + Exception Group Traceback (most recent call last):
  |   File "/Users/bdraco/aiohttp/4581_client.py", line 45, in <module>
  |     asyncio.run(main())
  |   File "/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 194, in run
  |     return runner.run(main)
  |            ^^^^^^^^^^^^^^^^
  |   File "/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 118, in run
  |     return self._loop.run_until_complete(task)
  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |   File "/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 685, in run_until_complete
  |     return future.result()
  |            ^^^^^^^^^^^^^^^
  |   File "/Users/bdraco/aiohttp/4581_client.py", line 39, in main
  |     async with asyncio.TaskGroup() as g:
  |   File "/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/taskgroups.py", line 145, in __aexit__
  |     raise me from None
  | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/Users/bdraco/aiohttp/4581_client.py", line 16, in do_stuff
    |     async for chunk in resp.content.iter_chunked(8192):
    |   File "/Users/bdraco/aiohttp/aiohttp/streams.py", line 50, in __anext__
    |     rv = await self.read_func()
    |          ^^^^^^^^^^^^^^^^^^^^^^
    |   File "/Users/bdraco/aiohttp/aiohttp/streams.py", line 393, in read
    |     await self._wait("read")
    |   File "/Users/bdraco/aiohttp/aiohttp/streams.py", line 311, in _wait
    |     with self._timer:
    |   File "/Users/bdraco/aiohttp/aiohttp/helpers.py", line 735, in __exit__
    |     raise asyncio.TimeoutError from None
    | TimeoutError
    +------------------------------------

@webknjaz
Member

webknjaz commented Apr 2, 2024

Interesting... I never got that timeout. But it makes sense, given that there seems to be a race between something unsetting ResponseHandler._parser / ResponseHandler._payload and ResponseHandler.connection_lost() being called.

hail-ci-robot pushed a commit to hail-is/hail that referenced this issue May 11, 2024
…r retrying (#14545)

The treatment of `ClientPayloadError` as a sometimes transient error was
originally made in response to [an existing
issue](aio-libs/aiohttp#4581) in aiohttp that
can cause transient errors on the client that are difficult to
distinguish from a real broken server. What's in `main` matched exactly
on the error message, but that error message has [since
changed](aio-libs/aiohttp@dc38630)
to include more information, breaking our transient error handling. This
change relaxes the requirement of the error response string to fix
transient error handling for our current version of `aiohttp`.

I wish I had a better approach. `ClientPayloadError` can also be thrown
for genuinely malformed data, so I am reluctant to treat it as always
transient, but we could perhaps make it a `limited_retries_error` and
avoid inspecting the error message.
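The limited-retries approach described here might look like the following sketch. The cap and backoff are arbitrary values, and matching on the exception type (rather than its message) is the point: the wording of the error has changed between aiohttp versions.

```python
import asyncio

import aiohttp

MAX_ATTEMPTS = 3  # arbitrary cap: ClientPayloadError can also mean genuinely malformed data


async def get_with_limited_retries(session: aiohttp.ClientSession, url: str) -> bytes:
    """GET a URL, retrying a bounded number of times on ClientPayloadError."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            async with session.get(url) as resp:
                return await resp.read()
        except aiohttp.ClientPayloadError:
            # Match on the exception type only, with no inspection of the
            # (version-dependent) error message.
            if attempt == MAX_ATTEMPTS:
                raise
            await asyncio.sleep(0.1 * attempt)  # small linear backoff
    raise AssertionError("unreachable")
```

Since the error can also indicate a truly broken response, the bounded loop still surfaces the exception after the last attempt instead of retrying forever.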
@jlarkin-mojo

jlarkin-mojo commented May 13, 2024

Sorry - was there an official fix for this? I can reproduce it when using aiohttp through aiobotocore.

Basically:

  File "path/Library/Caches/pypoetry/virtualenvs/parlay-pricing-service-OsJas3qw-py3.11/lib/python3.11/site-packages/de_common/aws/s3.py", line 229, in get_object
    body = await response["Body"].read()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "path/Library/Caches/pypoetry/virtualenvs/parlay-pricing-service-OsJas3qw-py3.11/lib/python3.11/site-packages/aiobotocore/response.py", line 56, in read
    chunk = await self.__wrapped__.content.read(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "path/Library/Caches/pypoetry/virtualenvs/parlay-pricing-service-OsJas3qw-py3.11/lib/python3.11/site-packages/aiohttp/streams.py", line 383, in read
    block = await self.readany()
            ^^^^^^^^^^^^^^^^^^^^
  File "path/Library/Caches/pypoetry/virtualenvs/parlay-pricing-service-OsJas3qw-py3.11/lib/python3.11/site-packages/aiohttp/streams.py", line 405, in readany
    await self._wait("readany")
  File "path/Library/Caches/pypoetry/virtualenvs/parlay-pricing-service-OsJas3qw-py3.11/lib/python3.11/site-packages/aiohttp/streams.py", line 312, in _wait
    await waiter
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <ContentLengthError: 400, message='Not enough data for satisfy content length header.'>

Anyone have any suggestions or thoughts? I'm using aiohttp==3.9.5

@toffeegryphon

toffeegryphon commented May 31, 2024

I think I have a reliably reproducible example that is derived from @gtedesco-r7's example
https://gist.github.com/toffeegryphon/6a94a7883923f80c2259bbb297bb0d3b

Python 3.12
aiohttp 3.9.3

A ClientPayloadError(TransferEncodingError) is raised when the server closes the connection (or rather, when the client realizes the connection is closed) before the client has read the end-of-response zero-size chunk. A ServerDisconnectedError is raised when the server closes the connection after the end-of-response chunk has been read (specifically, when a TCP connector is reused). A 104 Peer Disconnected also seems to be raised occasionally; I believe it happens when the client realizes the connection is closed while the response buffer is being read, but I haven't looked closely enough to confirm that.
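That first failure mode can be triggered without the gist: a bare asyncio server that starts a chunked response but closes the socket before sending the terminal zero-size chunk. A minimal sketch (not the gist's exact code; the host and response body are arbitrary):

```python
import asyncio

import aiohttp


async def truncating_server(
    reader: asyncio.StreamReader, writer: asyncio.StreamWriter
) -> None:
    await reader.readuntil(b"\r\n\r\n")  # consume the request headers
    writer.write(
        b"HTTP/1.1 200 OK\r\n"
        b"Transfer-Encoding: chunked\r\n"
        b"\r\n"
        b"5\r\nhello\r\n"  # one data chunk...
    )
    await writer.drain()
    writer.close()  # ...but disconnect before the terminal "0\r\n\r\n" chunk


async def main() -> None:
    server = await asyncio.start_server(truncating_server, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    try:
        async with aiohttp.ClientSession() as session:
            async with session.get(f"http://127.0.0.1:{port}/") as resp:
                try:
                    await resp.read()
                except aiohttp.ClientPayloadError as exc:
                    print(f"reproduced: {exc!r}")
    finally:
        server.close()
        await server.wait_closed()


asyncio.run(main())
```

Because the response headers were already received, the client reports the truncation as a ClientPayloadError rather than a ServerDisconnectedError.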

@andrewkruse

andrewkruse commented Jun 28, 2024

Was there any traction on this (or one of the other incredibly similar errors)? After spending a while reading all sorts of random posts, I suspect I'm hitting some form of throttling that is causing things to get weird. I can't seem to get a better error message no matter what I try, probably because one of the libraries in the middle swallows it.

I'm downloading a 124.5MB file from Google Cloud using the gcloud-aio library and FastAPI with a streaming response, and I intermittently get this exception. The download takes roughly 4-5 seconds when it works correctly, but if I try it a few times in a row (say, 5 or 6), it errors out instead.

aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <ContentLengthError: 400, message='Not enough data for satisfy content length header.'>. SSLError(1, '[SSL] record layer failure (_ssl.c:2580)')

I suspect this is a me problem in some way, but I can't figure out what I'm actually supposed to fix.

Here's my FastAPI endpoint:

import aiohttp
from fastapi import Request
from fastapi.responses import StreamingResponse
from gcloud.aio.storage import Storage


async def download(request: Request) -> StreamingResponse:
    async def generate_response():
        # we need to make a new session inside this generator,
        # because the one on the request context is closed when we return
        # the streaming response, and then the streaming download fails
        # with a partial-file error.
        async with aiohttp.ClientSession(raise_for_status=True) as blob_session:
            new_storage_client = Storage(session=blob_session)
            async with await new_storage_client.download_stream(bucket, file) as stream:
                yield await stream.read()
    return StreamingResponse(generate_response())

@toffeegryphon

For our situation, we've figured out that, yeah, it was legitimately the connection being broken from the server side (i.e., no clear bug in aiohttp). We had Nginx as a load balancer/reverse proxy in front of our servers, and whenever a reload happened, the workers would get killed some time afterwards, and any long-running requests would get forcefully terminated (and correctly reported as an error).

@Dreamsorcerer
Member

@webknjaz @bdraco Think one of you could try the reproducer again on master? I suspect it might be fixed by the chunking issue I sorted out recently.

@Dreamsorcerer Dreamsorcerer added the needs-info Issue is lacking sufficient information and will be closed if not provided label Aug 22, 2024
@bdraco
Member

bdraco commented Aug 23, 2024

I can give it a shot this weekend when I'll have the ability to let it run for an extended period (assuming my reattempt at a long travel day tomorrow doesn't go as poorly as my attempt on Wednesday and I actually get to my destination)

@jedrazb

jedrazb commented Aug 23, 2024

Hey! I’m encountering a similar issue where the connection appears to be terminated on the server side, resulting in a ClientPayloadError(TransferEncodingError) on the client. This occurs rarely and is difficult to reproduce.

Would simply retrying the request on the client side be sufficient to handle this error, or would it be necessary to recreate the aiohttp session (or implement any special error-handling logic)? Will aiohttp attempt to reconnect to the server automatically upon a retry?

@Dreamsorcerer
Member

Dreamsorcerer commented Aug 23, 2024

The project you've referenced above appears to be using 3.10.3; the fix I mentioned above is in 3.10.4. If you're still experiencing problems on the latest release, you'll have to try to figure out a reproducer or some way to log the raw chunks. First, though, verify whether you are using the compiled C parser (e.g. from the wheels on pip) or the pure-Python parser (which can be forced with the AIOHTTP_NO_EXTENSIONS=1 envvar).

If the server is sending malformed responses, then the behaviour appears correct, if you send another request to retry, then aiohttp should close the bad connection and create a new connection for the next request. If the responses are correct, then we'd need to find the cause of the issue here.
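One way to check which parser is active (a sketch, assuming aiohttp's usual import-time rebinding of its parser classes to the C extension versions) is to inspect the parser class's `__module__`. Note the envvar must be set before aiohttp is first imported:

```python
import os

# aiohttp reads this at import time, so it must be set before the first import.
os.environ["AIOHTTP_NO_EXTENSIONS"] = "1"

from aiohttp.http_parser import HttpResponseParser

# "aiohttp.http_parser" means the pure-Python parser is in use;
# with the C extension loaded this would report "aiohttp._http_parser".
print(HttpResponseParser.__module__)
```

Running the same check without the envvar shows which parser a normal install of the app actually uses.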

@webknjaz
Member

@Dreamsorcerer I'm afraid I'm at capacity these days, so I'll defer to @bdraco for now.

@bdraco
Member

bdraco commented Aug 24, 2024

I started running the test client again. I'll leave it for a few hours and report back

while python3 4581_client.py; do :; done

@bdraco
Member

bdraco commented Aug 24, 2024

No longer reproducible on 3.10.5. I'd say this is fixed.

@bdraco bdraco closed this as completed Aug 24, 2024
ntk148v added a commit to iosevka-webfonts/update-centre that referenced this issue Aug 26, 2024