invalid character in header #5269
At a glance, this looks like it may be a duplicate of #5220.
@zhaoxuan this needs a reproducer. Without it, we'll have no idea how to trigger this behavior and won't even be able to confirm whether it's a bug.
@webknjaz it was weird, I added some log in |
@zhaoxuan we need to have a simple aiohttp app that reproduces the issue posted here. Even if it's a "hello world". The simpler the better, with no deps other than aiohttp. P.S. You cannot log what |
@webknjaz maybe we could add debug-level logging to |
I'm curious to see the concrete proposal for debug logging.
@webknjaz My app contains IO operations (MySQL, Redis, Memcache), so a simple aiohttp app cannot reproduce this issue. I am not sure it would work.
There are still unresolved issues for this error.
@zhaoxuan we need a reproducer without aiocache.
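For what it's worth, a reproducer does not need the real backend services. A stdlib-only sketch (port and payload are made up for illustration) can stand in for a broken server that emits a NUL byte inside a header line; pointing an aiohttp `ClientSession.get()` at the same address is what should then raise the "invalid character in header" error:

```python
import socket
import threading

# A minimal fake HTTP server whose Date header starts with a NUL byte --
# the kind of invalid character the strict parser rejects.
RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"\x00Date: Mon, 01 Jan 2024 00:00:00 GMT\r\n"
    b"Content-Length: 2\r\n"
    b"\r\n"
    b"ok"
)

def serve_once(sock):
    conn, _ = sock.accept()
    conn.recv(1024)          # read (and ignore) the request
    conn.sendall(RESPONSE)
    conn.close()

srv = socket.socket()
srv.bind(("127.0.0.1", 0))   # any free port
srv.listen(1)
port = srv.getsockname()[1]
threading.Thread(target=serve_once, args=(srv,), daemon=True).start()

# Fetch the raw response with a plain socket just to show the bad byte;
# replacing this part with an aiohttp client call should trigger the bug.
cli = socket.create_connection(("127.0.0.1", port))
cli.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
chunks = []
while (chunk := cli.recv(4096)):
    chunks.append(chunk)
cli.close()
raw = b"".join(chunks)
print(b"\x00" in raw)  # the invalid header byte is present
```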
I do not know if it is related, but I got the same error with a public server. I want to download all images as part of a MyHeritage family tree migration. After extracting all the URLs, I want to use the aiohttp client to download them. Here is a simplified example with only one (real) URL:

```python
import asyncio
import aiohttp

async def adown(session, url):
    async with session.get(url) as resp:
        if resp.status == 200:
            await resp.read()  # was: response.read() -- a NameError, the variable is resp
        return url  # the original returned (url, destf); destf was left over from the unsimplified version

async def main(urls):
    tasks = []
    async with aiohttp.ClientSession() as http:
        for url in urls:
            tasks.append(asyncio.create_task(adown(http, url)))
        for result in await asyncio.gather(*tasks, return_exceptions=True):
            print(result, type(result))

URL = "https://www.myheritageimages.com/P/storage/site181607832/files/50/10/16/501016_030952g968apad0ff5615z.jpg"
asyncio.run(main([URL]))
```

With result (original in one line):
I can download all these URLs with wget; it reports these headers (for this one):
Debian testing, Python 3.9.2, aiohttp 3.7.4
I had such an error when the header was \x00Date: ...
Can confirm that the solution stated here works: #5355 (comment). Basically: it seems that the underlying C HTTP parsing library might be broken.
I've seen this solution too and tried looking into it.
As far as I can tell, no functionality is lost. Nonetheless, performance is degraded as mentioned here: https://github.com/aio-libs/aiohttp/blob/09ac1cbb4c07b4b3e1872374808512c9bc52f2ab/CHANGES/3828.feature Using super dumb timeit tests I've seen a drop of about 30% in performance.

```python
import asyncio
import os

# Must be set before aiohttp is imported; note that any non-empty value
# disables the C extensions, even '0'.
os.environ.setdefault('AIOHTTP_NO_EXTENSIONS', '0')

from aiohttp import ClientSession

url = 'https://www.java.com/'

async def dorequest(url):
    async with ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()

def main():
    result = asyncio.run(dorequest(url))

if __name__ == '__main__':
    import timeit
    print(timeit.timeit("main()", setup="from __main__ import main", number=150))
```
This works for me if I set it as an environment variable. However, it does not matter whether I set |
The HTTP parser is the bottleneck in web apps, so it's implemented as an optional C extension, most of which comes from | This is the reason for the performance drop.
Has anyone found a workaround for this? I am making a request to an API that I myself cannot fix, but it works fine with the requests lib. How could the "invalid character in header" error be fixed?
Set |
I read somewhere that it can lead to a 30% performance drop, and that is a significant drop, so it is not a solution at all, only a workaround for simple cases.
Man, you have a broken HTTP server. Slow processing is better than nothing in such a case, isn't it?
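On the "does not matter whether I set it to 0 or 1" observation: as far as I can tell, aiohttp only checks whether the environment variable is a non-empty string, not what its value is. A small sketch mirroring that check (the helper name here is hypothetical, not aiohttp API):

```python
import os

def no_extensions_enabled(env):
    # aiohttp effectively does bool(os.environ.get("AIOHTTP_NO_EXTENSIONS")):
    # any non-empty string disables the C extensions, so "0" and "1" behave the same.
    return bool(env.get("AIOHTTP_NO_EXTENSIONS"))

print(no_extensions_enabled({"AIOHTTP_NO_EXTENSIONS": "1"}))  # True
print(no_extensions_enabled({"AIOHTTP_NO_EXTENSIONS": "0"}))  # True -- still disabled!
print(no_extensions_enabled({}))                              # False
```

Also note the variable has to be set before aiohttp is first imported, since the check happens at import time.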
Encountered this at work. There are a few things which make this particularly frustrating. First, the |
Second, I had to hack aiohttp to even be able to print out the raw response which was causing the issue.
Third, even after getting the raw response, it was very difficult to find what was actually wrong in it. After quite some time, I found a Set-Cookie header with a |
Figuring out this information should be much quicker, so we can report the problems to the server owner. For servers where this is not possible, there should probably also be a way to discard any headers with invalid characters and continue parsing the rest of the response. I think changing the 400 output of the error message could be trivially fixed by us, but everything else may need some changes in llhttp.
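Until the error output improves, a small helper (hypothetical, not part of aiohttp) can scan a raw header block for bytes outside the printable ASCII range and point at the offender by line and column:

```python
def find_invalid_header_bytes(raw_headers: bytes):
    """Report (line, column, byte) for header bytes a strict parser would reject.

    Header lines may contain horizontal tab and printable ASCII; anything
    else (NUL, other control characters, raw 8-bit bytes) is what typically
    produces "invalid character in header".
    """
    problems = []
    for lineno, line in enumerate(raw_headers.split(b"\r\n"), start=1):
        for col, byte in enumerate(line, start=1):
            if byte != 0x09 and not (0x20 <= byte <= 0x7E):
                problems.append((lineno, col, bytes([byte])))
    return problems

# Example: a Set-Cookie header with an embedded NUL, like the one described above
hdrs = b"Content-Type: text/html\r\nSet-Cookie: a=b\x00c"
print(find_invalid_header_bytes(hdrs))  # [(2, 16, b'\x00')]
```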
Are you setting the lenient headers flag?
I'm rather short on time, so if anybody wants to pick this up and try setting the lenient option, that would be great.
I simply want to share that adding |
Because it then doesn't use llhttp, which means it is a lot slower. So, not recommended for production.
I've added some information to the exceptions to highlight which character is causing a parse error (in 3.8.5).
After reviewing this again, it should be safe to enable lenient headers in the response parser (i.e. client side only). This should be enabled in the next release (#7490). The parser will revert to being strict when using dev mode ( |
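For anyone relying on the strict behavior during development: Python's dev mode is enabled with the `-X dev` interpreter flag (or `PYTHONDEVMODE=1`), and its state can be verified from within the process:

```shell
# Dev mode turns strict parsing back on; verify it is active:
python -X dev -c "import sys; print(sys.flags.dev_mode)"
```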
🐞 Describe the bug
💡 To Reproduce
I do not have any idea about it; the CPU pressure was not high.
📋 Your version of the Python
Python == 3.6.9
📋 Your version of the aiohttp/yarl/multidict distributions