Content-length validation does not handle spaces #3321

chrisstaite-menlo · 2023-09-13T10:02:58Z

Additional validation of Content-Length parsing was introduced in bf90f3a however, the value is not striped and therefore a value of '0 ' causes a ValueError.

This is a particular issue because if Content-Length is the last header in a request and parse_line is being called then the \r\n of the end of the header is interpreted as a multi-line continuation and appends the space to the end in HTTPHeaders.parse_line: new_part = " " + line.lstrip().

@bdarnell

The text was updated successfully, but these errors were encountered:

bdarnell · 2023-09-24T18:27:01Z

Can you say more about how exactly this happens? It's true that we don't strip the value when parsing content-length, but it's supposed to already be stripped in the last line of HTTPHeaders.parse_line.

The \r\n is not supposed to make it to parse_line; those characters are handled in parse(). I don't see an issue when Content-Length is the last header: we have a test for this case at

tornado/tornado/httputil.py

Line 188 in a48d634

    
                   >>> h = HTTPHeaders.parse("Content-Type: text/html\\r\\nContent-Length: 42\\r\\n")

.

I do see a couple of potential issues in edge cases, though.

Content-Length: 42\r\n \r\n (with a space between the CRLF pairs) will add a space to the value "42 "
Content-Length:\r\n 42\r\n (with the whole value in a continuation line) adds a leading space, " 42"

Both of these cases are errors now although they were accepted prior to bf90f3a. I think they're both technically legal although I'd have to go back to the RFCs to be sure.

chrisstaite-menlo · 2023-09-27T08:09:53Z

We had some code that was manually proxying headers from an upstream request to a response that was pushing all of the lines passed to a AsyncHTTPClient.fetch header_callback to parse_line that triggered this.

kenballus · 2024-01-31T09:11:34Z

I just tested sending a request with a Content-Length of 0 , and it worked totally fine. Can you enter an example of a request that causes the problem?

chrisstaite-menlo · 2024-01-31T09:17:48Z

The Content-Length needs to be the last header which then gets interpreted as a multi-line continuation and then adds a space itself, as stated in the first message.

kenballus · 2024-01-31T16:45:41Z

Got it; now I can reproduce the bug. Agreed that this is a problem.

Also, it turns out that gunicorn and fasthttp also have this exact same bug.

bdarnell · 2024-03-03T16:01:47Z

Got it; now I can reproduce the bug. Agreed that this is a problem.

I'm still not clear on what exactly the problem is. Is there an issue with HTTPHeaders.parse() or only with parse_line()? Internally, Tornado only uses parse_line() inside parse() and in curl_httpclient's header callback.

I see that there's a design mismatch in the interfaces of header_callback and parse_line: the former gives you the newlines, while parse_line expects them to be removed (this isn't formally specified but it's implied by the doctest). So you can't actually pass the values from header_callback directly to parse_line, even though this is superficially a reasonable thing to do.

There's also a couple of weird edge cases I noted at the bottom of #3321 (comment)

Does that cover everything or am I missing something?

Solutions to the design mismatch include:

Working as intended, just needs better docs
Deprecate header_callback in AsyncHTTPClient.fetch and replace it with a separate callback that gives you a pre-parsed HTTPHeaders object. We need a callback that gives you headers before the first streaming chunk, but doing it with raw header lines just pushes unnecessary work into the application.
Make parse_line able to handle newlines. This almost works (by accident) because simple headers get stripped, but continuation lines can cause extraneous whitespace.

bdarnell · 2024-06-05T14:37:21Z

Aha, now I see the problem. Single-line headers have leading and trailing whitespace stripped, while continuation lines make it possible to construct a header with trailing whitespace, potentially confusing users of that header. RFC 9110 is clear that trailing whitespace should be stripped from header values. I'm going to:

Make continuation lines containing only whitespace an error. The parse_line interface doesn't let us handle this properly (we must preserve internal space but strip trailing space, and we can't tell in the line-by-line interface whether we're looking at a middle line or the last one of a header)
Handle newlines in parse_line, specifically so that lines containing only newlines are no-ops. This fixes the way that the last header gets a trailing space if you use parse_line directly instead of parse
Emit a deprecation warning on continuation lines. There should be no reason to support this feature any more and we should get rid of it in the future.

bdarnell · 2024-12-07T22:05:34Z

Make parse_line able to handle newlines. This almost works (by accident) because simple headers get stripped, but continuation lines can cause extraneous whitespace.

When I wrote this, you could pass headers from header_callback to parse_line and it would work in most cases (except involving continuation lines). But in #3387 I broke this accidental feature, and now (starting with Tornado 6.4.1) combining header_callback with parse_line adds trailing whitespace to every header unless you manually strip the trailing CRLF. So I think we need to fully handle trailing CRLF in parse_line and not just lines that are whitespace-only.

bdarnell added the http1connection label Jun 13, 2024

Nirab123456 mentioned this issue Oct 20, 2024

Enhanced Trailing Whitespace Handling in HTTP Headers #3429

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Content-length validation does not handle spaces #3321

Content-length validation does not handle spaces #3321

chrisstaite-menlo commented Sep 13, 2023 •

edited

Loading

bdarnell commented Sep 24, 2023

chrisstaite-menlo commented Sep 27, 2023

kenballus commented Jan 31, 2024

chrisstaite-menlo commented Jan 31, 2024

kenballus commented Jan 31, 2024 •

edited

Loading

bdarnell commented Mar 3, 2024

bdarnell commented Jun 5, 2024

bdarnell commented Dec 7, 2024

Content-length validation does not handle spaces #3321

Content-length validation does not handle spaces #3321

Comments

chrisstaite-menlo commented Sep 13, 2023 • edited Loading

bdarnell commented Sep 24, 2023

chrisstaite-menlo commented Sep 27, 2023

kenballus commented Jan 31, 2024

chrisstaite-menlo commented Jan 31, 2024

kenballus commented Jan 31, 2024 • edited Loading

bdarnell commented Mar 3, 2024

bdarnell commented Jun 5, 2024

bdarnell commented Dec 7, 2024

chrisstaite-menlo commented Sep 13, 2023 •

edited

Loading

kenballus commented Jan 31, 2024 •

edited

Loading