Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fetch-networking: fix parsing of HTTP headers #1182

Merged
merged 7 commits into from
Nov 17, 2024
Merged

Conversation

SuperMaxusa
Copy link
Contributor

Currently, I have only seen this bug in older versions of Dillo, but it's possible that other HTTP clients have it too.

Test-case: boot from https://distro.ibiblio.org/damnsmall/current/current.iso with enabled fetch networking and then try to open any page in Dillo, you get in the browser console:

TypeError: Headers.set: accept-encoding:gzip is an invalid header name.

dillo-bug

A raw HTTP request from Dillo looks like this:

GET / HTTP/1.0
Host: copy.sh
User-Agent: Dillo/0.8.5-i18n-misc
Accept-Language: en-us, ja
Accept-Encoding:gzip
Accept-Charset:utf-8,ISO8859-1
Cookie2: $Version="1"

According to RFC 2616 (section 4.2), HTTP client/server can use any amount of whitespaces after separator, previous code expected only one space after the colon (separator).

src/browser/fetch_network.js Outdated Show resolved Hide resolved
src/browser/fetch_network.js Outdated Show resolved Hide resolved
src/browser/fetch_network.js Outdated Show resolved Hide resolved
@SuperMaxusa SuperMaxusa marked this pull request as draft November 11, 2024 02:25
@SuperMaxusa
Copy link
Contributor Author

Thanks for the suggestions, I think that it's better make a properly HTTP headers validation, so my apologies that I had remake this PR. Let me know if I need to create a new PR instead.

Changes:

  • added check on empty header key/value and forbidden chars
  • send Error 400 on failed check
  • added tests for each behaviour

@SuperMaxusa SuperMaxusa changed the title fetch-networking: fix parsing of HTTP headers with different number of whitespaces fetch-networking: fix parsing of HTTP headers Nov 11, 2024
@chschnell
Copy link
Contributor

chschnell commented Nov 12, 2024

Another concern I see with this HTTP request parser: Only the first fragment of the request body is taken into account, that is as many bytes that fit into the last TCP headers packet right after the end of headers, up until the end of packet. If the originating client sends a request with a longer body (one that stretches across multiple TCP packets) this will break.

After (or while) reading the headers it should be checked whether there is a Content-Length field, and if there is then as many bytes should be read from the originating client before continuing to process the request. For the sake of completeness (the originating client should be free to do whatever is allowed), also chunked body encoding should be supported when reading this reuest body.

EDIT: I'm not saying anyone should add that now, I just noticed this the other day and wanted to politely bring it up. I'd also have a few ideas regarding this.

src/browser/fetch_network.js Outdated Show resolved Hide resolved
@SuperMaxusa
Copy link
Contributor Author

After (or while) reading the headers it should be checked whether there is a Content-Length field, and if there is then as many bytes should be read from the originating client before continuing to process the request.

Agreed, it might be worth adding a check if Content-Length: 0 but the client sends data.
If I correctly understand, HTTP/1.1 client also can send chunked data: https://everything.curl.dev/http/post/chunked.html, and I'm not sure that even without Transfer-Encoding: chunked can send any amount of data before the connection is closed, as in this case with the server response: #1178 (comment)

@chschnell
Copy link
Contributor

chschnell commented Nov 12, 2024

Let me rephrase what I mean, forgive me if this is already clear to you.

Think of a longer HTTP request from the originating client (by that I mean something like HTGET.EXE from mTCP). I mean "long" in relation to the TCP frame size.

For simplicitly let's say we have a TCP max. payload size of 1500 bytes, and a HTTP request of 500 bytes header data (including separator) and a body of another 2000 bytes, so a request of 2500 bytes total. The client's TCP stack splits this HTTP request into two TCP frames:

0      500    1000   1500   2000   2500
:      :      :      :      :      :
+------+---------------------------+
|Header|           Body            |
+------+---------------------------+
+--------------------+-------------+
|    TCP frame 1     | TCP frame 2 |
+--------------------+-------------+

on_data_http() gets called for each TCP frame received from the originating client. Its condition that checks if we have received all headers becomes true with the first TCP frame. When on_data_http() gets called again later with the second TCP frame, what part of the code handles that? Note that the fetch() transaction has already started at this point.

@SuperMaxusa
Copy link
Contributor Author

Thank you for your clarification, probably checking Content-Length is worth it, I will investigate that in my free time.

@copy
Copy link
Owner

copy commented Nov 17, 2024

@SuperMaxusa It's good to merge from my point of view, any objections?

@SuperMaxusa
Copy link
Contributor Author

It's good to merge from my point of view, any objections?

Ok, I have no objections.
@chschnell Do you have any suggestions? How about an idea to create an issue from your comment about HTTP request on multiple TCP frames?

@SuperMaxusa SuperMaxusa marked this pull request as ready for review November 17, 2024 19:50
@copy copy merged commit f767d4d into copy:master Nov 17, 2024
3 checks passed
@copy
Copy link
Owner

copy commented Nov 17, 2024

Merci!

@SuperMaxusa SuperMaxusa deleted the fetch-fix branch November 17, 2024 21:08
@chschnell
Copy link
Contributor

It's good to merge from my point of view, any objections?

Ok, I have no objections. @chschnell Do you have any suggestions? How about an idea to create an issue from your comment about HTTP request on multiple TCP frames?

It's fine, it can wait.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants