Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/http desync #2181

Merged
merged 5 commits into from
Nov 20, 2019
Merged

Fix/http desync #2181

merged 5 commits into from
Nov 20, 2019

Conversation

Sytten
Copy link
Contributor

@Sytten Sytten commented Nov 19, 2019

This is a first try at fixing #2176
I have looked at multiple server implementation and frankly I have seen a lot of different things especially concerning the TE parsing.
In summary the fixes are:

  • Don't rstrip the names of headers
    • Go server does that
    • RFC 7230 says No whitespace is allowed between the header field-name and colon
  • Only accept one Content-Length
    • Node does that
    • RFC 7230 says duplicate Content-Length header fields have been generated or combined by an upstream message processor, then the recipient MUST either reject the message as invalid or replace the duplicated field-values with a single valid Content-Length
    • It's easier to just deny that case than try to handle it and most likely induce a vulnerability
  • Only accept identity and chunked Transport-Encoding
    • In this implementation, the order does not matter (it probably should). The Go implementation only uses the first value of the header.
    • Seems to be in sync with the behaviour of AWS ALB
    • All other valid (gzip, compress, etc.) and invalid TE will return a 501, since we don't have readers for them I figured this was the right move, but feel free to correct me

Future improvements:

  • Handle the duplicates of headers earlier in the parsing
  • Use a Dict[str, List[str]] instead of a List[Tuple[str, str]] for headers
  • Drop the chunked TE for WSGI to app, this is what Waitress decided to do here and I think it's a good idea since we already buffer the whole thing in the server anyway

@Sytten
Copy link
Contributor Author

Sytten commented Nov 19, 2019

@benoitc all yours!

content_length = value
elif name == "TRANSFER-ENCODING":
chunked = value.lower() == "chunked"
normalized_value = value.lower()
if normalized_value == "identity":
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gunicorn is a gateway so it shouldn't have a strong opinion about the transfer encoding. I think it's better to only test if the transfer encoding header is chuned or not.

if value.lower() == "chunked":
  chunked = True

and let it pass for other cases

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This removes the need of UnsupportedTransferEncoding also

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your proposal can still be attacked by the following desync:

Content-Length: 5
Transfer-Encoding: chunked
Transfer-Encoding: Identity

Most proxy will use the chunked, but we will use the CL.
My understanding of Transfer-Encoding is that its a hop-by-hop header, meaning that we should parse it, use it and strip it before passing the data down the line.
For example, AWS ALB will not accept anything that is not identity or chunked. If you read my improvements section above, I suggest that we do something similar to other WSGI server: parse the request to the best of our ablity and only transfer data to the application using a CL. We do buffer the data in this application anyway and we don't support gzip, compress and deflate.

Copy link
Contributor Author

@Sytten Sytten Nov 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reading a bit more about PEP333. It says: WSGI servers must handle any supported inbound "hop-by-hop" headers on their own, such as by decoding any inbound Transfer-Encoding, including chunked encoding if applicable. There is a great thread here to read: pallets/flask#367

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gunicorn has a long history of supporting chunked encoding and last 20 is having the support of wsgi.input_terminated, i'm not inclined to change this actually.

if I understand your concern, in the case of your example

Content-Length: 5
Transfer-Encoding: chunked
Transfer-Encoding: Identity

TE should be Identity and not chunked? Or do we want it to be chunked (which is really weird).

My main concern is to make sure we support the last TE given whatever it is (there are other cases than "Identity" in some apis), so we must not return an error, if it's different.

gunicorn/http/message.py Show resolved Hide resolved
@benoitc
Copy link
Owner

benoitc commented Nov 20, 2019

Thanks for the patch! Seems to handle the case. Can you do th erequested changes? It would be good to make a release today.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants