Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ASGI raw_path scope key should not include the query string portion. #2810

Closed
simonw opened this issue Aug 2, 2023 · 6 comments · Fixed by #2999
Closed

ASGI raw_path scope key should not include the query string portion. #2810

simonw opened this issue Aug 2, 2023 · 6 comments · Fixed by #2999

Comments

@simonw
Copy link
Contributor

simonw commented Aug 2, 2023

I ran into a bug where I had written code that assumed raw_path as provided by Uvicorn would include the query string, where it turns out not to.

My tests didn't catch this because they were exercising the code using HTTPX ASGI emulation, and it turns out HTTPX thinks that raw_path DOES include the query string.

In Uvicorn: https://github.com/encode/uvicorn/blob/93bb8d3879808ae376b57e3721cc227fce2c27c1/uvicorn/protocols/http/h11_impl.py#L207

                raw_path, _, query_string = event.target.partition(b"?")

But in HTTPX:

httpx/httpx/_urls.py

Lines 277 to 292 in 9415af6

@property
def raw_path(self) -> bytes:
"""
The complete URL path and query string as raw bytes.
Used as the target when constructing HTTP requests.
For example:
GET /users?search=some%20text HTTP/1.1
Host: www.example.org
Connection: close
"""
path = self._uri_reference.path or "/"
if self._uri_reference.query is not None:
path += "?" + self._uri_reference.query
return path.encode("ascii")

I'm pretty confident HTTPX is incorrect about this. The ASGI spec (coincidentally the one bit of it I contributed directly to) says: https://asgi.readthedocs.io/en/latest/specs/www.html#http-connection-scope

raw_path (byte string) -- The original HTTP path component unmodified from the bytes that were received by the web server.

On reading it now I realize this is a little ambiguous.

Daphne (the closest we have to a reference implementation of ASGI) backs up the idea that raw_path and path should be almost identical except for their encoding: https://github.com/django/daphne/blob/e49c39a4e5fac8ec170dd653641a9e90844fd3f1/daphne/ws_protocol.py#L77C1-L78

                    "path": unquote(self.path.decode("ascii")),
                    "raw_path": self.path,
@simonw
Copy link
Contributor Author

simonw commented Aug 2, 2023

It looks like url.raw_path in HTTPX has been designed as the bytes including the query string.

Which means the bug is here:

# ASGI scope.
scope = {
"type": "http",
"asgi": {"version": "3.0"},
"http_version": "1.1",
"method": request.method,
"headers": [(k.lower(), v) for (k, v) in request.headers.raw],
"scheme": request.url.scheme,
"path": request.url.path,
"raw_path": request.url.raw_path,
"query_string": request.url.query,
"server": (request.url.host, request.url.port),
"client": self.client,
"root_path": self.root_path,
}

That's where the raw_path ASGI scope is copied directly from request.url.raw_path - when actually it should only be the part of that up to the first ?.

@simonw simonw closed this as completed Aug 2, 2023
@simonw
Copy link
Contributor Author

simonw commented Aug 2, 2023

Accidentally closed this, reopening.

@andrewgodwin
Copy link

Hmm, I am not entirely sure how we should resolve this - the web server receives the query string bytes in the same part of the HTTP protocol as the path. If two servers agree, though, that's probably the best way.

@tomchristie
Copy link
Member

Okay so there's two different aspects here...

There's the ASGI scope key "raw_path" which we use for our ASGITransport, here...

# ASGI scope.
scope = {
"type": "http",
"asgi": {"version": "3.0"},
"http_version": "1.1",
"method": request.method,
"headers": [(k.lower(), v) for (k, v) in request.headers.raw],
"scheme": request.url.scheme,
"path": request.url.path,
"raw_path": request.url.raw_path,
"query_string": request.url.query,
"server": (request.url.host, request.url.port),
"client": self.client,
"root_path": self.root_path,
}

Which, appears to be incorrect.

Then there's the URL.raw_path property. The issue with the property isn't that it's incorrect. It's that it's poorly named. It probably ought to be URL.target. Which is a useful and as-good-as-we-can-manage-to-name-it property representing the (origin form of the) HTTP/1.1 request-target. Or in other words, the actual bytes-on-the-wire that get sent in the leading line of an outgoing HTTP/1.1 request.

GET /hello.txt?test HTTP/1.1
User-Agent: curl/7.16.3 libcurl/7.16.3 OpenSSL/0.9.7l zlib/1.2.3
Host: www.example.com
Accept-Language: en, mi

See also https://h11.readthedocs.io/en/latest/api.html#h11.Request

@tomchristie
Copy link
Member

tomchristie commented Aug 3, 2023

Let's treat this issue as just addressing the ASGI raw_path scope variable, and resolve that here...

"raw_path": request.url.raw_path,

(Incidentally I think if we'd been really sharp we might have clocked onto this back in django/asgiref#87 and gone with a target instead of raw_path there too. Better mapping to whats on the wire, and also allows application code to differentiate between /example and /example?)

@tomchristie tomchristie changed the title HTTPX idea of raw_path differs from Uvicorn - HTTPX includes the query string, Uvicorn does not ASGI raw_path scope key should not include the query string portion. Aug 3, 2023
@andrewgodwin
Copy link

Yeah, I agree it was poorly named - I think I leaned too close to WSGI on this one. Unfortunately, we now have what we have.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants