add more type checking in params/form-data #3170
Comments
there is also another issue: the query string should support bytes natively. for example:

import httpx
import urllib.parse

r = httpx.get(
    "https://httpbin.org/get",
    params={"q": bytes.fromhex("E1EE0E2734986F5419BB6CB6252BD9377183440E")},
)
print(urllib.parse.quote(bytes.fromhex("E1EE0E2734986F5419BB6CB6252BD9377183440E")))
print(r.text)

the expected url should be |
I'm not convinced by that. Why? |
because percent encoding can take native bytes as input |
when we say it supports "string", we're actually assuming the percent-encoded bytes are UTF-8 content. consider this:

from urllib.parse import quote, quote_from_bytes

utf8_s = "你好"
assert quote(utf8_s) == quote_from_bytes(utf8_s.encode())

that's also what |
I think this is confusing a couple of separate issues...
I think there's a good case that we shouldn't str-coerce types outside of the expected range. I don't currently think there's a good case that we should support bytes. |
I would vote for both... consider this:

httpx.get("https://example.com/?q=%E1%EE%0E%274%98oT%19%BBl%B6%25%2B%D97q%83D%0E", params={"a": 1})

looks fine, right? but the actual request url ends up different, and the correct value of q is not what gets sent. Also, sending bytes as a query value is supported by requests, which handles it correctly. |
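As an aside to the comparison above: the standard library's urlencode (which, as far as I understand, is what requests ultimately relies on for params) already accepts bytes values and percent-encodes them as-is. A minimal stdlib-only sketch, added here for illustration:

from urllib.parse import urlencode

# bytes values are passed straight through to quote_plus and percent-encoded,
# with no assumption that they decode as UTF-8:
print(urlencode({"q": bytes.fromhex("E1EE0E27")}))  # q=%E1%EE%0E%27
print(urlencode({"q": "你好"}))                      # q=%E4%BD%A0%E5%A5%BD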
Just to chip in a bit here: I have been using httpx to interact with a legacy system, and as a test client to mimic that legacy system calling back into our starlette based API. In order to get the correct values through httpx, and not the str()-coerced ones, I use a workaround (shown further down). But I also realize that my case is not common. |
httpx.get("https://example.com/?q=%E1%EE%0E%274%98oT%19%BBl%B6%25%2B%D97q%83D%0E", params={"a": 1}) This is valid, tho simplify it... >>> r = httpx.get('https://www.example.com?q=%EE', params={'a': '1'})
>>> r.request.url
URL('https://www.example.com?q=%EF%BF%BD&a=1') Simplify it more... >>> httpx.QueryParams('q=%EE')
QueryParams('q=%EF%BF%BD') Eh? Root cause is... >>> from urllib.parse import unquote
>>> from urllib.parse import quote
>>> quote(unquote('%EE')) # stdlib behaving similarly
'%EF%BF%BD'
>>> unquote('%EE') # The hex code here isn't a valid UTF-8 codepoint, and is being replaced.
'�'
>>> unquote('%EE', errors='strict') # We'd raise an error if we decoded it with 'strict'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/tom.christie/.pyenv/versions/3.10.6/lib/python3.10/urllib/parse.py", line 667, in unquote
append(unquote_to_bytes(bits[i]).decode(encoding, errors))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xee in position 0: unexpected end of data |
yes, that's a problem, I think. if |
@cknv Could you give a (simplified) example please? |
Sure @tomchristie! Get ready for some flashback to the dark ages, before UTF8 won the encoding wars.

from urllib.parse import urlencode

import httpx

client = httpx.Client()

values = {
    "expected-as-utf8": "Hello world",
    "expected-as-int": 1234,
    "expected-as-latin1": "Hello world".encode("latin1"),
    "expected-as-binary": b"\x01\x02\x03\x04",
}

encoded_query = urlencode(values)

# replaced .get with .build_request
# as making requests does not seem important for demonstrating my workaround
correctly_encoded_request = client.build_request(
    method="GET",
    url=f"http://localhost?{encoded_query}",
)

incorrectly_encoded_request = client.build_request(
    method="GET",
    url="http://localhost",
    params=values,
)

# compare and contrast:
print(correctly_encoded_request.url.params)
print(incorrectly_encoded_request.url.params)

When actually rereading the code, it wasn't UTF-16 that was used, but actually latin1, so I replaced that in the example, although either of them would need to be encoded into bytes.

Narrowing our focus to just the plain byte value, it should be percent-encoded as %01%02%03%04. The latin1 value works without the workaround as long as you stick to the subset that is identical with ASCII, but we do at times get data through that isn't just that subset.

To add some context, the server receiving the request is written in C, and has a very lax attitude to encodings because it's all just bytes anyway. If httpx would percent-encode bytes, I would probably be able to remove that workaround. However, I would not be surprised if there are other ways to send bytes over HTTP, either established in an RFC or just by tradition; if so, I would not know which one is the most appropriate, but it appears that percent encoding is at least in the RFC.

Finally, I would like to reiterate that I know this is way outside the mainstream; I have mostly accepted my workaround, but I at least wanted to raise my hand when someone else expressed a similar problem. |
+1 for using |
any news on this? |
Currently I'd suggest we sharpen up on this and raise type errors on non-string values (see #3176 (comment)). HTML form encoding only supports strings; let's help our users be explicit here. |
This is a limitation of HTML/browsers, not of the HTTP protocol. httpx is an HTTP client for Python, not a browser, so we should support more low-level operations. For example, we don't do CORS checks like a browser does, right?

Also, that is not entirely true: an HTML form is just a POST request with a particular enctype. And in non-UTF-8 worlds, browsers allow you to send non-UTF-8 strings as form values; we don't have a "gbk string" in Python, so we need to use bytes. Browsers also have https://developer.mozilla.org/en-US/docs/Web/API/URLSearchParams#preserving_plus_signs

So this is not only a question of whether bytes should be a supported type, it's a question of whether we should support non-UTF-8 encoded strings. Which I think the answer to should be YES. |
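To make the non-UTF-8 point above concrete, here is a small stdlib-only illustration (my own addition, not part of the original discussion): the GBK encoding of "你好" only exists in Python as bytes, and percent encoding can carry it without any UTF-8 assumption.

from urllib.parse import quote_from_bytes, urlencode

# "你好" encoded as GBK rather than UTF-8. Python has no "gbk string" type,
# only str (unicode) and bytes, so the non-UTF-8 form only exists as bytes.
gbk_bytes = "你好".encode("gbk")           # b'\xc4\xe3\xba\xc3'

# percent encoding works directly on the raw bytes...
print(quote_from_bytes(gbk_bytes))         # '%C4%E3%BA%C3'

# ...and urlencode() does the same for bytes values, which is the kind of
# behaviour being asked of httpx params/form handling here.
print(urlencode({"greeting": gbk_bytes}))  # 'greeting=%C4%E3%BA%C3'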
After second thoughts, I think HTML has a different definition of "string" than we have in the Python world. In HTML, a string can be in any encoding, but that's not true in Python. |
I'm going to close this off in favor of the broader issue #3176. |
Problem
we should check input value types. The current behavior, which simply converts the input value with str(v), is not ideal. for example, passing a bytes value as form data will get:

{'content': "b'O_\\x13\\xd1m\\xa9\\xf1\\r\\x0193g:\\x1e\\xf4\\xe7\\xb9t\\xdb\\xe3'"}
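The snippet that produced this output isn't preserved above; a minimal reproduction of the behavior described in this issue might look roughly like the sketch below (the bytes value is reconstructed from the repr shown, the URL is arbitrary, and build_request is used so no request is actually sent):

import httpx

payload = bytes.fromhex("4F5F13D16DA9F10D013933673A1EF4E7B974DBE3")

request = httpx.Client().build_request(
    "POST", "https://httpbin.org/post", data={"content": payload},
)
# The bytes value is coerced with str(v), so the form body carries the
# Python repr ("content=b%27O_%5Cx13...") rather than the raw bytes.
print(request.read())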
Proposal
add type checking and more coercion (a rough sketch follows below this list). for example:

form
- int/float: to str with str(v)
- bool: represent as JSON-style "true"/"false"
- None: represent as empty string
- Enum: use .value and then as above
- TypeError for anything else

query
- as above, except query should additionally support bytes with percent encoding and not throw

multipart_data
- not sure how to handle this....
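A rough sketch of the proposed coercion rules (hypothetical helper names, not actual httpx API; exactly where percent encoding would hook into the URL machinery is left out):

from enum import Enum
from urllib.parse import quote_from_bytes


def coerce_form_value(value):
    # Proposed form-data rules: Enum unwrapped first, bool/None handled
    # explicitly, str/int/float via str(v), anything else is a TypeError.
    if isinstance(value, Enum):
        value = value.value
    if value is True:
        return "true"
    if value is False:
        return "false"
    if value is None:
        return ""
    if isinstance(value, (str, int, float)):
        return str(value)
    raise TypeError(f"Unsupported form value type: {type(value)!r}")


def coerce_query_value(value):
    # Same rules for query params, except bytes are accepted and
    # percent-encoded instead of raising.
    if isinstance(value, bytes):
        return quote_from_bytes(value)
    return coerce_form_value(value)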
This will cause breaking changes, so we should consider doing a minor version bump.
related: #1500 #1539 #1608
httpx/httpx/_content.py, lines 136 to 149 in 4b85e6c
httpx/httpx/_utils.py, lines 56 to 68 in 4b85e6c