Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Form-encoded data with None as the value gets passed as string "None" #1500

Closed
2 tasks done
pletnes opened this issue Mar 8, 2021 · 9 comments · Fixed by #1539
Closed
2 tasks done

Form-encoded data with None as the value gets passed as string "None" #1500

pletnes opened this issue Mar 8, 2021 · 9 comments · Fixed by #1539
Labels
requests-compat Issues related to Requests backwards compatibility user-experience Ensuring that users have a good experience using the library

Comments

@pletnes
Copy link

pletnes commented Mar 8, 2021

Checklist

  • There are no similar issues or pull requests to fix it yet.
  • The bug is reproducible against the latest release and/or master.

Describe the bug

When posting form data using requests and the form value is None, nothing is passed. With httpx, the None is converted to the string "None".

To reproduce

import requests
import httpx

url = 'https://httpbin.org/post'
data = dict(foo=None)

resp_requests = requests.post(url, data=data)
pprint(resp_requests.json())
# In the response:  'form': {},

resp_httpx = httpx.post(url, data=data)
pprint(resp_httpx.json())
# In the response:  'form': {'foo': 'None'},

Expected behavior

  1. Expected behavior to match requests, since "broadly" the behavior is expected to match
  2. Did not expect that "None" would be passed, rather e.g. an empty string might make sense?

Actual behavior

The passed form-encoded data contains the string "None"

Debugging material

None, but see code example

Environment

Ubuntu 20.04 docker image

Additional context

Writing a PoC to replace requests with httpx right here.
https://github.com/SAP/cloud-pysec/blob/6e3f92fefb2e0dc4f3c99aa7ed5dc2d70e6734ba/sap/xssec/security_context.py#L511
The method request_token_for_client passes None as a default for "all scopes".

@pletnes
Copy link
Author

pletnes commented Mar 8, 2021

Do you consider this a bug, or just "behaves differently"?

@florimondmanca
Copy link
Member

florimondmanca commented Mar 8, 2021

Hi :)

For some reason this rings a bell. Are we sure there are no previous issues about this?

Otherwise yeah I think the "None" string is a bit surprising. I would advise validating this isn't specifically linked to HTTPBin, eg inspect what the actual request body looks like.

@florimondmanca florimondmanca added the user-experience Ensuring that users have a good experience using the library label Mar 8, 2021
@florimondmanca
Copy link
Member

Not marking as a bug yet because we'd also need to validate if "null values" make sense in the context of HTTP forms at all.

If not, consider matching Requests by ignoring None fields?

Or raising a loud error as "this is ambiguous"?

@florimondmanca florimondmanca added the requests-compat Issues related to Requests backwards compatibility label Mar 8, 2021
@pletnes
Copy link
Author

pletnes commented Mar 8, 2021

From what I've seen, passing empty string instead of None works fine both for requests and httpx, and round-trips with httpbin. This is a perfectly fine workaround for me. However, for compatibility with requests, converting None to empty string will not yield identical results, since then the empty string is in fact passed with the POST request.

I've verified that the empty-string workaround works for the project I am currently working on, so it is not only a httpbin issue. I used httpbin to create a reproducible minimal example.

Personally, I would be least surprised if httpx followed requests, since this is what the documentation promises. A loud error would be surprising but would not lead to silent failure, so in my view that would be acceptable.

@pletnes pletnes closed this as completed Mar 8, 2021
@pletnes pletnes reopened this Mar 8, 2021
@pletnes
Copy link
Author

pletnes commented Mar 8, 2021

Relevant discussion for request, a node.js package, which argues that None, null etc are language-specific and do not make sense.
request/request#1832

Scrapy appears to do the same as requests:

If a value passed in this parameter is None, the field will not be included
in the request, even if it was present in the response <form> element.

https://docs.scrapy.org/en/latest/topics/request-response.html#scrapy.http.FormRequest.from_response

The aiohttp package does the same as httpx:

async with aiohttp.request('POST', url, data=data) as resp_aiohttp:
    pprint(await resp_aiohttp.json())

yields:

...
 'form': {'foo': 'None'},
...

@florimondmanca
Copy link
Member

florimondmanca commented Mar 8, 2021

@pletnes Thanks, those are very useful pointers. :-)

As to why this happens:

When data=<dict> is passed, we send it as URL-encoded form data, built through this call:

httpx/httpx/_content.py

Lines 106 to 113 in f3c2941

def encode_urlencoded_data(
data: dict,
) -> Tuple[Dict[str, str], ByteStream]:
body = urlencode(data, doseq=True).encode("utf-8")
content_length = str(len(body))
content_type = "application/x-www-form-urlencoded"
headers = {"Content-Length": content_length, "Content-Type": content_type}
return headers, PlainByteStream(body)

So our behavior essentially boils down to the stdlib doing this:

>>> from urllib.parse import urlencode
>>> urlencode({"foo": None, "bar": "baz"})
'foo=None&bar=baz'

I think that's somewhat undefined behavior, since the docs read:

Convert a mapping object or a sequence of two-element tuples, which may contain str or bytes objects, to a percent-encoded ASCII text string.

urlencode() officially only accepts str or bytes, but actually it accepts anything else, in which cases it just calls str() on it:

>>> class Spy:
...     def __str__(self):
...             print('Gotcha')
...             return 'Spy()'
... 
>>> urlencode({'foo': Spy()})
Gotcha
'foo=Spy%28%29'

From this, it means that any non-str/bytes value will yield unexpected results, such as passing a custom object (like above), passing True (foo=True), etc.

Actually the None case has been reported on bpo in 2013: https://bugs.python.org/issue18857 still discussed actively in 2020. Looks like the motivation/discussion there was to allow no-value querystring items, such as ?foo, as opposed to ?foo= which would happen if None was treated as ''.

Anyway, I sense head-scratching ahead, so I'd be tempted to keep things as-is for simplicity. (Plus, I believe issues around special-casing values like None have been a PITA for Requests over time.) We could just add a note in our Requests Compatibility guide about this. WDYT?

@florimondmanca
Copy link
Member

florimondmanca commented Mar 8, 2021

Other relevant threads:

#175 resolved #154 by treating params={"foo": None} as ?foo=. Which made me realize we have this small utility function:

httpx/httpx/_utils.py

Lines 59 to 71 in c725387

def str_query_param(value: "PrimitiveData") -> str:
"""
Coerce a primitive data type into a string value for query params.
Note that we prefer JSON-style 'true'/'false' for boolean values here.
"""
if value is True:
return "true"
elif value is False:
return "false"
elif value is None:
return ""
return str(value)

Which we use in QueryParams here:

self._dict = {str(k): str_query_param(v) for k, v in items}

We might consider applying the same transformation in encode_urlencoded_data()…? That would still be different from Requests (and we'd have to document that — we already for for query params: https://www.python-httpx.org/compatibility/#query-parameters), but at least it would be consistent with what we already do for query params.

@pletnes
Copy link
Author

pletnes commented Mar 9, 2021

I see several sensible options - clearly the "None" string representation is the only one which is obviously not so great. I'd cast a vote for consistency, since being consistent makes life much easier for end users, even if a choice is not "locally optimal" it makes reasoning about "what would httpx do" easier.

@tomchristie
Copy link
Member

Personally I think requests' behaviour here is unintuitive.

We might consider applying the same transformation in encode_urlencoded_data()…? That would still be different from Requests (and we'd have to document that — we already for for query params: https://www.python-httpx.org/compatibility/#query-parameters), but at least it would be consistent with what we already do for query params.

Yes that'd be my strong preference too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
requests-compat Issues related to Requests backwards compatibility user-experience Ensuring that users have a good experience using the library
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants