Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query string ordering #354

Merged

Conversation

ErwinJunge
Copy link
Contributor

This fixes the test from #353 by always comparing ordered query strings

@codecov-io
Copy link

codecov-io commented Oct 30, 2018

Codecov Report

Merging #354 into master will increase coverage by 0.13%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #354      +/-   ##
==========================================
+ Coverage   94.59%   94.73%   +0.13%     
==========================================
  Files           7        7              
  Lines         832      835       +3     
==========================================
+ Hits          787      791       +4     
+ Misses         45       44       -1
Impacted Files Coverage Δ
httpretty/compat.py 100% <ø> (ø) ⬆️
httpretty/core.py 94.34% <100%> (+0.29%) ⬆️
httpretty/utils.py 85.71% <0%> (-7.15%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fd0964c...3819645. Read the comment docs.

@gabrielfalcao gabrielfalcao merged commit 89754a9 into gabrielfalcao:master Nov 4, 2018
@davetrollope
Copy link

Sweet, thanks @ErwinJunge !

@milosivanovic
Copy link

milosivanovic commented Oct 30, 2020

Unfortunately, this PR has caused a regression.

In terms of the HTTP spec, the ordering of query string fields can be ignored, yes. However, consider this case:

Suppose we have a dynamic field value that we want to use a regular expression for to match within httpretty:

httpretty.register_uri(httpretty.GET, r"https://mysite/mypath\?q3=v3&q2=(\w+)&q1=v1", match_querystring=True)

The compiled regular expression is expecting a URL in that format with the query string fields in that order.

This now fails to match because URIInfo reorders the query string fields alphabetically

query_items = sorted(parse_qs(query).items())
here, and thus the URL returned
return self.regex.search(info.full_url(
here testing against the regular expression is changed to q1=v1&q2=my_dynamic_value&q3=v3 when the application code making the request is actually sending q3=v3&q2=my_dynamic_value&q1=v1.

As a workaround, the regular expression could be rewritten to match the internally-modified alphabetical version, however this is not an ideal solution as it's an implementation detail. If query string fields in the uri parameter of the httpretty_register.uri() method are to be treated as order-independent, then they should be provided in a separate params kwarg as a dict, for example:

httpretty.register_uri(
    httpretty.GET,
    r"https://mysite/mypath/.*/withregex",
    params={
        'q1': 'v1',
        'q2': r".*",
        'q3': 'v3'
    }
)

Notice that in this hypothetical solution, we can provide the regular expressions as individual values of the dict keys. The presence of the params kwarg would override match_querystring to True. HTTPretty would then match against the URI and params dict.

Alternatively, perhaps the ideal solution with the least change to the interface would be to add another kwarg such as ignore_param_order=True which enables the order-independent functionality, or better yet, just have that be an internal flag automatically set to True if a capturing group is detected in the query string section.

Otherwise, what's happening right now is misleading.

@ErwinJunge
Copy link
Contributor Author

Hi @milosivanovic

I don't agree that this would count as a regression.

In your own description you're already saying that what you're trying to depend on doesn't conform the the http spec. I'd argue a logical conclusion of would be that the way you're using a regex to match query params also isn't conformant to the http spec and therefore that's where the problem is.

However, if you can come up with a way to modify the library to support your usecase without breaking existing tests I'm sure a PR would be welcomed.

@milosivanovic
Copy link

@ErwinJunge Sure, expecting the query string fields to be in the exact order specified in the test can be argued to be non-conformant, but it is a regression by definition. A regression is something that worked before and now doesn't.

Like I mentioned in my previous comment, if the order is to be ignored, the query param fields should be passed as a dict in another kwarg for optimal parsing. This would be a breaking change, and unfortunately that is necessary here if being conformant is the top concern. It could be rolled out in phases to limit the impact, for example implementing the params dict kwarg solution and then raising a DeprecationWarning if somebody passes in query string fields.

Here are some examples from Webmock, the library that inspired httpretty:
Screen Shot 2020-10-30 at 10 30 05 AM
Source: https://github.com/bblimke/webmock

Special handling of query string params is exactly how they have tackled this, supporting both a custom DSL in the URI field and a params hash.

@milosivanovic
Copy link

Actually, it looks like httpretty was inspired by FakeWeb rather than WebMock.

FakeWeb has the exact same issue since its inception: chrisk/fakeweb#7 (comment) and the issue is still open after 11 years.

@ErwinJunge
Copy link
Contributor Author

@ErwinJunge Sure, expecting the query string fields to be in the exact order specified in the test can be argued to be non-conformant, but it is a regression by definition. A regression is something that worked before and now doesn't.

Just because it something used to work and now doesn't doesn't make it a regression. It's only a regression if it was supposed to work before. In this case, relying on the ordering of query params wasn't supposed to work in the first place. The fact that it did just means you got lucky for a while. As an extreme example of this, I'd like to refer you to https://xkcd.com/1172/

Either way, I think your suggestion of passing query params in a dict should be possible to implement in a non-breaking way. If you make the params kwarg optional it should be possible to implement the parsing/matching in such a way that a dict of params works the way you described without breaking the existing functionality.

@milosivanovic
Copy link

milosivanovic commented Nov 1, 2020

@ErwinJunge That definition is more accurate, and makes the regression even easier to point out:

There are two uses cases for the uri argument in httpretty.register_uri: it supports either a string or regular expression. Both strings and regular expression input URIs can contain query string fields. Query string matching was introduced here: #98 and regular expressions were requested here: #26 and added shortly thereafter in f616519.

Suppose we have the following statements:

httpretty.register_uri(httpretty.GET, "http://example.com/?z=c&b=a", match_querystring=True)

and

httpretty.register_uri(httpretty.GET, r"http://example.com/?z=(\w+)&b=a", match_querystring=True)

Notice the second is not a string but a regular expression argument as denoted by the r prefix.

Prior to this PR, in both of the above invocations of register_uri, the query string was not parsed and reordered, it was simply compared verbatim as seen in the diff here https://github.com/gabrielfalcao/HTTPretty/pull/354/files#diff-326a996e75c88c9fc8dabbfdca5029c2a409fac6a2dd9b7dffa9f3726bce7c55L885.

If the application for which the test case is built for decided to instead send http://example.com/?b=a&z=c which has the fields in reverse order, neither of the matchers would see this as the same URL and so tests would faile as evident in #353, which is a bug. Your PR fixed this bug by making the query string field ordering agnostic which makes sense given the HTTP spec. Now instead of comparing verbatim, the query string fields are first parsed and then reordered alphabetically. The parsing and reordering happens to both the URIInfo object created when registering the URI:

self.info = URIInfo.from_uri(uri, entries)
and to the URIInfo object created from the application's incoming request:
info = URIInfo(
The two can then be trivially compared.

However, this solution handled only one of the two supported use cases: string inputs. If a regular expression is passed as a uri argument, a completely different code path is taken:

if is_regex:
which compares the verbatim regular expression provided by the user writing the test against the now-ordered incoming request from the application code.

If URIs are now being parsed and their query string fields reordered, then this needs to happen on both string and regular expression user inputs, otherwise this change would result in a regression, which is exactly what has happened here.

If it is possible to reliably parse query strings that contain complicated regular expressions (like http://example.com/path?field=(?P<n>\d+)&k=v, then the same solution can also be applied to the regular expression input (reorder the fields containing the raw regular expression tokens), but I will have to think about it more.

I do think that the optional kwarg solution is best.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants