Query string ordering #354

ErwinJunge · 2018-10-30T19:17:08Z

This fixes the test from #353 by always comparing ordered query strings

codecov-io · 2018-10-30T19:36:53Z

Codecov Report

Merging #354 into master will increase coverage by 0.13%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #354      +/-   ##
==========================================
+ Coverage   94.59%   94.73%   +0.13%     
==========================================
  Files           7        7              
  Lines         832      835       +3     
==========================================
+ Hits          787      791       +4     
+ Misses         45       44       -1

Impacted Files	Coverage Δ
httpretty/compat.py	`100% <ø> (ø)`	⬆️
httpretty/core.py	`94.34% <100%> (+0.29%)`	⬆️
httpretty/utils.py	`85.71% <0%> (-7.15%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fd0964c...3819645. Read the comment docs.

davetrollope · 2018-11-07T02:35:13Z

Sweet, thanks @ErwinJunge !

milosivanovic · 2020-10-30T05:24:06Z

Unfortunately, this PR has caused a regression.

In terms of the HTTP spec, the ordering of query string fields can be ignored, yes. However, consider this case:

Suppose we have a dynamic field value that we want to use a regular expression for to match within httpretty:

httpretty.register_uri(httpretty.GET, r"https://mysite/mypath\?q3=v3&q2=(\w+)&q1=v1", match_querystring=True)

The compiled regular expression is expecting a URL in that format with the query string fields in that order.

This now fails to match because URIInfo reorders the query string fields alphabetically

HTTPretty/httpretty/core.py

Line 1007 in dbe2c3a

query_items = sorted(parse_qs(query).items())

here, and thus the URL returned

HTTPretty/httpretty/core.py

Line 1137 in dbe2c3a

return self.regex.search(info.full_url(

here testing against the regular expression is changed to q1=v1&q2=my_dynamic_value&q3=v3 when the application code making the request is actually sending q3=v3&q2=my_dynamic_value&q1=v1.

As a workaround, the regular expression could be rewritten to match the internally-modified alphabetical version, however this is not an ideal solution as it's an implementation detail. If query string fields in the uri parameter of the httpretty_register.uri() method are to be treated as order-independent, then they should be provided in a separate params kwarg as a dict, for example:

httpretty.register_uri(
    httpretty.GET,
    r"https://mysite/mypath/.*/withregex",
    params={
        'q1': 'v1',
        'q2': r".*",
        'q3': 'v3'
    }
)

Notice that in this hypothetical solution, we can provide the regular expressions as individual values of the dict keys. The presence of the params kwarg would override match_querystring to True. HTTPretty would then match against the URI and params dict.

Alternatively, perhaps the ideal solution with the least change to the interface would be to add another kwarg such as ignore_param_order=True which enables the order-independent functionality, or better yet, just have that be an internal flag automatically set to True if a capturing group is detected in the query string section.

Otherwise, what's happening right now is misleading.

ErwinJunge · 2020-10-30T14:15:11Z

Hi @milosivanovic

I don't agree that this would count as a regression.

In your own description you're already saying that what you're trying to depend on doesn't conform the the http spec. I'd argue a logical conclusion of would be that the way you're using a regex to match query params also isn't conformant to the http spec and therefore that's where the problem is.

However, if you can come up with a way to modify the library to support your usecase without breaking existing tests I'm sure a PR would be welcomed.

milosivanovic · 2020-10-30T17:51:19Z

@ErwinJunge Sure, expecting the query string fields to be in the exact order specified in the test can be argued to be non-conformant, but it is a regression by definition. A regression is something that worked before and now doesn't.

Like I mentioned in my previous comment, if the order is to be ignored, the query param fields should be passed as a dict in another kwarg for optimal parsing. This would be a breaking change, and unfortunately that is necessary here if being conformant is the top concern. It could be rolled out in phases to limit the impact, for example implementing the params dict kwarg solution and then raising a DeprecationWarning if somebody passes in query string fields.

Here are some examples from Webmock, the library that inspired httpretty:

Source: https://github.com/bblimke/webmock

Special handling of query string params is exactly how they have tackled this, supporting both a custom DSL in the URI field and a params hash.

milosivanovic · 2020-10-30T18:04:34Z

Actually, it looks like httpretty was inspired by FakeWeb rather than WebMock.

FakeWeb has the exact same issue since its inception: chrisk/fakeweb#7 (comment) and the issue is still open after 11 years.

ErwinJunge · 2020-10-31T08:23:53Z

@ErwinJunge Sure, expecting the query string fields to be in the exact order specified in the test can be argued to be non-conformant, but it is a regression by definition. A regression is something that worked before and now doesn't.

Just because it something used to work and now doesn't doesn't make it a regression. It's only a regression if it was supposed to work before. In this case, relying on the ordering of query params wasn't supposed to work in the first place. The fact that it did just means you got lucky for a while. As an extreme example of this, I'd like to refer you to https://xkcd.com/1172/

Either way, I think your suggestion of passing query params in a dict should be possible to implement in a non-breaking way. If you make the params kwarg optional it should be possible to implement the parsing/matching in such a way that a dict of params works the way you described without breaking the existing functionality.

milosivanovic · 2020-11-01T03:26:32Z

@ErwinJunge That definition is more accurate, and makes the regression even easier to point out:

There are two uses cases for the uri argument in httpretty.register_uri: it supports either a string or regular expression. Both strings and regular expression input URIs can contain query string fields. Query string matching was introduced here: #98 and regular expressions were requested here: #26 and added shortly thereafter in f616519.

Suppose we have the following statements:

httpretty.register_uri(httpretty.GET, "http://example.com/?z=c&b=a", match_querystring=True)

and

httpretty.register_uri(httpretty.GET, r"http://example.com/?z=(\w+)&b=a", match_querystring=True)

Notice the second is not a string but a regular expression argument as denoted by the r prefix.

Prior to this PR, in both of the above invocations of register_uri, the query string was not parsed and reordered, it was simply compared verbatim as seen in the diff here https://github.com/gabrielfalcao/HTTPretty/pull/354/files#diff-326a996e75c88c9fc8dabbfdca5029c2a409fac6a2dd9b7dffa9f3726bce7c55L885.

If the application for which the test case is built for decided to instead send http://example.com/?b=a&z=c which has the fields in reverse order, neither of the matchers would see this as the same URL and so tests would faile as evident in #353, which is a bug. Your PR fixed this bug by making the query string field ordering agnostic which makes sense given the HTTP spec. Now instead of comparing verbatim, the query string fields are first parsed and then reordered alphabetically. The parsing and reordering happens to both the URIInfo object created when registering the URI:

HTTPretty/httpretty/core.py

Line 1124 in dbe2c3a

self.info = URIInfo.from_uri(uri, entries)

and to the URIInfo object created from the application's incoming request:

HTTPretty/httpretty/core.py

Line 632 in dbe2c3a

info = URIInfo(

The two can then be trivially compared.

However, this solution handled only one of the two supported use cases: string inputs. If a regular expression is passed as a uri argument, a completely different code path is taken:

HTTPretty/httpretty/core.py

Line 1116 in dbe2c3a

if is_regex:

which compares the verbatim regular expression provided by the user writing the test against the now-ordered incoming request from the application code.

If URIs are now being parsed and their query string fields reordered, then this needs to happen on both string and regular expression user inputs, otherwise this change would result in a regression, which is exactly what has happened here.

If it is possible to reliably parse query strings that contain complicated regular expressions (like http://example.com/path?field=(?P<n>\d+)&k=v, then the same solution can also be applied to the regular expression input (reorder the fields containing the raw regular expression tokens), but I will have to think about it more.

I do think that the optional kwarg solution is best.

test for query string ordering

669d0ef

ErwinJunge force-pushed the query_string_ordering branch from 5af46af to 3ae809f Compare October 30, 2018 19:36

Make URIMatcher indifferent to query string ordering

3819645

ErwinJunge force-pushed the query_string_ordering branch from 3ae809f to 3819645 Compare October 30, 2018 20:14

ErwinJunge mentioned this pull request Oct 30, 2018

test for query string ordering WIP #353

Merged

gabrielfalcao merged commit 89754a9 into gabrielfalcao:master Nov 4, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Query string ordering #354

Query string ordering #354

ErwinJunge commented Oct 30, 2018

codecov-io commented Oct 30, 2018 •

edited

Loading

davetrollope commented Nov 7, 2018

milosivanovic commented Oct 30, 2020 •

edited

Loading

ErwinJunge commented Oct 30, 2020

milosivanovic commented Oct 30, 2020

milosivanovic commented Oct 30, 2020

ErwinJunge commented Oct 31, 2020

milosivanovic commented Nov 1, 2020 •

edited

Loading

Query string ordering #354

Query string ordering #354

Conversation

ErwinJunge commented Oct 30, 2018

codecov-io commented Oct 30, 2018 • edited Loading

Codecov Report

davetrollope commented Nov 7, 2018

milosivanovic commented Oct 30, 2020 • edited Loading

ErwinJunge commented Oct 30, 2020

milosivanovic commented Oct 30, 2020

milosivanovic commented Oct 30, 2020

ErwinJunge commented Oct 31, 2020

milosivanovic commented Nov 1, 2020 • edited Loading

codecov-io commented Oct 30, 2018 •

edited

Loading

milosivanovic commented Oct 30, 2020 •

edited

Loading

milosivanovic commented Nov 1, 2020 •

edited

Loading