normalize route and path, encoded and decode dynamic path segments #91
This allows handlers like "/foo/:bar" to match a path with a URL-encoded segment such as "/foo/abc%2Fdef" -> { params: { bar: "abc/def" } }. See related issue emberjs/ember.js#11497.
Branch updated from f594a76 to 391ec7b.
Updated the description to clarify that although certain users with percent-encoded static routes may be affected, it is not a breaking change (all static routes that match in the current code will still match after this PR). Clarified that users who may find this a breaking change for dynamic route segments likely only need to remove their workaround. Also updated the terminology section for clarity and renamed "normalization caveats" to "normalization of uri-reserved characters".
Can we disallow special symbols in a route path? E.g. …
@mmun They have to be encoded to work; is that not enough?
@krisselden The PR doesn't enforce that at the moment, though:

```diff
+ // unencoded ":" in non-significant place
+ route: "/foo/b:ar",
+ matches: ["/foo/b:ar"],
+ nonmatches: ["/foo/b%3Aar", "/foo/b%3aar"]
+ /* FIXME should this work?
+}, {
+ // encoded non-uri-reserved char "*" in significant place
+ route: "/foo/%2Abar",
+ matches: ["/foo/*bar", "/foo/%2Abar", "/foo/%2baar"]
+ */
+}, {
+ // unencoded "*" in non-significant place
+ route: "/foo/b*ar",
+ matches: ["/foo/b*ar", "/foo/b%2Aar", "/foo/b%2aar"]
+}, {
```
@bantic let's require all reserved characters in the configured route to be encoded so we can use any of them for future parsing needs.
Otherwise this looks good to me.
Sounds good, but I think we need to delineate which characters are in the reserved set. The uriReserved set is `; / ? : @ & = + $ ,`. How about: …
There is also mention of reserving … And then, from the perspective of path-matching, paths with either the unencoded or the encoded form of characters in the reserved set will match, e.g.: …
@krisselden @mmun Let me know if that is in line with what you had in mind.
This would also mean it's not possible to match a literal … E.g., there would be no route that could be added such that the path …
@bantic that sounds like a feature to me!
Pushed some WIP to address the comment above about reserved characters and the discussion on #dev-router in Slack about normalizing route/path segments rather than full strings. This breaks the test that ensures that a glob route doesn't modify its matched path, though, so I need to fix that. I also need to decide which characters are "reserved" and how to communicate when a user attempts to add a route with unencoded reserved chars:
I made a visualizer that can be used to see what paths will match a given route (and what their param values will be): http://rr-visualizer.surge.sh/
Branch updated from f67c9b6 to 3383514.
Change State's `names` array to hold {name, decode} objects instead of only the names. Recognition uses the `decode` boolean to know whether to decode the capture.
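As a rough illustration of the commit above (not the route-recognizer source), a state could carry one `{ name, decode }` entry per capture group, and recognition could consult the flag when building params. The regex-based `postState` object and the `extractParams` helper are hypothetical, written for an imagined route like "/post/:id/*rest".

```javascript
// Hypothetical shape: a regex plus one {name, decode} entry per capture group.
const postState = {
  regex: /^\/post\/([^/]+)\/(.+)$/, // stands in for a route like "/post/:id/*rest"
  names: [
    { name: "id", decode: true },    // dynamic segment: decoded
    { name: "rest", decode: false }, // star segment: left exactly as matched
  ],
};

// Build a params object from a match, decoding only captures flagged with decode: true.
function extractParams(state, path) {
  const match = state.regex.exec(path);
  if (!match) return null;
  const params = {};
  state.names.forEach(({ name, decode }, i) => {
    const capture = match[i + 1];
    params[name] = decode ? decodeURIComponent(capture) : capture;
  });
  return params;
}

console.log(extractParams(postState, "/post/abc%2Fdef/x%2Fy/z"));
// { id: 'abc/def', rest: 'x%2Fy/z' }
```

Keeping the flag next to the name means the decision is made once, when the route is parsed, instead of re-inspecting the segment type on every recognition.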
Branch updated from 3383514 to 8f7845b.
Performance on master (2a21d5e):
Performance on this branch (5bf1311):
This is ready for review. Here's a short summary of the changes:

Normalization when adding routes and when recognizing paths
Routes are normalized when they are added, and paths are normalized before recognition, so percent-encoded characters in routes are now mostly normalized and users do not need to add the same route in multiple forms to handle every URL it might encounter. Before: "/foo%3Abar", "/foo%3abar" and "/foo:bar" were all considered different routes.

Bugfix: URL generation with a dynamic segment that has special characters (like "/")
Old behavior was to put the parameter into the URL as-is. New behavior is to … Before: route "/post/:id" with id "abc/def" would generate the URL "/post/abc/def", which would no longer be matched by the route that generated it.

Bugfix: inconsistent decoding of dynamic segments during recognition
Old behavior was to sort-of-normalize an incoming path using … Before: a route "posts/:id" would match the URL "posts/abc%3Bdef" with the non-decoded id value "abc%3Bdef", but match the URL "posts/abc%20def" with the decoded id value "abc def". Now: the matched parameter value is always decoded (for dynamic segments), and the following symmetric relationship is always true: …

In English: after adding a route to RR, RR will always …

Problematic edge case in the old behavior: if a URL had a percent-encoded percent character (the literal string "%25"), the matching param would have the decoded value ("%"), and consumers that use …

Bugfix / Behavior clarification: catchall (aka glob or star segment) routes do not decode their matching params
The docs did not specify how or whether star segments should decode parameters, so they had the same inconsistent behavior mentioned above for dynamic segments. This PR adds some tests (and code) to clarify that star segment matches are not decoded. Old behavior: route "/*catchall" would match URL "/abc%20def" with …

Feature Flag: ENCODE_AND_DECODE_PATH_SEGMENTS
Added …

Not covered: Reserved characters in routes
The comments above mention enforcing that certain characters (like …
The RR visualizer at http://rr-visualizer.surge.sh/ is updated with these changes.
Rewrote the normalization algorithm and added some benchmarks for it. Performance on master (2a21d5e):
Performance on this PR (0861ce2):
New benchmarks for normalization:
@bantic can you have the route visualizer show the segments + canonical values?
(fwiw @bantic is mostly offline for this week)
I'm now refactoring my code in #90 to pass these new tests.
👍 @mmun sorry for the delay. I'll update the visualizer today or tomorrow.
@mmun Updated the visualizer with normalized route segments and normalized path: http://rr-visualizer.surge.sh/
* FIX: category routes model params should decode their URL parts. Ember's route star globbing does not URI-decode by default. This is problematic for subcategory globs with encoded URL site settings enabled. Subcategories with encoded URLs will 404 without this decode. I found this tildeio/route-recognizer#91, which explicitly explains that globbing does not decode automatically.
This PR supersedes #55. That PR originally attempted to encode/decode dynamic segments in matched paths, but it grew much larger when I attempted to make the solution robust in the face of dynamic segments with percent-encoded percent characters (`encodeURI("%") === "%25"`).
I'm approaching this PR in the spirit of an RFC, with a very detailed rationale for the changes below. If there is an alternative preferred approach for this large-ish change, I'm happy to restructure or break apart this PR as necessary.
cc @krisselden
Changes to route-recognizer
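Note about terminology:
- A "route" is a string added to the router (e.g., the route in `router.add([{ path: '/foo/bar', ... }])` is `/foo/bar`)
- A "path" is a string passed to `router.recognize` (e.g., the path in `router.recognize('/boo/baz')` is `/boo/baz`)
- A route "matches" a path when the router recognizes the path with that route (e.g., the route `/foo/:bar` would "match" the path `/foo/something`)

What is covered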
This PR addresses the following:
Who would be affected
It may only affect users who have routes with:
- … (`/`)

It may cause breaking changes for users who have workarounds in place to encode/decode before generation/recognition. The breakage can be fixed by removing the workaround.
It will likely not cause breaking changes for users who have static routes with percent-encoded or unicode characters. All static route/path combinations that currently match will also match after this PR. Static routes that used to be considered distinct (e.g. `/f%20o` and `/f o`) are now normalized to the same thing. Users with affected static route/path combos likely need to do nothing; they may simply end up with superfluous routes that can be removed.

Route and Path Normalization
The route-recognizer docs do not have a lot to say about routes with non-ascii and percent-encoded characters, so some of the work of this change is to add tests for many possible routes and expected matches that may exist in the wild.
The changes in this PR are intended to make the route-recognizer more likely to match routes that are added using either percent-encoded or non-encoded characters, and to recognize paths correctly whether they include percent-encoded characters or not.
The normalization added is intended to make the router recognize routes that:
- … (`/café`)
- … (`/caf%C3%A9`)
- … (`/foo bar`, `/foo:bar`)
- … (`/foo%20bar`, `/foo%3Abar`)
- … (`router.recognize('caf%C3%A9')`)
- … (`router.recognize('café')`)
- … (`router.recognize('foo bar')`, `router.recognize('foo:bar')`)
- … (`router.recognize('foo%20bar')`, `router.recognize('foo%3Abar')`)

This adds a normalization step for routes when they are added (before they are parsed into segments) and for paths when they are recognized.
Similar to the way one would lower-case two strings to check for case-insensitive equality, this normalization is intended to improve the recognition of
routes by ensuring there is a higher degree of similarity between routes and paths. All percent-encoded unicode characters and most percent-encoded special url characters
are normalized into equal strings, whether they are added as routes or recognized as paths.
Normalization Steps:
The normalization for routes and paths is the same:
1. `decodeURI` the string, ignoring percent-encoded percent (`%`) characters (i.e., only ignore the literal value `%25`). This prevents an issue with encoded percent characters in dynamic segments, where they may be inadvertently doubly-decoded, resulting in a URIError. For example, `/café` or `/caf%C3%A9` will be decoded to `/café`, and will recognize the path `/café` or `/caf%C3%A9`.
2. Upper-case the hex digits of any remaining percent-encoded sequences (e.g., `%3a` -> `%3A`). That way these literal encoded sequences will still be matched: a route added as `"/fo%3a"` will be matched by the incoming paths `"/fo%3a"` and `"/fo%3A"`.
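The two steps can be sketched in a few lines of JavaScript. This is an illustration only: `normalize` is a hypothetical name, and splitting on the literal `%25` is one assumed way to "ignore" encoded percents during `decodeURI`; the PR's implementation may differ.

```javascript
// Sketch of the two normalization steps; `normalize` is a hypothetical helper.
function normalize(str) {
  // Step 1: decodeURI everything except the literal "%25", so an encoded percent
  // survives and cannot be decoded a second time later. decodeURI still leaves
  // uriReserved escapes such as "%3A" and "%2F" untouched, and it throws a
  // URIError on malformed sequences.
  const decoded = str
    .split("%25")
    .map((part) => decodeURI(part))
    .join("%25");
  // Step 2: upper-case the hex digits of the percent-encodings that remain.
  return decoded.replace(/%[0-9a-f]{2}/gi, (seq) => seq.toUpperCase());
}

for (const input of ["/caf%C3%A9", "/café", "/fo%3a", "/fo%20", "/fo%25"]) {
  console.log(JSON.stringify(input), "->", JSON.stringify(normalize(input)));
}
```

Because `%25` is never decoded in this step, a path like `/post/100%25` keeps its encoded percent, which is what allows a dynamic segment to be decoded exactly once afterwards.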
Normalization of URI-reserved characters
Characters in the `uriReserved` set (`; / ? : @ & = + $ ,`) are not encoded by `encodeURI`, and their percent-encodings (which should not be used in URLs unless they are part of a dynamic segment) are likewise not decoded by `decodeURI` (e.g., `encodeURI(':') === ':'` and `decodeURI('%3A') === '%3A'`).

Since some of these characters have special meaning to the router DSL (`:` at the start of a segment) and to URLs (`?` indicating query params), they need to be encoded when adding them as routes (e.g., to add a route with the literal value ":" at the start of a segment, it must be encoded: `router.add([{ path: "/foo/%3Abar", ... }])`).
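The built-in behavior described above can be checked directly. The snippet uses only standard JavaScript functions; `URI_RESERVED`, `percentEncode` and `reservedExceptions` are illustrative helpers, not part of route-recognizer.

```javascript
// The ECMAScript uriReserved characters.
const URI_RESERVED = [";", "/", "?", ":", "@", "&", "=", "+", "$", ","];

// Percent-encode a single ASCII character by hand, e.g. ":" -> "%3A".
function percentEncode(ch) {
  return "%" + ch.charCodeAt(0).toString(16).toUpperCase().padStart(2, "0");
}

// Return the reserved characters that do NOT behave as described
// (an empty array means every one of them does).
function reservedExceptions() {
  return URI_RESERVED.filter((ch) => {
    const escaped = percentEncode(ch);
    const keptByEncodeURI = encodeURI(ch) === ch;           // raw char left alone
    const keptByDecodeURI = decodeURI(escaped) === escaped; // escape left alone
    // encodeURIComponent, unlike encodeURI, does escape these characters.
    const escapedByComponent = encodeURIComponent(ch) === escaped;
    return !(keptByEncodeURI && keptByDecodeURI && escapedByComponent);
  });
}

console.log(reservedExceptions()); // []
```

Under the normalization described above, this is why a route added as `/foo/%3Abar` keeps its `%3A`: the segment never starts with a literal `:`, so it is not mistaken for a dynamic segment.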
Example affected static routes
Examples of static routes that would be affected by this change:

| Route | Path | Notes |
|---|---|---|
| `/fo%20` | `/fo ` | |
| `/fo%20` | `/fo%20` | Old code `decodeURI`s the path (to `/fo `) before recognizing it against route `/fo%20` |
| `/fo%3a` | `/fo%3A` | |
| `/fo%25` | `/fo%25` | `%` incorrectly decoded by old code |
| `/caf%C3%A9` | `/café` | |
| `/caf%C3%A9` | `/caf%C3%A9` | Old code `decodeURI`s the path to `/café`, which doesn't match |

Example unaffected static routes
Examples of static routes that would not be affected by this change:
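| Route | Path | Notes |
|---|---|---|
| `/café` | `/caf%C3%A9` | Old code `decodeURI`s the path (to `/café`), so it will be recognized |
| `/fo:` | `/fo:` | |
| `/fo ` | `/fo%20` | Old code `decodeURI`s the path (to `/fo `), so it is recognized |
| `/fo ` | `/fo ` | |
| `/fo%3A` | `/fo%3A` | `decodeURI('%3A') === '%3A'` |
| `/fo%3A` | `/fo:` | `%3A` and `:` in route and path, respectively |
| `/café` | `/café` | |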
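To see why the rows of the affected table change and the rows of the unaffected table do not, the sketch below models the old behavior as comparing the route literally against `decodeURI(path)` (an assumption drawn from the notes in the tables) and the new behavior as comparing the normalized forms of both. Whole-string comparison stands in for the router's segment-by-segment matching, which is enough for static routes; `normalize`, `oldMatches` and `newMatches` are hypothetical helpers.

```javascript
// Hypothetical helper following the two steps under "Normalization Steps".
function normalize(str) {
  const decoded = str.split("%25").map((part) => decodeURI(part)).join("%25");
  return decoded.replace(/%[0-9a-f]{2}/gi, (seq) => seq.toUpperCase());
}

// Assumed model of the old behavior: literal route vs. decodeURI(path).
const oldMatches = (route, path) => route === decodeURI(path);
// New behavior: both sides normalized the same way.
const newMatches = (route, path) => normalize(route) === normalize(path);

const rows = [
  // affected
  ["/fo%20", "/fo "],
  ["/fo%20", "/fo%20"],
  ["/fo%3a", "/fo%3A"],
  ["/fo%25", "/fo%25"],
  ["/caf%C3%A9", "/café"],
  ["/caf%C3%A9", "/caf%C3%A9"],
  // unaffected
  ["/café", "/caf%C3%A9"],
  ["/fo:", "/fo:"],
  ["/fo ", "/fo%20"],
  ["/fo ", "/fo "],
  ["/fo%3A", "/fo%3A"],
  ["/fo%3A", "/fo:"],
  ["/café", "/café"],
];

for (const [route, path] of rows) {
  console.log(JSON.stringify(route), JSON.stringify(path),
    "old:", oldMatches(route, path), "new:", newMatches(route, path));
}
```

Under this model every affected row goes from no match to a match, and every unaffected row keeps its result (including `/fo%3A` vs. `/fo:`, which matches neither before nor after).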
Dynamic Route Generation (bugfix)
Changes generation to call `encodeURIComponent` on dynamic route segments.

Previously, the parameter would be interpolated into the generated route unchanged.
This should be considered a bugfix because the previous behavior made it
possible for the router to generate a route that it later wouldn't be able to recognize.
It is a breaking bugfix for users that have workarounds in place to
encodeURIComponent
their parameters before generating links from them. Those workarounds would lead to
doubly-encoded parameters in URLs.
Example route generation
Given a route with a dynamic segment, `"/post/:id"`, the following values of `id` will generate the routes shown.

| `id` value | Old behavior | New behavior |
|---|---|---|
| `abc/def` | `/post/abc/def` | `/post/abc%2Fdef` |
| `100%` | `/post/100%` | `/post/100%25` |
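A minimal sketch of the new generation rule, assuming a route made only of static segments and `:name` segments; `generate` is a hypothetical helper, not the route-recognizer API.

```javascript
// Hypothetical generator: static segments are copied, ":name" segments are
// filled with encodeURIComponent(value).
function generate(route, params) {
  return route
    .split("/")
    .map((segment) =>
      segment.startsWith(":")
        ? encodeURIComponent(String(params[segment.slice(1)]))
        : segment)
    .join("/");
}

console.log(generate("/post/:id", { id: "abc/def" })); // /post/abc%2Fdef
console.log(generate("/post/:id", { id: "100%" }));    // /post/100%25

// A leftover workaround that pre-encodes the value now double-encodes it:
console.log(generate("/post/:id", { id: encodeURIComponent("abc/def") })); // /post/abc%252Fdef
```

The last line shows the breakage described above for users who already call `encodeURIComponent` themselves; removing the workaround restores `/post/abc%2Fdef`.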
Dynamic Route Parameter Decoding (bugfix)
Changes route recognition to call `decodeURIComponent` on recognized parameters from dynamic route segments.

Previously, the captured parameters would be returned unmodified. However, before dynamic parameters were parsed, the `path` passed to `router.recognize` would be "normalized" with `decodeURI` (to handle non-ASCII unicode characters), so the captured parameter was not always the same as the value in the `path`. See below.

The route-recognizer's docs do not discuss the expected behavior in these scenarios.
This change brings the behavior into closer alignment with behavior of other well-known
routers like the Rails router. As such, it could be considered either an enhancement
or a bugfix.
This is a breaking change for users that are working around this issue by calling `decodeURIComponent` on the params returned by the route. With the new code, this could result in unexpected values caused by double-decoding the value.

Example dynamic route parameter decoding
Given a route with a dynamic segment, `"/post/:id"`, the following paths will yield the shown values for the `id` param.

| Path | Old `id` param | New `id` param | Notes |
|---|---|---|---|
| `/post/abc%2Fdef` | `abc%2Fdef` | `abc/def` | |
| `/post/café` | `café` | `café` | |
| `/post/caf%C3%A9` | `café` | `café` | `decodeURI(path)` decodes the percent-encoded `%C3%A9` to `é`. A user with a workaround would be unaffected. |
| `/post/%3A1` | `%3A1` | `:1` | `decodeURI` does not decode `%3A` to `:`. A user with a workaround would be unaffected. |
| `/post/100%25` | `100%` | `100%` | `decodeURI(path)` decodes `%25` to `%`. A user with a workaround in place would get an error when trying to `decodeURIComponent("100%")`. |
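The table can be reproduced with a small model of both behaviors. This is a sketch under stated assumptions: `matchRoute`, `recognizeOld`, `recognizeNew` and `normalize` are hypothetical helpers, the old behavior is modeled as "`decodeURI` the path, return captures unmodified", and the new one as "normalize the path, then `decodeURIComponent` each dynamic capture".

```javascript
// Hypothetical helper following the two steps under "Normalization Steps".
function normalize(str) {
  const decoded = str.split("%25").map((part) => decodeURI(part)).join("%25");
  return decoded.replace(/%[0-9a-f]{2}/gi, (seq) => seq.toUpperCase());
}

// Hypothetical matcher for one route made of static and ":name" segments.
// The path is split on "/" BEFORE any capture is decoded, so an encoded "%2F"
// inside a parameter never creates an extra segment.
function matchRoute(route, path, decodeParam) {
  const routeSegments = route.split("/");
  const pathSegments = path.split("/");
  if (routeSegments.length !== pathSegments.length) return null;
  const params = {};
  for (let i = 0; i < routeSegments.length; i++) {
    const segment = routeSegments[i];
    if (segment.startsWith(":")) {
      params[segment.slice(1)] = decodeParam(pathSegments[i]);
    } else if (segment !== pathSegments[i]) {
      return null;
    }
  }
  return params;
}

// Old behavior (assumed model): decodeURI the path, return captures unmodified.
const recognizeOld = (route, path) => matchRoute(route, decodeURI(path), (v) => v);
// New behavior: normalize the path, then decodeURIComponent each dynamic capture.
const recognizeNew = (route, path) =>
  matchRoute(route, normalize(path), (v) => decodeURIComponent(v));

const paths = ["/post/abc%2Fdef", "/post/café", "/post/caf%C3%A9", "/post/%3A1", "/post/100%25"];
for (const path of paths) {
  console.log(path, recognizeOld("/post/:id", path), recognizeNew("/post/:id", path));
}
```

The last row is where a leftover workaround breaks: the old code already hands back `100%`, and `decodeURIComponent("100%")` throws a `URIError`.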
Special case: doubly-encoded segments
The incorrect decoding of encoded percent characters also affects dynamic segments that have been multiply encoded.
For example, if the post id is a URL that itself has an `encodeURIComponent`-encoded parameter, such as: …

The reserved characters from "http://other-url.com" will be doubly-encoded in `encodedUrl`. E.g., the `//` in `http://other-url.com` is first encoded as `%2F%2F`, and then the `%` characters are encoded again to `%25`: `encodeURIComponent(encodeURIComponent("//")) === "%252F%252F"`.

The old code would call `decodeURI` once on `encodedUrl`, resulting in the following (incorrect) value for id: `id === encodeURIComponent("http://example.com/post/http://other-url.com")`.

The new code avoids decoding the percent characters in the initial path, and it results in the following (correct) value for id: `id === "http://example.com/post/" + encodeURIComponent("http://other-url.com")`.
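The doubly-encoded case can be rebuilt with built-ins alone. The URLs are illustrative, and the "old" and "new" lines model the two behaviors for a `/post/:id` route as described above rather than calling route-recognizer.

```javascript
// An id that is itself a URL carrying an encodeURIComponent-encoded parameter.
const inner = "http://example.com/post/" + encodeURIComponent("http://other-url.com");
const encodedUrl = encodeURIComponent(inner); // "%2F" inside inner becomes "%252F"
const requestPath = "/post/" + encodedUrl;

// Old (modeled): decodeURI the whole path, which turns "%25" into "%",
// then return the capture unmodified.
const oldId = decodeURI(requestPath).slice("/post/".length);

// New (modeled): decodeURI everything except "%25", then decode the capture once.
const normalizedPath = requestPath.split("%25").map((part) => decodeURI(part)).join("%25");
const newId = decodeURIComponent(normalizedPath.slice("/post/".length));

console.log(oldId === encodeURIComponent("http://example.com/post/http://other-url.com")); // true
console.log(newId === inner); // true
```

The old value has lost one layer of encoding on the inner URL, so decoding it again would merge the two URLs; the new value is exactly the id that was encoded.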
Glob Routes (Star Segments)
This PR adds some tests and code to ensure that glob routes (e.g. `/*catchall`) do not decode their matching segments.

E.g., given the catchall route and the path `/abc/foo%2Fbar`, the matched param `catchall` is `abc/foo%2Fbar`, not `abc/foo/bar`.
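The difference between the two segment types can be sketched with two hand-written matchers; `matchCatchall` (for `/*catchall`) and `matchDynamic` (for a hypothetical `/:id` route) are illustrative, not route-recognizer functions.

```javascript
// Star segment for "/*catchall": the matched text is returned as-is.
function matchCatchall(path) {
  const m = /^\/(.*)$/.exec(path);
  return m ? { catchall: m[1] } : null;
}

// Dynamic segment for "/:id": a single segment, decoded once.
function matchDynamic(path) {
  const m = /^\/([^/]+)$/.exec(path);
  return m ? { id: decodeURIComponent(m[1]) } : null;
}

console.log(matchCatchall("/abc/foo%2Fbar")); // { catchall: 'abc/foo%2Fbar' }
console.log(matchDynamic("/foo%2Fbar"));      // { id: 'foo/bar' }
```

Because a glob capture spans several segments, leaving it encoded keeps the encoded `%2F` distinguishable from the real `/` separators in the matched text.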