Do not encode chars that are allowed in path segments #109
@krisselden and/or @wycats will need to review.
Sounds good. For reference, the ABNF for path segments in RFC 3986 looks like this:
Prior to #91, dynamic segments in routes (e.g. the ":post_id" in But some of the characters that should be fine to appear in a path segment unencoded (like Link to path section of RFC 3986.
I ran the benchmarks before and after this change and the generation code is slightly slower (2.9M -> 2.7M ops on my machine), but otherwise the benchmarks do not appear to be too heavily impacted.
FWIW - This looks good to me.
  if (separators.length) {
    ret += separators.shift();
  }
}
You can do this without modification of the arrays and without an if
which I believe will be faster:
var ret = '';
separators.push('');
for (var i = 0; i < pieces.length; i++) {
ret = ret + pieces[i] + separators[i];
}
+1 Thanks. I made the change.
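The suggestion above can be sketched as a standalone function. This is a minimal illustration, not the code that landed: the name `interleave` is hypothetical, and it assumes `separators` has exactly one fewer element than `pieces`. Unlike the snippet in the review comment, it pads a copy of the array so the caller's `separators` is left untouched.

```javascript
// Hypothetical helper illustrating the loop from the review comment.
// Assumption: separators.length === pieces.length - 1.
function interleave(pieces, separators) {
  // concat returns a new array, so the caller's array is not mutated.
  // The trailing '' guarantees padded[i] exists for the last piece.
  var padded = separators.concat('');
  var ret = '';
  for (var i = 0; i < pieces.length; i++) {
    ret = ret + pieces[i] + padded[i];
  }
  return ret;
}

console.log(interleave(['a', 'b', 'c'], [':', '@'])); // prints "a:b@c"
```

The loop body has no branch and no `shift()` call, which is the point of the suggestion: `shift()` re-indexes the array on every call, while indexed reads do not.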
This is an implementation of the path encoding semantics for paths from RFC 3986 for maximum correctness while minimizing the number of characters encoded. I've left one comment in the implementation addressing perf, after which it looks good to me for landing and a bump to 0.2.3 (also to land upstream in Ember for the next beta release).
During route generation for dynamic segments, do not percent-encode characters that, according to RFC 3986, are allowed in path segments. RFC 3986 defines "sub-delims" (these chars: `! $ & ' ( ) * + , ; =`) and specifies that they along with `:` and `@` do not need to be encoded in path segments in URIs. See: https://tools.ietf.org/html/rfc3986#section-3.3 This commit changes RouteRecognizer's generation code for dynamic segments to explicitly avoid encoding those characters. Fixes emberjs/ember.js#14094
de95066 to 17f5dad
@nathanhammond Thanks, I addressed the feedback you mentioned. 👍
This halves
In this case I want to trade correctness for performance given that this is a rounding error in the total time cost to render a link inside of Ember. A faster approach to normalization may in the future involve an NFA or DFA...
Lot of bloat; a simpler solution is to encode once, then replace with a regex matching the encoded chars with the unescape fn.
That's true. I also tried these two approaches but they were slightly worse for performance and much worse for readability (imo). But the snippets below both pass all tests and are much terser; I can PR one of them if that's preferred.

generate regex:

var reservedSegmentChars = [
  "!", "$", "&", "'", "(", ")", "*", "+", ",", ";", "=", // sub-delims
  ":", "@" // others explicitly allowed by RFC 3986
];
// the encoded versions of the chars, filtering out ones for which `encodeURIComponent` has no effect
var encodedSegmentChars = reservedSegmentChars.map(encodeURIComponent).filter(function(encoded) {
  return encoded.length === 3 && encoded[0] === '%';
});
var encodedSegmentCharRegex = new RegExp(encodedSegmentChars.join('|'), 'g');
function encodePathSegment(segment) {
  segment = '' + segment; // coerce to string
  return encodeURIComponent(segment).replace(encodedSegmentCharRegex, function(match) {
    return decodeURIComponent(match);
  });
}

hand-tuned regex:

// the escaped versions of the characters we care about are:
// %24, %26, %2B, %2C, %3A, %3B, %3D, %40
var encodedSegmentCharRegex = /%((2(4|6|B|C))|(3(A|B|D))|(40))/g;
function encodePathSegment(segment) {
  segment = '' + segment; // coerce to string
  return encodeURIComponent(segment).replace(encodedSegmentCharRegex, function(match) {
    return decodeURIComponent(match);
  });
}

@nathanhammond Running the benches again you're right, my results match yours for perf of
With the smaller file size and 33% perf win I think that we should opt for this:
@bantic Would you mind PRing since it's your code (plus @krisselden's regex for non-capturing)?
@nathanhammond Yes, happy to — Kris pointed out on slack that this regex is faster (it doesn't have capturing groups): I have to head out now but I will PR a change later tonight |
Port changes from tildeio/route-recognizer#109
This approach is much simpler than walking the string, and appears to be at least as performant as the previous. Refs tildeio#109
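As a rough illustration of the encode-then-unescape approach discussed in this thread, the sketch below reuses the `encodePathSegment` shape from the hand-tuned snippet with the capturing groups rewritten as non-capturing `(?:...)` groups. The exact regex that landed is not shown in this conversation, so treat this pattern as an assumption that matches the same eight escapes (%24, %26, %2B, %2C, %3A, %3B, %3D, %40).

```javascript
// Assumed non-capturing rewrite of the hand-tuned regex from the thread;
// not necessarily the exact pattern used in the follow-up PR.
var encodedSegmentCharRegex = /%(?:2(?:4|6|B|C)|3(?:A|B|D)|40)/g;

function encodePathSegment(segment) {
  segment = '' + segment; // coerce to string
  // Encode everything first, then restore the escapes for characters
  // that RFC 3986 allows unencoded in a path segment.
  return encodeURIComponent(segment).replace(encodedSegmentCharRegex, decodeURIComponent);
}

// ':' and '@' survive; '/' and ' ' stay percent-encoded.
console.log(encodePathSegment('a/b c:d@e'));   // prints "a%2Fb%20c:d@e"
// All sub-delims pass through unchanged.
console.log(encodePathSegment("!$&'()*+,;=")); // prints "!$&'()*+,;="
// A literal '%' is still encoded, and '?' and '#' are not restored.
console.log(encodePathSegment('100%?#'));      // prints "100%25%3F%23"
```

Passing `decodeURIComponent` directly as the replacer works here because `replace` hands the matched text to it as the first argument and `decodeURIComponent` ignores the rest.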
@nathanhammond remember when the capture isn't being used to use |