url: add urlSearchParams.sort() #11098
@mscdex ... I would appreciate you taking a look at this from a perf point of view
Almost there
with a [stable sorting algorithm][], so relative order between name-value pairs
with the same name is preserved.

This method can be used, in particular, to increase cache hits.
An example may be helpful here.
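A minimal sketch of the cache-hit use case, assuming a hypothetical `cacheKey()` helper (not part of this PR): two URLs that differ only in parameter order normalize to the same string once their search parameters are sorted.

```javascript
const { URL } = require('url');

// Hypothetical helper: normalizes a URL so that parameter order
// no longer affects the resulting cache key.
function cacheKey(href) {
  const url = new URL(href);
  url.searchParams.sort();
  return url.href;
}

const a = cacheKey('https://example.org/search?type=search&q=node');
const b = cacheKey('https://example.org/search?q=node&type=search');
console.log(a === b);
// Prints true
```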
lib/internal/url.js
Outdated
}

// arbitrary number found through testing
if (len < 118) {
We should look to benchmark this on a number of different systems to see if this number holds up.
const {URL, URLSearchParams} = require('url');

[
  {
Perhaps also include a test with an empty key? e.g. z=a&=b&c=d
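As a rough illustration of the suggested case (not the test that was actually added), the empty name compares lower than any non-empty name, so that pair should move to the front:

```javascript
const { URLSearchParams } = require('url');

// 'z=a&=b&c=d' contains a pair whose name is the empty string.
const params = new URLSearchParams('z=a&=b&c=d');
params.sort();
console.log(params.toString());
// Prints =b&c=d&z=a
```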
The sorting bit really looks like C++ code itself, and it creates two holey arrays, so it is probably worth just implementing those in C++?
lib/internal/url.js
Outdated
update(this[context], this);

function merge(out, start, mid, end, lBuffer, rBuffer) {
This can be moved outside of sort
@joyeecheung said:
Just tried doing something like that: C++ diff
diff --git a/src/node_url.cc b/src/node_url.cc
index 0d5e695..3ab7108 100644
--- a/src/node_url.cc
+++ b/src/node_url.cc
@@ -1389,6 +1389,49 @@ namespace url {
v8::NewStringType::kNormal).ToLocalChecked());
}
+ static void SortParams(const FunctionCallbackInfo<Value>& args) {
+ Environment* env = Environment::GetCurrent(args);
+ CHECK_GE(args.Length(), 1);
+ CHECK(args[0]->IsArray());
+
+ Local<Array> src = args[0].As<Array>();
+ uint32_t len = src->Length();
+
+ std::vector<std::string> a;
+ Copy(env, src, &a);
+ std::vector<std::string> tmp(len);
+
+ for (uint32_t step = 2; step < len; step *= 2) {
+ for (uint32_t start = 0; start < len - 2; start += 2 * step) {
+ uint32_t mid = start + step;
+ uint32_t end = mid + step;
+ end = end < len ? end : len;
+ if (mid > end)
+ continue;
+
+ uint32_t l = start;
+ uint32_t r = mid;
+ uint32_t t = start;
+ while (l < mid && r < end) {
+ if (a[l] <= a[r]) {
+ tmp[t++] = a[l++];
+ tmp[t++] = a[l++];
+ } else {
+ tmp[t++] = a[r++];
+ tmp[t++] = a[r++];
+ }
+ }
+ while (l < mid)
+ tmp[t++] = a[l++];
+ while (r < end)
+ tmp[t++] = a[r++];
+ }
+ tmp.swap(a);
+ }
+
+ args.GetReturnValue().Set(Copy(env, a));
+ }
+
static void Init(Local<Object> target,
Local<Value> unused,
Local<Context> context,
@@ -1398,6 +1441,7 @@ namespace url {
env->SetMethod(target, "encodeAuth", EncodeAuthSet);
env->SetMethod(target, "domainToASCII", DomainToASCII);
env->SetMethod(target, "domainToUnicode", DomainToUnicode);
+ env->SetMethod(target, "sortParams", SortParams);
#define XX(name, _) NODE_DEFINE_CONSTANT(target, name);
FLAGS(XX)
It's much slower than the JS version: in fact, the two
@jasnell said:
Agreed. I've made a simple benchmark specifically for this purpose. I'm unfortunately not too well-versed in data analysis languages, so my method of measuring is rather crude. A regression model is probably a better fit for this purpose.
How to run
# create `merge` and `insertion` binaries, with insertion and merge sort blocks resp.
# commented out
# wait. it's gonna run for a while (~16 minutes)
$ node benchmark/compare.js --old ./insertion --new ./merge --filter rrrr --runs 10 url > sort.csv
$ Rscript benchmark/compare.R < sort
My results
ba709b9
to
7cbbea6
Compare
Added Web Platform Tests and rebased. New CI: https://ci.nodejs.org/job/node-test-pull-request/6186/
Almost there.
doc/api/url.md
Outdated
params.sort();
console.log(params.toString());
// Prints query%5B%5D=123&query%5B%5D=abc&type=search
```
It may be better to swap the `query[]=123` and `query[]=abc` positions to show that the sort will not affect the ordering of the `abc` and `123` positions, given that `123` would typically appear before `abc`.
Good idea. Done.
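With the two positions swapped as suggested, the example would look roughly like this: `abc` stays ahead of `123` because the sort is stable and compares names only, not values.

```javascript
const { URLSearchParams } = require('url');

const params = new URLSearchParams('query[]=abc&type=search&query[]=123');
params.sort();
console.log(params.toString());
// Prints query%5B%5D=abc&query%5B%5D=123&type=search
```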
@mscdex, have you had a chance yet to look at this? Frankly, I'm more than happy with its performance, as my makeshift insertion/merge sort hybrid is able to outperform |
lib/internal/url.js
Outdated
} else {
  // Bottom-up iterative stable merge sort
  const lBuffer = Array(len);
  const rBuffer = Array(len);
nit: new Array
I agree with @targos.
@TimothyGu There is a cost of memory though, |
// 8 elements
short: 'm&t&d&c&z&v&a&n',
// 88 elements
long: 'g&r&t&h&s&r&d&w&b&n&h&k&x&m&k&h&o&e&x&c&c&g&e&b&p&p&s&n&j&b&y&z&' +
This won't be 88 parameters because only unique keys are kept by `querystring.parse()`. One suggestion is to use unique keys first:
long: 'g&r&t&h&s&rr&d&w&b&n&hh&k&x&m&kk&hhh&o&e&xx&c&cc&gg&ee&bb&' +
'p&pp&ss&nn&j&bbb&y&z&u&l&oo&rrr&ww&a&uu&ll&mm&f&jj&q&ppp&ff&' +
'eee&yy&eeee&nnn&eeeee&lll&mmm&www&uuu&wwww&tt&nnnn&ttt&qq&v&' +
'yyy&ccc&ooo&kkk&fff&jjj&i&llll&mmmm&ggg&jjjj&dd&ii&zz&qqq&pppp&' +
'xxx&qqqq&qqqqq&ddd&nnnnn&yyyy&wwwww&gggg&iii&vv&rrrr'
and then add this before the timed loop to create duplicates:
if (conf.type === 'long') {
// Make `array` contain duplicate keys, useful for benchmarking stable-ness
// of sorting
for (i = 0; i < array.length; i += 2) {
if (array[i].length > 1)
array[i] = array[i][0];
}
}
This should trigger the current alternate code path that uses merge sort.
Right, but `querystring.parse` returns something like `{ g: [ '', '' ] }` in this case, and this object is converted to `[ 'g', '', 'g', '' ]`. Either way, I added a small ad-hoc parser to skip `querystring.parse`, and to also maintain the order of the input.
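A sketch of what such an ad-hoc parser could look like. The name `parseToFlatArray` is hypothetical and this is not the code that was added to the benchmark. It produces the flat `[name, value, name, value, ...]` layout mentioned above, keeps duplicates and input order, and does no percent-decoding.

```javascript
// Hypothetical helper: splits 'a&b=1&a' into a flat array of
// alternating names and values, preserving order and duplicates.
function parseToFlatArray(input) {
  const out = [];
  for (const part of input.split('&')) {
    if (part === '') continue; // skip empty segments such as 'a&&b'
    const eq = part.indexOf('=');
    if (eq === -1) {
      out.push(part, ''); // name without a value
    } else {
      out.push(part.slice(0, eq), part.slice(eq + 1));
    }
  }
  return out;
}

console.log(parseToFlatArray('g&r&g'));
// Prints [ 'g', '', 'r', '', 'g', '' ]
```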
I think that's just a bug in the current `querystring.parse()` in master. For example, `querystring.parse('a&a&a&a')` currently returns `{ a: [ '', '' ] }` instead of `{ a: [ '', '', '', '' ] }`. This bug is probably due to 4e259b2 which may/may not be fixed by #11171. At any rate, using unique keys in the beginning will avoid such possible bugs.
almostsorted: 'a&b&c&d&e&f&g&i&h&j&k&l&m&n&o&p&q&r&s&t&u&w&v&x&y&z',
reversed: 'z&y&x&w&v&u&t&s&r&q&p&o&n&m&l&k&j&i&h&g&f&e&d&c&b&a',
random: 'm&t&d&c&z&v&a&n&p&y&u&o&h&l&f&j&e&q&b&i&s&x&k&w&r&g',
// 8 elements
I think a better name here would be 'search parameters' or similar. The term 'elements' can be confusing if you're familiar with the underlying `URLSearchParams` implementation (e.g. does 'elements' mean number of pairs or `params[searchParams].length`?).
const bench = common.createBenchmark(main, {
  type: Object.keys(inputs),
  n: [1e5]
This might be a little low IMHO. The results came back awfully fast on my machine, which could mean V8 didn't have enough time to properly optimize functions. For example, I used ~5e6 when testing `type=long`.
Changed to 1e6.
If the results come back too fast there is a possibility that this got loop-invariant code-motioned. I usually get alarmed when I see op/s come close to the op/s of an empty loop on my machine.
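One way to guard against that, sketched here with a hypothetical `bench()` function rather than the project's `benchmark/common.js` harness: rebuild the unsorted input on every iteration and fold each result into a value that is returned, so the sort always has real work to do and its output is observed. Note that this timing also includes the cost of constructing the `URLSearchParams` object.

```javascript
const { URLSearchParams } = require('url');

// Hypothetical micro-benchmark; names and structure are assumptions.
function bench(input, n) {
  let checksum = 0;
  const start = process.hrtime.bigint();
  for (let i = 0; i < n; i++) {
    // Fresh, unsorted params each time so sort() is never a no-op.
    const params = new URLSearchParams(input);
    params.sort();
    // Use the result so the work cannot be treated as dead code.
    checksum += params.toString().length;
  }
  const elapsedNs = Number(process.hrtime.bigint() - start);
  return { opsPerSec: n / (elapsedNs / 1e9), checksum };
}

const result = bench('m&t&d', 1000);
console.log(result.opsPerSec.toFixed(0), 'ops/sec');
```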
lib/internal/url.js
Outdated
var j;
for (j = i - 2; j >= 0; j -= 2) {
  var tmpKey = a[j];
  var tmpVal = a[j + 1];
If you remove these two temporary variables (use the values as-is where needed), you will get a small perf boost.
Without diving into the multitude of sort algorithms out there to refresh my memory, these changes are probably fine, except for a few minor nits. One unrelated nit I noticed is that because the whatwg URL module uses |
I should also say I'm a bit confused about the "cutoff" being 59. In the code the conditional is |
@mscdex, yes I changed the cutoff after moving |
For most of the cases (when there are fewer than 50 queries), the in-place selection sort is used. For >= 50, merge sort at most makes two more copies of the params array, so O(n) space complexity. (It is also the algorithm used by SpiderMonkey's |
9945032
to
c0ea7d0
Compare
You mean O(2n)?
Coefficients are irrelevant in Big O notation 😺
const searchParams = require('internal/url').searchParamsSymbol;
const input = inputs[conf.type];
const n = conf.n | 0;
const params = new URLSearchParams;
IMHO `()` should be added after the constructor name.
2c925d2
to
54dde7e
Compare
Rebased. Going to apply on Monday if there are no more objections.
PR-URL: #11098 Fixes: #10760 Ref: whatwg/url#26 Ref: whatwg/url#199 Ref: web-platform-tests/wpt#4531 Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Backport-of: nodejs#11098 Fixes: nodejs#10760 Ref: whatwg/url#26 Ref: whatwg/url#199 Ref: web-platform-tests/wpt#4531
Backport-of: #11098 Fixes: #10760 Ref: whatwg/url#26 Ref: whatwg/url#199 Ref: web-platform-tests/wpt#4531
The URL Standard requires `sort()` to be stable, which precludes us from using the V8-native `sort()` function.
I originally wanted to make the function a simple insertion sort only, but I realized it had the potential for a DoS attack because of its nature of being O(n²).
The algorithm for the function follows the norm: insertion sort for small arrays, divide-and-conquer (in this case, a stable merge sort) for larger arrays. In this specific case, the cutoff is set at 59 items, though if performance improvements are made to either sort the cutoff may be adjusted.
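The hybrid approach described above can be sketched as follows. This is a simplified illustration, not the code in `lib/internal/url.js`: it works on an array of `[name, value]` pairs instead of the flat array used internally, and the function names and the `cutoff` parameter (counted in pairs) are assumptions.

```javascript
// Stable insertion sort by name; cheap for short inputs.
function insertionSort(pairs) {
  for (let i = 1; i < pairs.length; i++) {
    const current = pairs[i];
    let j = i - 1;
    // Strict '>' never moves a pair past an equal name (stability).
    while (j >= 0 && pairs[j][0] > current[0]) {
      pairs[j + 1] = pairs[j];
      j--;
    }
    pairs[j + 1] = current;
  }
  return pairs;
}

// Bottom-up iterative stable merge sort; O(n log n) time, O(n) extra space.
function mergeSort(pairs) {
  let src = pairs.slice();
  let dst = new Array(src.length);
  for (let width = 1; width < src.length; width *= 2) {
    for (let start = 0; start < src.length; start += 2 * width) {
      const mid = Math.min(start + width, src.length);
      const end = Math.min(start + 2 * width, src.length);
      let l = start;
      let r = mid;
      let t = start;
      while (l < mid && r < end) {
        // '<=' takes from the left run on ties, preserving input order.
        dst[t++] = src[l][0] <= src[r][0] ? src[l++] : src[r++];
      }
      while (l < mid) dst[t++] = src[l++];
      while (r < end) dst[t++] = src[r++];
    }
    [src, dst] = [dst, src]; // the merged pass becomes the next input
  }
  return src;
}

// Returns a new sorted array; the input is left untouched.
function stableSortPairs(pairs, cutoff = 59) {
  return pairs.length < cutoff ?
    insertionSort(pairs.slice()) :
    mergeSort(pairs);
}

const sorted = stableSortPairs([['b', '1'], ['a', '2'], ['b', '3'], ['a', '4']]);
console.log(sorted);
// Prints [ [ 'a', '2' ], [ 'a', '4' ], [ 'b', '1' ], [ 'b', '3' ] ]
```

Names are compared with the ordinary string operators, which order by UTF-16 code units. Passing a `cutoff` of `0` forces the merge sort path, which is handy for checking that both paths agree.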
Fixes: #10760
Ref: whatwg/url#26
Ref: whatwg/url#199
Ref: web-platform-tests/wpt#4531
Checklist
`make -j4 test` (UNIX), or `vcbuild test` (Windows) passes
Affected core subsystem(s)
url