-
Notifications
You must be signed in to change notification settings - Fork 30.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
net: use icu's punycode implementation #7355
Conversation
does the binary size change much? i don't think it should. |
With the code, no. Not sure about the data files. I need to track down
|
Maybe uts46? maybe the stringprep dir? https://github.com/nodejs/node/blob/master/tools/icu/icu_small.json#L37 |
Yeah, likely both. I'll play around with adding those into the small-icu subset and see what happens ;-) |
|
||
if (status == U_BUFFER_OVERFLOW_ERROR) { | ||
status = U_ZERO_ERROR; | ||
dest = static_cast<char*>(malloc(len)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll at least want to do a CHECK_NE(dest, nullptr)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better yet, use MaybeStackBuffer
.
High-level comment: is punycode prevalent enough to warrant optimizing for? I'm inclined to say it isn't. |
@bnoordhuis Maybe this is what you meant, but the function is called every time
|
Let me rephrase: does |
@bnoordhuis ... it's not used extensively by our code, no, but it is used pretty much every time Ultimately what I'd really like is for us to deprecate the |
@bnoordhuis ... to give an idea, below is a comparison of url.parse benchmarks in v6.2.2 against master with this patch applied:
The "five" case is an exception and I believe that something else unrelated to this change is going on there.. but the perf boost for the typical use cases is significant. |
@bnoordhuis @saghul @trevnorris @srl295 .. updated! PTAL |
} | ||
|
||
if (U_FAILURE(status)) { | ||
env->ThrowError("Cannot convert name to ASCII"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this give a simple yes/no on failure, or is there additional information that may be useful to the error message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this particular case the possible failures are rare and non-interesting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you can test for info.errors & UIDNA_ERROR_DOMAIN_NAME_TOO_LONG
, maybe other values. The status
itself is likely to be something like out of memory, invalid parameters, etc. You could ThrowError
including u_errorName(status)
to return the ICU status code - that would be good. I don't know if ThrowError
takes a parameter.
what a pain. so // url.parse() automatically runs toASCII() on the
// domain, but isn't documented to do so.
const o = url.parse('http://見.香港/見');
o.pathname = encodeURI(o.pathname);
const url = url.format(o); Anyway, not directly anything to do with this PR. In terms of performance, I think Strictly from the code perspective, the updates LGTM. |
This only needs full-icu currently. There are a couple of minor data files currently missing in the small-icu subset that would need to be added but those are minimal. Those can be added to small-icu without too much impact. |
And I'm working on a better URL impl to deal with the other issues ;-) |
How? |
@mathiasbynens By playing to V8's strengths while avoiding its weaknesses. :-) For example, .toASCII() calls String#split() twice and is heavy on the map-over-substrings idiom. It would probably be faster to:
There is probably more but that is what stood out from a quick look. |
} | ||
} | ||
|
||
const {toASCII} = importPunycode(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we have a spacing rule for destructuring before we start using it in full? // @Trott
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other places we have spaces around the name.
CI had a failure on smartos... green everywhere else. Small tweak and running again: https://ci.nodejs.org/job/node-test-pull-request/3108/ |
@trevnorris @bnoordhuis @srl295 ... CI is good. This now no-longer requires full-icu data. PTAL |
|
||
if (len < 0) { | ||
env->ThrowError("Cannot convert name to ASCII"); | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be able to put these two on the same line. git grep "return.*Throw"
shows that's how it's done in 260 other places.
LGTM with style nits. |
@bnoordhuis @trevnorris .. updated to address the nits |
ICU has a punycode implementation built in. Use it instead of the javascript implementation because it's much faster.
Final CI before landing: https://ci.nodejs.org/job/node-test-pull-request/3134/ |
CI is green. |
PR-URL: #7355 Reviewed-By: Trevor Norris <trev.norris@gmail.com> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
ICU has a punycode implementation built in. Use it instead of the javascript implementation because it's much faster. PR-URL: #7355 Reviewed-By: Trevor Norris <trev.norris@gmail.com> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Landed in 7de59ef and 3d6a01e |
PR-URL: #7355 Reviewed-By: Trevor Norris <trev.norris@gmail.com> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
ICU has a punycode implementation built in. Use it instead of the javascript implementation because it's much faster. PR-URL: #7355 Reviewed-By: Trevor Norris <trev.norris@gmail.com> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
@thealphanerd Correct and correct. |
Checklist
make -j4 test
(UNIX) orvcbuild test nosign
(Windows) passesAffected core subsystem(s)
net, intl
Description of change
ICU has a punycode implementation built in. Use it instead of the javascript implementation because it's much faster.
The punycode module is currently only used in one location in core (in the
url
module). With this change if ICU is present, the ICU implementation would be used with thepunycode
impl becoming the fallback. Eventually, it would be great to see if we could deprecate thepunycode
module entirely, with an eye towards eventually removing it.This includes an update to the small-icu dataset to support the IDNA implementation. The additional data adds < 1 mb of data.
Benchmarks (bigger numbers are better):
cc @srl295 @mscdex @ChALkeR @trevnorris