src: do not ignore IDNA conversion error #11549
Hopefully the issue with the legacy URL parser is fixed.
/cc @nodejs/intl @nodejs/url
New CI: https://ci.nodejs.org/job/node-test-pull-request/6586/
doc/api/url.md
Outdated
Btw, should this be deserialization, and mention that it is the inverse of domainToASCII?
It is serialization, since the domain is fully parsed and subsequently serialized from the parsed form. It's just that it uses a different algorithm for deserialization.
src/node_i18n.cc
Outdated
Can you update the args.Length() check above to use 2? Also, you probably want to add a CHECK(args[1]->IsBoolean()); or do args[1]->BooleanValue() instead.
I didn't update the check for argument length, since (as the comment is trying to say) it is an optional argument, so that existing usage of toUnicode(str) would still work. V8 automatically returns an Undefined for out-of-range args[] dereference.
Wasn't aware of BooleanValue(). Will use that instead.
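The optional-argument behavior described above can be sketched in JavaScript terms. This is only an analogy, not the binding's code: `describeMode` is a hypothetical function, and the assumption that an omitted flag means strict mode is inferred from the discussion rather than taken from the source.

```javascript
'use strict';

// Hypothetical JS analogue of a binding that takes an optional second
// argument. Callers that pass only one argument keep working.
function describeMode(name, lenient) {
  // A missing argument is `undefined`, and Boolean(undefined) is false,
  // so omitting the flag is treated the same as passing `false`.
  const isLenient = Boolean(lenient);
  return `${name}: ${isLenient ? 'lenient' : 'strict'}`;
}

console.log(describeMode('example.com'));       // one-argument call
console.log(describeMode('example.com', true)); // explicit lenient flag
```

Coercing with `Boolean()` rather than checking the argument count is what lets the one-argument call sites stay unchanged.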
src/node_i18n.cc
Outdated
(ditto)
src/node_i18n.cc
Outdated
Is this error part of any non-experimental API? Could we change it to Cannot encode name to ASCII as Punycode?
Yes for toASCII
> url.parse(`http://${'é'.repeat(230)}.com/`)
Error: Cannot convert name to ASCII
src/node_i18n.cc
Outdated
Is this error part of any non-experimental API? Could we change it to Cannot decode name as Punycode? (basically the same question I also posted below).
No; in fact the toUnicode JS function isn't used in the code base at all. Maybe we should just remove this method?
/cc @jasnell
If it's not used, it can be removed.
Remove which function specifically? The `i18n::ToUnicode` function is definitely used.
@jasnell, the exposed process.binding('icu').toUnicode() JS function.
src/node_i18n.cc
Outdated
Does this compile? Seems like the env->context() argument is missing
@addaleax, you are right. Forgot to push fde77b3
src/node_i18n.cc
Outdated
If it's not used, it can be removed.
joyeecheung
left a comment
This should also fix the missing errors when parsing percent-encoded disallowed characters in hosts (https://github.com/nodejs/node/blob/master/test/fixtures/url-tests.js#L4499), since we are no longer ignoring UIDNA_ERROR_DISALLOWED. You can turn them on in this PR if you like.
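As a rough illustration of what strict host handling looks like from JavaScript, the sketch below wraps the WHATWG `URL` constructor. The helper name `tryParseHost` and the sample hosts are hypothetical and are not taken from the referenced test fixture; the behavior shown is what a current Node.js release is expected to do.

```javascript
'use strict';

// Hypothetical helper: returns the serialized hostname, or null when the
// WHATWG URL parser rejects the input as invalid.
function tryParseHost(input) {
  try {
    return new URL(input).hostname;
  } catch (err) {
    // The URL constructor throws a TypeError for hosts it cannot convert.
    return null;
  }
}

// A valid internationalized host is serialized in its ASCII (Punycode) form.
console.log(tryParseHost('http://español.com/'));

// A host with non-ASCII characters after an "xn--" prefix fails IDNA
// conversion, so the parser reports an error instead of a garbled name.
console.log(tryParseHost('http://xn--iñvalid.com/'));
```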
@jasnell, did you see #11549 (comment)?
Old behavior can be restored using a special `lenient` mode.
- Split the tests out to a separate file
- Add invalid cases
- Add tests for url.domainTo*()
- Re-enable previously broken WPT URL parsing tests
Test re-enabled per @joyeecheung. Will land tomorrow.
Landed in a520508...7ceea2a.
Old behavior can be restored using a special `lenient` mode, as used in the legacy URL parser.

PR-URL: #11549
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>

- Split the tests out to a separate file
- Add invalid cases
- Add tests for url.domainTo*()
- Re-enable previously broken WPT URL parsing tests

PR-URL: #11549
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>

PR-URL: #11549
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Currently, the ICU-based IDNA conversion methods only report the errors passed along through a UErrorCode. However, according to ICU's documentation for uidna_nameToASCII():

In other words, when non-catastrophically invalid domains are passed, ToASCII() and ToUnicode() (and their downstream url.domainToASCII() and url.domainToUnicode()) currently return garbled domain names instead of errors.

This PR makes the C++ binding methods report errors in pInfo->errors in addition to UErrorCode, thereby fixing the aforementioned problems.

Also included in this PR are additional tests for invalid situations as well as documentation clarifications for the user-facing url.domainToASCII() and url.domainToUnicode().

Before vs. after
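The original comparison is not preserved here. As a stand-in, the following minimal sketch shows the user-facing behavior on a current Node.js release built with ICU support, where invalid domains yield an empty string. The sample domains are illustrative assumptions and are not taken from this PR's tests.

```javascript
'use strict';
const url = require('url');

// Valid internationalized domain: converted to its ASCII (Punycode) form.
const ascii = url.domainToASCII('español.com');

// The inverse direction: Punycode back to Unicode.
const unicode = url.domainToUnicode('xn--espaol-zwa.com');

// Invalid domain (non-ASCII characters after an "xn--" prefix).
// On current Node.js releases both functions return an empty string
// rather than a garbled domain name.
const badAscii = url.domainToASCII('xn--iñvalid.com');
const badUnicode = url.domainToUnicode('xn--iñvalid.com');

console.log({ ascii, unicode, badAscii, badUnicode });
```

Checking for an empty string is therefore how a caller can detect a failed conversion with these two functions.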
Checklist
make -j4 test (UNIX), or vcbuild test (Windows) passes
Affected core subsystem(s)