Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perhaps do not apply ToASCII for ASCII-only input #267

Closed
domenic opened this issue Mar 8, 2017 · 5 comments
Closed

Perhaps do not apply ToASCII for ASCII-only input #267

domenic opened this issue Mar 8, 2017 · 5 comments

Comments

@domenic
Copy link
Member

domenic commented Mar 8, 2017

See https://lists.w3.org/Archives/Public/www-archive/2017Feb/0006.html

The possible solutions are:

  • Browsers get stricter and apply ToASCII anyway
  • ToASCII's definition changes to not do any validation on ASCII-only input
  • This spec only invokes ToASCII on on non-ASCII input
@TimothyGu
Copy link
Member

TimothyGu commented Mar 9, 2017

FYI, Node.js after nodejs/node#11549 (released in v7.7.2) errors out on all test cases in https://annevankesteren.nl/2017/02/idna-toascii-differences.

@annevk
Copy link
Member

annevk commented May 18, 2017

FYI: @rmisev discovered some issues with ASCII fast paths: #309 (comment). An ASCII fast path is probably still an acceptable intermediate solution though, but I'd rather define something better in the standard.

@annevk
Copy link
Member

annevk commented May 18, 2017

So my recommendation is that we go with a fourth solution:

  • ToASCII definition changes to do less validation on all input

In particular make use of the new CheckHyphens proposed in http://www.unicode.org/reports/tr46/tr46-18.html#ToASCII. I'm not entirely sure how to update the standard though potentially we could refer to the proposed version. I think we've done that in the past as well.

I filed these bugs (I hope someone can help with Node.js, they seem well represented):

I wrote these tests (quite a few from @rmisev): web-platform-tests/wpt#5976.

@TimothyGu
Copy link
Member

I hope someone can help with Node.js, they seem well represented

Sure 😆

To be clear, am I correct in saying this decision means:

  <li><p>Let <var>result</var> be the result of running <a abstract-op lt=ToASCII>Unicode ToASCII</a> with
  <i>domain_name</i> set to <var>domain</var>, <i>UseSTD3ASCIIRules</i> set to false,
- <i>processing_option</i> set to <i>Nontransitional_Processing</i>, and <i>VerifyDnsLength</i> set
- to false.
+ <i>processing_option</i> set to <i>Nontransitional_Processing</i>, <i>VerifyDnsLength</i> set
+ to false, and <i>CheckHyphens</i> set to false.

If so, we have nodejs/node#12966 already which would fulfill this.

@annevk
Copy link
Member

annevk commented May 19, 2017

Yeah, we might also make CheckJoiners false, but that's a separate discussion.

annevk added a commit that referenced this issue May 24, 2017
Fixes #53 and fixes #267 by no longer breaking on on hyphens in the
3rd and 4th position of a domain label. This is known to break
YouTube: r3---sn-2gb7ln7k.googlevideo.com. This is done by setting
the proposed CheckHyphens flag to false.

Fixes #110 by clarifying that BIDI and CONTEXTJ checks are to be done
by setting the proposed CheckBidi and CheckJoiners flags to true.

Follow-up #313 is filed to remove the proposed bits once Unicode is
updated.
annevk added a commit that referenced this issue May 24, 2017
Tests: web-platform-tests/wpt#5976.

Fixes #53 and fixes #267 by no longer breaking on on hyphens in the
3rd and 4th position of a domain label. This is known to break
YouTube: r3---sn-2gb7ln7k.googlevideo.com. This is done by setting
the proposed CheckHyphens flag to false.

Fixes #110 by clarifying that BIDI and CONTEXTJ checks are to be done
by setting the proposed CheckBidi and CheckJoiners flags to true.

Follow-up #313 is filed to remove the proposed bits once Unicode is
updated.
@domenic domenic closed this as completed in dc9d831 Jun 1, 2017
TimothyGu added a commit to TimothyGu/url that referenced this issue May 13, 2021
Many implementations currently skip ToASCII if domain is ASCII-only, but
as discovered in [1] and [2], this can result in some undesirable
behavior. Adding a note prevents implementors from making the mistake of
thinking ToASCII is a no-op if the input is ASCII, and also provides a
recommendation on how to properly optimize the ToASCII step.

[1]: whatwg#267
[2]: whatwg#309 (comment)
TimothyGu added a commit to TimothyGu/url that referenced this issue May 13, 2021
Many implementations currently skip ToASCII if domain is ASCII-only, but
as discovered in [1] and [2], this can result in some undesirable
behavior. Adding a note prevents implementors from making the mistake of
thinking ToASCII is a no-op if the input is ASCII, and also provides a
recommendation on how to properly optimize the ToASCII step.

[1]: whatwg#267
[2]: whatwg#309 (comment)
TimothyGu added a commit to TimothyGu/url that referenced this issue May 19, 2021
Many implementations currently skip ToASCII if domain is ASCII-only, but
as discovered in [1] and [2], this can result in some undesirable
behavior. Adding a note prevents implementors from making the mistake of
thinking ToASCII is a no-op if the input is ASCII, and also provides a
recommendation on how to properly optimize the ToASCII step.

[1]: whatwg#267
[2]: whatwg#309 (comment)
annevk pushed a commit that referenced this issue May 19, 2021
Many implementations currently skip ToASCII if domain is ASCII-only, but as discovered in #267 and #309 (comment), this can result in some undesirable behavior. Adding a note prevents implementers from making the mistake of thinking ToASCII is a no-op if the input is ASCII, and also provides a recommendation on how to properly optimize the ToASCII step.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

3 participants