idna-2.2: idna.encode('☃') does not return 'xn--n3h' #40
Comments
Oops. It's not valid in IDNA2008: http://unicode.org/cldr/utility/character.jsp?a=2603. Closing. |
It's valid UTS46 though, which is what browsers use. You might want to reconsider this. |
It's not valid UTS46 for IDNA 2008, it is only valid for IDNA 2003. Look for the "NV8" in the UTS46 table data. Now there may be an argument to add fall-through IDNA 2003 processing, but as of today this library only supports IDNA 2008. |
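For reference, the `xn--n3h` result expected in the issue title is what IDNA 2003 processing produces. A minimal sketch, assuming Python's standard-library `idna` codec (which follows the older RFC 3490 rules, not IDNA 2008) rather than anything from this library:

```python
# Python's built-in "idna" codec follows IDNA 2003 (RFC 3490),
# so it accepts U+2603 SNOWMAN, unlike an IDNA 2008 encoder.
snowman = "\u2603"  # ☃

# Raw Punycode for the label, without the ACE prefix.
raw = snowman.encode("punycode")

# IDNA 2003 ToASCII: Nameprep, then Punycode, then the "xn--" prefix.
ace = snowman.encode("idna")

print(raw, ace)  # b'n3h' b'xn--n3h'
```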
+1 for optional 2003 fall-through, I run into real web corner cases that are IDNA2003. |
@kjd UTS46 (the actual document) doesn't have a processing mode where this code point is somehow rejected and this is the first implementation I have seen that does such a thing. |
That's because this isn't an implementation of UTS46, it's an implementation of IDNA2008. If for some reason you want only UTS46 and not IDNA2008 then presumably you can call |
idna.uts46_remap doesn't encode anything though... |
|
Thanks! Stick that function in the library ;) So is this a bona fide implementation of UTS46? (Sorry for still not being totally clear on what the spec entails.) |
Doh.
|
This was just a quick function I rattled off the top of my head, not tested. You probably need a few more lines to break the input string down into individual labels before using the IDNA 2003 portion. If we added support for something like this in the library (see issue #18), it would come with a proper test suite, etc. |
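The function posted in this thread is not reproduced here. As a rough illustration of the fall-through idea discussed above, the following untested-in-production sketch tries IDNA 2008 first and falls back to IDNA 2003 label by label. The name `to_ascii_fallthrough` is hypothetical, the fallback uses Python's standard-library `idna` codec as an assumption, and the `idna` package is treated as optional:

```python
def to_ascii_fallthrough(domain: str) -> bytes:
    """Hypothetical helper: IDNA 2008 first, IDNA 2003 as a fall-through."""
    try:
        import idna  # third-party IDNA 2008 implementation (optional here)
    except ImportError:
        idna = None

    if idna is not None:
        try:
            # Strict IDNA 2008 encoding; rejects code points such as U+2603.
            return idna.encode(domain)
        except idna.IDNAError:
            pass  # fall through to IDNA 2003 below

    # Assumption: split on "." only and encode each label with the
    # standard-library IDNA 2003 codec (Nameprep + Punycode + "xn--").
    labels = domain.split(".")
    return b".".join(label.encode("idna") for label in labels)


print(to_ascii_fallthrough("\u2603.net"))   # b'xn--n3h.net'
print(to_ascii_fallthrough("example.com"))  # b'example.com'
```

Note that the two standards can map the same input differently (IDNA 2003 case-folds and remaps some characters), so a fall-through like this trades strictness for compatibility.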
Ok, thanks. This version works for at least these two test inputs:
|
Yes please! |
FYI the function above mishandles
In fact a number of the examples from http://unicode.org/cldr/utility/idna.jsp don't work. |
No, |
Oh. Well, chromium does |
Yeah I know, bugs have been filed. |
…2008 eg https://todayinmarch2020.🦈🖥.ws/ , https://🕸💍.ws/ , https://🐷🔥.ws https://unicode.org/faq/idn.html#6 psf/requests#3687 kjd/idna#18 kjd/idna#40