because of idna2008 enforcement some real urls that work in the browser are now broken #3687
Comments
…s that (for better or worse, see https://github.com/kennethreitz/requests/issues/3687)
Thanks for this report! Ultimately, I'm not sure I agree. Fundamentally, those URIs will stop working at some point as browsers move over to IDNA 2008. They will have to move over time in at least some cases, because the

Right now I think the real issue is that we attempt to IDNA-encode everything, and probably shouldn't. When a domain that is already IDNA-encoded is passed to us we should probably just leave it alone. The idna project is considering doing the same (see kjd/idna#27), but we can get there ahead of them by saying that for certain URIs we simply short-circuit the encoding. The logic would have to be:
That would, I think, cover this problem. How does that sound? |
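One way to read the short-circuit idea described above, as a minimal sketch only: treat a hostname that is already pure ASCII (which includes `xn--` labels) as already encoded and pass it through untouched, and encode only hostnames that contain non-ASCII characters. The helper name `maybe_encode_host` is hypothetical, and the standard library's `idna` codec (IDNA2003) stands in here for whichever encoder the library actually uses.

```python
def maybe_encode_host(host: str) -> str:
    """Hypothetical helper: only IDNA-encode hosts that actually need it."""
    # Pure-ASCII hosts, including already-encoded "xn--" labels, are
    # returned unchanged instead of being re-validated.
    if host.isascii():
        return host
    # Assumption: the stdlib "idna" codec (IDNA2003) is used as a stand-in
    # encoder for illustration; it raises UnicodeError on invalid input.
    return host.encode("idna").decode("ascii")


print(maybe_encode_host("xn--n3h.net"))     # passed through as-is
print(maybe_encode_host("bücher.example"))  # encoded to an xn-- form
```

With this shape, a URL such as http://xn--n3h.net/ never reaches the encoder at all, so stricter validation cannot reject it.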
That sounds ok. |
I believe we may not need to worry about the case where the URL is a bytestring. @nlevitt were you interested in providing a patch for this? If not, I believe I have a fix ready to go but will gladly defer to you. |
Go for it @nateprewitt |
I've experienced this issue with pylxd using the unix socket interface and requests-unixsocket. In our case, we have to do |
In my case, one of my test URLs contains an uppercase letter; it worked with
|
Uppercase letters should be fine, we've enabled a mapping mode that should make it safe. Can you provide the URL that fails? |
After investigating, I noticed that the subdomain I am testing is like
and that this is already covered in issue https://github.com/kennethreitz/requests/issues/3683 |
I have the same problem, which I reported in kjd/idna#32, but it seems more to be an issue in requests than in idna. @Lukasa's logic sounds right to me. |
Should be fixed in Requests v2.12.2. |
…2008 eg https://todayinmarch2020.🦈🖥.ws/ , https://🕸💍.ws/ , https://🐷🔥.ws https://unicode.org/faq/idn.html#6 psf/requests#3687 kjd/idna#18 kjd/idna#40
FWIW, I use requests in project(s) where I want to handle web sites like these, eg https://todayinmarch2020.🦈🖥.ws/ , with domains that are valid IDNA2003 but invalid IDNA2008. To do that, I had to add in the

    try:
        resp = requests.get(url, ...)
    except requests.exceptions.InvalidURL:
        punycode = domain2idna(url)
        if punycode != url:
            # the domain is valid idna2003 but not idna2008. encode and try again.
            resp = requests.get(punycode, ...)

I get that these domains may break at some point in the future, but that's a big unknown, and they're registered and serving fine now. I don't have a specific proposal or stronger argument, I just wish I had a less awkward workaround. I have a wrapper around

Thanks in advance for listening! |
I ended up doing more research here, and I'm curious about a design decision. Was it a deliberate choice to build in just IDNA2008 and not full Punycode? Or was

IDNA2008 evidently doesn't apply to all TLDs. Notably, unlike gTLDs, ccTLDs generally get to choose their own domain policies - background from Wikipedia, ICANN, a GoDaddy representative - and a handful of them have stuck with IDNA2003, UTS#46, or related variants. (Not to mention older proprietary schemes like ThaiURL 😁.)

Similarly, afaik domain owners can do whatever they want with their own subdomains. So thanks to Punycode, third level (and beyond) hostnames like https://🌏➡➡❤🔒.ayeshious.com and https://🔒🔒🔒.scotthelme.co.uk are not at risk of breaking due to gTLD registries enforcing IDNA2008 on pay-level domain registrations.

I know you all thought this through back in 2016, eg here and in #3683 (comment), and settled on automatically encoding IDNA2008 and passing through already-encoded hostnames. That seems a bit surprising, since IDNA2008 is only a subset of the currently legal encodings. Mind elaborating on why you didn't either push all encoding onto users, or build in the other legal standard encodings too, notably IDNA2003? (Thanks again for listening!) |
Because of IDNA2008 enforcement in 2.12.0, some real URLs that work in the browser are now broken.
For example:
http://☃.net/
http://xn--n3h.net/
My suggestion would be to try idna2008 first, catch exception, then try idna2003.
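A rough sketch of that suggestion, not a description of how requests behaves: try a strict IDNA2008 encoder first and, if it rejects the hostname, fall back to the standard library's IDNA2003 codec. The function name `encode_host` and its `strict_encode` parameter are hypothetical; the default assumes the third-party `idna` package is installed, whose `idna.encode` returns bytes and raises `idna.IDNAError` (a `UnicodeError` subclass) on invalid input.

```python
def encode_host(host: str, strict_encode=None) -> str:
    """Hypothetical helper: IDNA2008 first, then fall back to IDNA2003."""
    if strict_encode is None:
        # Assumption: the third-party "idna" package provides IDNA2008.
        import idna

        def strict_encode(h):
            return idna.encode(h, uts46=True)

    try:
        return strict_encode(host).decode("ascii")
    except UnicodeError:
        # The stdlib "idna" codec implements IDNA2003 (RFC 3490), which
        # still accepts symbols such as the snowman.
        return host.encode("idna").decode("ascii")


if __name__ == "__main__":
    print(encode_host("☃.net"))
```

The trade-off is that the fallback quietly accepts names that IDNA2008 deliberately forbids, so a caller who cares about that distinction would need to know which path was taken.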