Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/idna: Display returns invalid label for r4---sn-a5uuxaxjvh-gpm6.googlevideo.com. #27059

Open
soulne4ny opened this issue Aug 17, 2018 · 7 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@soulne4ny
Copy link

What version of Go are you using (go version)?

go version go1.10.3 linux/amd64 via docker golang:1.10.3-alpine
(also go version go1.10.3 darwin/amd64)

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/go"
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build623899479=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Transforming domain name to human-readable form with "golang.org/x/net/idna".

package main

import (
	"fmt"
	"golang.org/x/net/idna"
)

func main() {
	_, e := idna.ToUnicode("r4---sn-a5uuxaxjvh-gpm6.googlevideo.com")
	fmt.Println(e)
	_, e = idna.Display.ToUnicode("r4---sn-a5uuxaxjvh-gpm6.googlevideo.com")
	fmt.Println(e)
}

What did you expect to see?

I expect to see no error.

<nil>
<nil>

What did you see instead?

But idna.Display does not agree to accept the host name as a valid one.

<nil>
idna: invalid label "r4---sn-a5uuxaxjvh-gpm6"

.ToASCII has the same property.

@andybons andybons changed the title golang.org/x/net/idna.Display and r4---sn-a5uuxaxjvh-gpm6.googlevideo.com. x/net/idna: Display returns invalid label for r4---sn-a5uuxaxjvh-gpm6.googlevideo.com. Aug 17, 2018
@andybons
Copy link
Member

@mpvl

@andybons andybons added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Aug 17, 2018
@andybons andybons added this to the Unplanned milestone Aug 17, 2018
@mpx
Copy link
Contributor

mpx commented Mar 21, 2021

The Display profile attempts to validate labels, and disallows this particular label. The default Profile works since it does not validate.

FYI, r4---sn-a5uuxaxjvh-gpm6.googlevideo.com is not an XN-label (Punycode), it merely matches "??--" (Reserved LDH label). I'm quite sure this hostname is not intended to be used with IDNA. The label is a valid DNS label (clearly it is used and resolved), but IIUC, it goes against the IDNA RFC:

[non "xn--" R-LDH] labels MUST NOT be processed as ordinary LDH labels by IDNA-conforming programs and SHOULD NOT be mixed with IDNA labels in the same zone.

I suspect returning an error is typically correct, but then all software using ToUnicode probably should handle the error and fall back to displaying the encoded ASCII label as is... which also isn't ideal. I'm sure there are other examples of DNS labels on the internet which match the "??--" prefix but aren't intended to be used with IDNA.

It might be more useful if the default validation ignored non "xn--" R-LDH to prevent software breaking with some of the hostnames used on the internet. It might also help if IDNA is extended in future, but the older software hasn't been rebuilt with support.

@TimothyGu
Copy link
Contributor

This particular bug is previously described in whatwg/url#53 (comment). To fix this, we could use the same fixes as the WHATWG URL Standard, and set CheckHyphens to false (only works after #41732):

var (
	BetterDisplay = idna.New(
		idna.MapForLookup(),
		idna.StrictDomainName(false),
		idna.CheckHyphens(false),
		idna.BidiRule(),
	)
	BetterLookup = idna.New(
		idna.MapForLookup(),
		idna.StrictDomainName(false),
		idna.CheckHyphens(false),
		idna.BidiRule(),
		idna.Transitional(true),
	)
)

For good measure, we also set StrictDomainName (STD 3 domain name rules) to false, as it's a restriction not enforced in web browsers.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/317730 mentions this issue: internal/export/idna: turn off CheckHyphens and STD 3 rules for Display/Lookup

@favonia
Copy link
Contributor

favonia commented Aug 3, 2021

EDIT: I was too naive---the STD3 rules can save us from some trouble but not all. However, this is a real concern and I think some protection is better than none.


I hope we can keep most STD3 domain name rules intact, possibly an option to whitelist the underscore, but not more. At very least, I will feel very uncomfortable if the idna library decides to recommend the disabling of STD3 domain rules when looking up a domain as an internet host name. The STD3 rules not only handle the underscore but also protect us from many well-known attacks based on Unicode normalization. By disabling all of them, the Go library opens a dangerous door to all the following bugs found in other frameworks and browsers (the list was extracted from https://i.blackhat.com/USA-19/Thursday/us-19-Birch-HostSplit-Exploitable-Antipatterns-In-Unicode-Normalization.pdf):

An example would be the URL https://google.com\uFF03@evil.com which could confuse the URL splitter. The problematic character \uFF03 (, FULLWIDTH NUMBER SIGN), which could be mapped to # (depending on the normalization form in use), is disallowed by STD3 (disallowed_STD3_mapped in the latest IDNA table). The core issue is that many Unicode characters (after normalization and mapping) could introduce characters /, ?, #, @, ... that are often significant in a URL. We could potentially have an option to specially allow _ (or other characters that could be normalized and mapped to _) to work with existing hosts, but I do not think we should then also allow /, ?, #, @, ... that pose a security threat.

EDIT: the STD3 check actually could not prevent this attack. A better example would be https://cool.asi℀.evil.com

TL;DR. I hope we do not turn off all STD3 rules in the recommended profiles.

@ianlancetaylor
Copy link
Member

@mpvl Any thoughts on the above? Thanks.

@mpvl
Copy link
Contributor

mpvl commented Nov 2, 2021

@ianlancetaylor
It is unclear to me what the right behavior is here.

But in general, my inclination is to say that if a behavior is in conflict with the specific profile of the specification it is supposed to implement, then it is a bug and the profile should be fixed.

If, however, it is a change to a profile because the spec is undesirable (e.g. some WhatWG variant), then this should be a new profile.

Maybe this does not result in the best API, but it ensures backwards compatibility for software that relies on the official behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

8 participants