Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

syscall: Windows filenames with unpaired surrogates are not handled correctly #32334

Closed
hundt opened this issue May 30, 2019 · 7 comments
Closed
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Windows
Milestone

Comments

@hundt
Copy link

hundt commented May 30, 2019

There is a well-known issue with Windows/NTFS (see rust-lang/rust#12056 and https://lwn.net/Articles/684181/) where filenames are treated as UTF-16 but are allowed to contain unpaired surrogates. But syscall_windows.go assumes that the input and output to the Windows syscalls is valid UTF-16. This breaks some of the high-level APIs; for example, File.Readdir on a directory containing files with unpaired surrogates in the names will return FileInfo results with incorrect names (valid filenames but referring to different or nonexistent files). A demonstration is included below.

I'm not sure what a reasonable solution would be. I guess essentially something like WTF-8 where the strings that come back from these syscalls on Windows are generally valid UTF-8 but might not be?

I'm not a Windows developer so I'm not sure how often this issue comes up in real life, but I happened to notice it so I thought I'd flag it in case anyone finds it worth taking action, or so people can find this documentation of the issue if they encounter it.

What version of Go are you using (go version)?

$ go version
go version go1.12.5 windows/386

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
set GOARCH=386
set GOBIN=
set GOCACHE=C:\Users\IEUser\AppData\Local\go-build
set GOEXE=.exe
set GOFLAGS=
set GOHOSTARCH=386
set GOHOSTOS=windows
set GOOS=windows
set GOPATH=C:\Users\IEUser\go
set GOPROXY=
set GORACE=
set GOROOT=C:\Go
set GOTMPDIR=
set GOTOOLDIR=C:\Go\pkg\tool\windows_386
set GCCGO=gccgo
set GO386=sse2
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m32 -mthreads -fno-caret-diagnostics -Qunused-arguments -fmessag
e-length=0 -fdebug-prefix-map=C:\Users\IEUser\AppData\Local\Temp\go-build4397225
02=/tmp/go-build -gno-record-gcc-switches

What did you do?

Created a file named <unpaired surrogate>.txt ([]uint16=[0xdcc0 0x2e 0x74 0x78 0x74]) and attempt to read it by calling ioutil.ReadDir and reading all the files that come back.

Code snippet
package main

import "log"
import "syscall"
import "io/ioutil"
import "os"

// Create a file at c:\test\files\<w>.txt
func write(w uint16, content []byte) {
	pathb := []byte("c:\\test\\files\\")
	path := []uint16{}
	for _, c := range pathb {
		path = append(path, uint16(c))
	}
	fileName := []uint16{w, '.', 't', 'x', 't'}
	log.Printf("  Converted filename: %#x", fileName)
	path = append(path, fileName...)
	pathp := &path[0]
	access := uint32(syscall.GENERIC_READ | syscall.GENERIC_WRITE)
	sharemode := uint32(syscall.FILE_SHARE_READ | syscall.FILE_SHARE_WRITE)
	createmode := uint32(syscall.CREATE_ALWAYS)
	h, e := syscall.CreateFile(pathp, access, sharemode, nil, createmode, syscall.FILE_ATTRIBUTE_NORMAL, 0)
	if e != nil {
		log.Printf("  CreateFile failed: %s", e)
	} else {
		n, e := syscall.Write(h, content)
		if n != len(content) || e != nil {
			log.Printf("  Write failed: n=%d e=%s", n, e)
		}
	}
	syscall.Close(h)
}

func main() {
	testCases := []uint16{'a', 0xdc00}

	for _, w := range testCases {
		b := []byte{byte(w)}
		if w > 0xff {
			b = []byte{byte(w >> 8), byte(w)}
		}
		fileName := string(b) + ".txt"
		log.Printf("Writing %#x to %#x", b, fileName)
		write(w, b)
	}

	log.Printf("Reading from directory...")
	files, err := ioutil.ReadDir("c:\\test\\files\\")
	if err != nil {
		log.Fatal(err)
	}

	for _, file := range files {
		filePath := "c:\\test\\files\\" + file.Name()
		log.Printf("  Filename string: %#x", []byte(file.Name()))
		f, err := os.Open(filePath)
		if err != nil {
			log.Printf("    Error opening: %s", err)
		} else {
			buf, err := ioutil.ReadAll(f)
			if err != nil {
				log.Printf("    error reading: %s", err)
			} else {
				log.Printf("    successfully opened; content=%#x", buf)
			}
			f.Close()
		}
	}
}

What did you expect to see?

The code successfully opens the file.

What did you see instead?

The code attempts to open a file with name [0xfffd 0x2e 0x74 0x78 0x74] instead.

With the test cases above it fails with an error:

c:\test>go run files.go
2019/05/30 11:14:14 Writing 0x61 to 0x612e747874
2019/05/30 11:14:14   Converted filename: [0x61 0x2e 0x74 0x78 0x74]
2019/05/30 11:14:14 Writing 0xdc00 to 0xdc002e747874
2019/05/30 11:14:14   Converted filename: [0xdc00 0x2e 0x74 0x78 0x74]
2019/05/30 11:14:14 Reading from directory...
2019/05/30 11:14:14   Filename string: 0x612e747874
2019/05/30 11:14:14     successfully opened; content=0x61
2019/05/30 11:14:14   Filename string: 0xefbfbd2e747874
2019/05/30 11:14:14     Error opening: open c:\test\files\?.txt: The system cannot find the file specified.

If you add 0xfffd to testCases (to create the "replacement character" file that it is looking for) it will actually open that same file twice:

c:\test>go run files.go
2019/05/30 11:16:21 Writing 0x61 to 0x612e747874
2019/05/30 11:16:21   Converted filename: [0x61 0x2e 0x74 0x78 0x74]
2019/05/30 11:16:21 Writing 0xdc00 to 0xdc002e747874
2019/05/30 11:16:21   Converted filename: [0xdc00 0x2e 0x74 0x78 0x74]
2019/05/30 11:16:21 Writing 0xfffd to 0xfffd2e747874
2019/05/30 11:16:21   Converted filename: [0xfffd 0x2e 0x74 0x78 0x74]
2019/05/30 11:16:21 Reading from directory...
2019/05/30 11:16:21   Filename string: 0x612e747874
2019/05/30 11:16:21     successfully opened; content=0x61
2019/05/30 11:16:21   Filename string: 0xefbfbd2e747874
2019/05/30 11:16:21     successfully opened; content=0xfffd
2019/05/30 11:16:21   Filename string: 0xefbfbd2e747874
2019/05/30 11:16:21     successfully opened; content=0xfffd
@mdempsky mdempsky changed the title Windows filenames with unpaired surrogates are not handled correctly syscall: Windows filenames with unpaired surrogates are not handled correctly May 30, 2019
@mdempsky
Copy link
Contributor

/cc @alexbrainman

@julieqiu julieqiu added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jun 1, 2019
@alexbrainman
Copy link
Member

@hundt thank you very much for creating this issue.

I can reproduce your problem here on my Windows 10. I also tried opening file with 'corrupted' file name with notepead++, and notepead++ can read and write the file. Mind you notepead++ uses standard system 'Open' dialogue to get the filename.

So, I agree. Go should, probably, deal with these file names somehow.

Unfortunately I don't have time to deal with this issue. So leaving for others.

Alex

@hundt
Copy link
Author

hundt commented Jul 30, 2019

I lack the ability to tag issues, but I wonder if this issue has security implications. I am thinking of things like:

  • a server validates a username by reading a directory listing, and an invalid username containing the unicode replacement character is accepted because of the presence of a valid username with an unpaired surrogate in its place
  • a server writes a file to some location based on username, and multiple different usernames with unpaired surrogates end up overwriting or gaining access to each other's data

I am not a security expert so maybe these scenarios are far-fetched or not that important.

@networkimprov
Copy link

@gopherbot add "help wanted"

@mattn
Copy link
Member

mattn commented Jan 24, 2020

This is caused by that Go replace invalid sequences to 0xFFFD with []rune(s) in syscall.UTF8String.

p := []byte("\xED\xB0\x80")
fmt.Printf("%x\n", []rune(string(p)))

https://play.golang.org/p/IonQ_Sk2U8n

This is right behavior in UTF-8. 0xED indicate range 0x80-0x9F for next byte. But 0xB0 is out of the range. If utf16.Decode accept this invalid range for second byte (by any chance?), 0x80 is in range of 0x80-0xBF, then codepoint should be:

(0xED & 0x0F) << 12 | (0xB0 & 0x3F) << 6 | (0x80 & 0x3F) << 3 = 0xDC00

os.Open will be possible to open the file. But we know this possibly makes security issue.

@johnwcowan
Copy link

Using U+FFFD in this context will break the assumption,, which is universal, that in a given directory there cannot be two files with the same name. If you have files whose names are 0xD800 and 0xD801, they will both be returned as "\UFFFD".

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 7, 2022
@seankhliao seankhliao added this to the Unplanned milestone Aug 27, 2022
@qmuntal
Copy link
Contributor

qmuntal commented May 16, 2023

go1.21 will support Windows filenames with unpaired surrogated by using WTF-8. Done in CL 493036 to fix #59971.

@qmuntal qmuntal closed this as completed May 16, 2023
@golang golang locked and limited conversation to collaborators May 15, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Windows
Projects
None yet
Development

No branches or pull requests

10 participants