Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os: RemoveAll hangs on Windows when the directory tree contains files with non-UTF8-representable filenames #59971

Closed
TBBle opened this issue May 4, 2023 · 12 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Windows
Milestone

Comments

@TBBle
Copy link

TBBle commented May 4, 2023

What version of Go are you using (go version)?

> go version
go version go1.19 windows/amd64

Does this issue reproduce with the latest release?

Probably, I haven't tested it personally.

What operating system and processor architecture are you using (go env)?

go env Output
> go env
set GO111MODULE=
set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\paulh\AppData\Local\go-build
set GOENV=C:\Users\paulh\AppData\Roaming\go\env
set GOEXE=.exe
set GOEXPERIMENT=
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOINSECURE=
set GOMODCACHE=C:\Users\paulh\go\pkg\mod
set GONOPROXY=
set GONOSUMDB=
set GOOS=windows
set GOPATH=C:\Users\paulh\go
set GOPRIVATE=
set GOPROXY=https://proxy.golang.org,direct
set GOROOT=C:\Program Files\Go
set GOSUMDB=sum.golang.org
set GOTMPDIR=
set GOTOOLDIR=C:\Program Files\Go\pkg\tool\windows_amd64
set GOVCS=
set GOVERSION=go1.19
set GCCGO=gccgo
set GOAMD64=v1
set AR=ar
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=NUL
set GOWORK=
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fno-caret-diagnostics -Qunused-arguments -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=C:\Users\paulh\AppData\Local\Temp\go-build2524325924=/tmp/go-build -gno-record-gcc-switches

What did you do?

Called os.RemoveAll on a directory tree that contained a file with a name that is valid on Windows NTFS, but not representable in UTF-8.

A source of such files is the MSYS/Cygwin implementation of POSIX unlink in the case where the file is a running executable, e.g., an executable deleting itself or one of its DLLs, such as a pacman self-upgrade.

An example of such a call to os.RemoveAll is cleaning up temporary directories created during export of a Windows Containers filesystem layer.

TODO: Create a minimal repro. An existing unit test demonstrating this is https://github.com/microsoft/go-winio/pull/261/files#diff-fe0a7b5d46479d166aa0b717fbf33f61139084553170813c8d215edd960af249R117-R118

What did you expect to see?

The directory is deleted (or the call fails, I guess) and the code continues executing.

What did you see instead?

The same hang (infinite loop) as #36375, for roughly the same reason.

There's more details in that ticket, and particularly #36375 (comment) which explains how it applies in this case, and links on to the constellation of tickets where this issue has affected users.

In summary:

The difference from #36375 is only that the reason the inner os.RemoveAll returns IsNotExist in that case was passing a too-long DOS-style path rather than an incorrect path. (As it happens, Win32 DeleteFile returns ERROR_FILE_NOT_FOUND when given a too-long path. #36375 was resolved by ensuring filenames are not too long to be passed to the Win32 DeleteFile API, but did not resolve the underlying infinite loop potential.)

I haven't tested this, but I'm pretty sure that a directory with the same kind of unusual name would trigger the same behaviour.

One amusing note is that if you delete the problematic file while the loop is hung, it'll unhang and resume without error.


A few solutions come to mind:

  • Detect when a filename can't be roundtripped via UTF-8 correctly, and report a failure like "Go doesn't support NTFS filenames that can't be represented in UTF-8". This could be applied across the os package's WIndows-specific code, i.e. anywhere we call syscall.UTF16ToString or similar, instead call UTF16ToStringButFailInsteadOfUsingTheReplacementCharacter.
    • This is the least-impactful change on behaviour, as it replaces a hang with an error, so the current range of working and non-working cases is unchanged.
    • This approach means callers will over time need to migrate to something like pkg/fs: Add RemoveAll function microsoft/go-winio#261 as they receive bug reports for such filenames.
    • The number of places we call syscall.UTF16ToString is not small, and x/sys/windows would need to be looked at too.
  • Use WTF-8 as the encoding of strings created when reading or writing Windows directory entries, as this is designed as a UTF-8-like byte-oriented encoding that can represent all NTFS filenames.
    • This is consistent with the non-Windows platforms, which do not assume or check encoding on the byte-arrays in the dirents, but cast them directly to string, and will be least-surprising for users who are already handling the "Not UTF-8 filename" risk on other platforms.
    • This also provides the best chance that a caller of os.Readdir can pass the resulting file names into other os APIs successfully.
    • Same breadth-of-touching as the previous idea, but actually less intrusive logically as we no longer introduce a failure case for filename encoding conversion.
  • A Windows-specific os.RemoveAll implementation (e.g. take pkg/fs: Add RemoveAll function microsoft/go-winio#261 upstream) would probably fix this entire class of errors by keeping the filenames in native format between reading the directory entries and recursing/deleting them. It would also take care of the long file path issue by using the \\?\ prefix naturally, rather than patching it in after-the-fact.
    • This doesn't help other callers, and the Path in any generated PathError would need to be converted to some byte-oriented format, which would still fail if error-handling code attempts to, e.g. os.Lstat it.
  • Detect in os.RemoveAll when the loop is not making forward progress in a way that is robust against why.
    • This fixes only this function, but would also have handled the long-filename case.
    • Again, turns a hang into an error, so minimised impact on behaviour.
    • It'd be a good idea to look into this anyway, but like the first point, it'll just force people to gradually move to external reimplementations that don't fail.
  • Declaring that the Go stdlib only supports UTF-8-representable filenames on Windows, and UTF-8 filenames on other platforms.
    • This provides the most user clarity, and requires changing no code.
    • This would be a nice clarity boots in the docs
    • Allows users to assume strings can be used to manipulate filenames freely.
    • This one seems like an unreasonable limitation on supported environments. an rules out Go for general systems programming.
    • This is here for completeness. Please don't do this one.

My preferred in-Go solution would be encoding all Windows filenames in string in WTF-8 (or a different, covers-all-possible-byte-strings encoding, of course. I'm not particularly tied to WTF-8 itself.)

I particularly dislike the options that push people to external reimplementations, as I'd prefer a solution that made microsoft/go-winio#261 obsolete, not one that makes it widely necessarily, including requiring platform checks on multi-platform codebases.


Just to highlight, this problem only occurs on Windows because it's only on Windows that os is assuming the filenames can be roundtripped via UTF-8.

User code that attempted to implement something like os.RemoveAll or os.Walk by directly calling os.Readdir and also used strings to manipulate the filename strings would be vulnerable to similar issues on all platforms in the presence of non-UTF-8 filenames.

@bcmills bcmills added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label May 4, 2023
@bcmills bcmills added this to the Backlog milestone May 4, 2023
@bcmills
Copy link
Contributor

bcmills commented May 4, 2023

I think it would make sense for the syscall.UTF16 functions to convert to and from WTF-8 instead of enforcing UTF-8 validity. I expect that there would be essentially no backward-compatibility issues, since Go programs by and large do not check for UTF-8 validity explicitly and any program that uses replacement characters would end up with system calls that do not round-trip.

For backward compatibility, perhaps we could add a GODEBUG controlling this behavior.

@TBBle, shall we convert this to a proposal for a change in the Windows syscall package?

@qmuntal
Copy link
Contributor

qmuntal commented May 4, 2023

I think it would make sense for the syscall.UTF16 functions to convert to and from WTF-8 instead of enforcing UTF-8 validity.

These seems like the path of less resistance. Yet...

I expect that there would be essentially no backward-compatibility issues, since Go programs by and large do not check for UTF-8 validity explicitly and any program that uses replacement characters would end up with system calls that do not round-trip.

Today it is reasonably safe to asume that a Go string is a well-formed UTF-8. If we change syscall.UTF16 functions to work with WTF-8, then it will be unsafe to write a Go string coming from the os or syscall into an system that expects well-formed UTF-8 strings, as that property will no longer hold.

The WTF-8 talks about this issue in wtf-8/#intended-audience:

Any WTF-8 data must be converted to a Unicode encoding at the system’s boundary before being emitted. UTF-8 is recommended. WTF-8 must not be used to represent text in a file format or for transmission over the Internet.

@bcmills
Copy link
Contributor

bcmills commented May 4, 2023

Today it is reasonably safe to [assume] that a Go string is a well-formed UTF-8.

I don't think that is actually true, though. Go's range loop over a string decodes UTF-8 runes, but nothing else requires or enforces that strings are valid UTF-8.

In particular, the os package today does not enforce UTF-8 validity for platforms other than Windows: https://go.dev/play/p/g02CkiBgUOz

@TBBle
Copy link
Author

TBBle commented May 4, 2023

Today it is reasonably safe to asume that a Go string is a well-formed UTF-8.

7kjq3m

(2003 is RFC 3629, when encoding unpaired surrogates became invalid UTF-8 as part of shrinking Unicode from 31 bits to 21 bits. I've never checked to see if WTF-8 is just reinventing pre-2003 UTF-8, but I'd be mildly amused if that was the case.)

Also, per the documented "For information about UTF-8 strings in Go":

Some people think Go strings are always UTF-8, but they are not: only string literals are UTF-8. As we showed in the previous section, string values can contain arbitrary bytes; as we showed in this one, string literals always contain UTF-8 text as long as they have no byte-level escapes.

To summarize, strings can contain arbitrary bytes, but when constructed from string literals, those bytes are (almost always) UTF-8.

So in Go's case, they apparently intended "string can store arbitrary bytes", not "strings of unknown origin can be safely assumed to be UTF-8".


If we change syscall.UTF16 functions to work with WTF-8, then it will be unsafe to write a Go string coming from the os or syscall into an system that expects well-formed UTF-8 strings, as that property will no longer hold.

I was thinking more of a new set of functions that explicitly convert to and from WTF-8, and using those where the current UTF-16 functions are used for Windows filenames only. This is also how the Windows API works (since Vista); apart from the filesystem APIs (which take raw uint16 arrays), the Win32 API is in UTF-16 (or a "globally"-specified 8-byte encoding for legacy APIs); so the only way for a non-UTF-16-valid filename to exist is by deliberately build those into a uint16 array. (Or read an existing directory entry)

Which is exactly what Cygwin has done, purely for the lolz.

In contrast, on Linux setting LC_CTYPE=ja_JP.SJIS (at the time of file creation, potentially years ago) means it's already unsafe to write a Go string coming from os.Readdir into a system that expects well-formed UTF-8 strings, and there's absolutely nothing in the POSIX API to discourage this. Heck, I downloaded a zip file containing a 2022 government document with SJIS filenames that unzip(1) will happily dump into the filesystem as-is, because today you can't tell it what encoding the filenames are in, nor what encoding you'd like.

Anyway, on-topic, if syscall.UTF16ToString and friends are only used for filenames, then changing their definition will be okay, but still carries risk of mildly surprising users who've called them for other reasons, since they're not documented as filename-specific. Other Win32 API syscall wrappers have possibly used them... (From the name, I assumed they were calling Win32 APIs for conversion, but looking at 1.20.4 they just call out into the utf16 package).

@bcmills
Copy link
Contributor

bcmills commented May 4, 2023

windows.UTF16ToString (and syscall.UTF16ToString) is currently used in only a handful of places, and it has the property that if the UTF-16 buffer is valid UTF-16 then the result is valid UTF-8.

Its specification is:

// UTF16ToString returns the UTF-8 encoding of the UTF-16 sequence s,
// with a terminating NUL and any bytes after the NUL removed.

which to me says that the behavior for a non-UTF-16 sequence is unspecified. So I would argue that changing it to return a WTF-8 string is simply defining behavior that was previously (defined as) a programmer error — that is, the programmer failed to validate the UTF-16 string as required before passing it to UTF16ToString.

Similarly, today UTF16FromString is defined as:

// UTF16FromString returns the UTF-16 encoding of the UTF-8 string
// s, with a terminating NUL added. If s contains a NUL byte at any
// location, it returns (nil, EINVAL).

which does not seem to define any particular behavior for non-UTF-8 strings.

@qmuntal
Copy link
Contributor

qmuntal commented May 5, 2023

The arguments in favor of using WTF-8 in syscall.UTF16 are compelling. I'll try to prepare a CL adding an internal WTF-8 package to get some more knowledge about it.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/493036 mentions this issue: os, syscall: support ill-formed UTF-16 strings on Windows

@qmuntal
Copy link
Contributor

qmuntal commented May 5, 2023

I've submitted CL 493036 with a minimal WTF-8 implementation that has been pluged-into the syscall package. It looks good, although I had to copy a bunch of code from the utf8 package. The issue seems to be solved, though.

@kevpar @TBBle could you try CL 493036 to see if it fixes your cases? You can download it using gotip download 493036.

@TBBle
Copy link
Author

TBBle commented May 6, 2023

Good news. I confirmed a dockerd.exe built with Go 1.20.4 reproduces the hang (and unhangs when I delete the two problematic files), and when built with gotip download 493036's gotip, it no longer hangs.

Hang:

Server: Docker Desktop 4.17.1 (101757)
 Engine:
  Version:          0.0.0-dev
  API version:      1.41 (minimum version 1.24)
  Go version:       go1.20.4
  Git commit:       6051f14291-unsupported
  Built:            05/06/2023 05:02:47
  OS/Arch:          windows/amd64
  Experimental:     true

No hang:

Server: Docker Desktop 4.17.1 (101757)
 Engine:
  Version:          0.0.0-dev
  API version:      1.41 (minimum version 1.24)
  Go version:       devel go1.21-373489978b Fri May 5 18:29:34 2023 +0200
  Git commit:       6051f14291-unsupported
  Built:            05/06/2023 05:00:53
  OS/Arch:          windows/amd64
  Experimental:     true

So the fix works for me. (More details on the test at microsoft/hcsshim#696 (comment) if desired)

Is there any chance this fix will migrate backwards to older Go versions? Docker currently builds with Go 1.18.10 and GO111MODULE=off, and due to legacy build-system shenanigans, it can sometimes be a hassle to upgrade to newer Go releases. (Although I didn't hit any such issues in my limited testing.)

@qmuntal
Copy link
Contributor

qmuntal commented May 9, 2023

Is there any chance this fix will migrate backwards to older Go versions?

I'm not even sure if this will get into go1.21, code freeze is happening in a couple of weeks and this issue haven't got much traction yet.

If we backport something, IMO it shouldn't be CL 493036 but something less invasive, like avoiding the infinite recursion in os.RemoveAll.

@golang/windows @alexbrainman

@TBBle
Copy link
Author

TBBle commented May 9, 2023

That makes sense. And just avoiding the infinite hang (with presumably some kind of error we can recognise) will give callers something to react to now.

That said, perhaps if we want a less-intrusive patch for backporting, having an "unprocessible name" error or something would be nice and explicit; this could perhaps be a round-trip-test immediately after UTF-16 decoding. A "this call got stuck" error might still have other triggers lying around, although I guess the caller would handle it the same way anyway no matter what the cause? (In the case I'm involved with, it's a cleanup that fails after a successful operation, I'm not sure we even check the error).

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/498600 mentions this issue: doc/go1.21: mention os package changes

gopherbot pushed a commit that referenced this issue May 26, 2023
Also mention WTF-8 support in the syscall package.

For #32558
For #58977
For #59971

Change-Id: Id1627889b5e498add498748d9bfc69fb58030b35
Reviewed-on: https://go-review.googlesource.com/c/go/+/498600
Reviewed-by: Eli Bendersky <eliben@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
@golang golang locked and limited conversation to collaborators May 25, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Windows
Projects
None yet
Development

No branches or pull requests

4 participants