Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: runtime: pair LockOSThread, UnlockOSThread calls #20458

Closed
bcmills opened this issue May 22, 2017 · 12 comments
Closed

proposal: runtime: pair LockOSThread, UnlockOSThread calls #20458

bcmills opened this issue May 22, 2017 · 12 comments
Labels
early-in-cycle A change that should be done early in the 3 month dev cycle. FrozenDueToAge Proposal Proposal-Accepted
Milestone

Comments

@bcmills
Copy link
Contributor

bcmills commented May 22, 2017

#20395 (comment) reminded me once again of the awkwardness of runtime.UnlockOSThread.

As currently documented, UnlockOSThread is not safe for use in any library code: a library that calls LockOSThread has no way to know whether the caller also called LockOSThread (and hence expects the thread to remain locked). In practice, UnlockOSThread is only safe to use within the package that spawned the goroutine to be locked.

UnlockOSThread could be made safe for use in libraries with one simple change: after N calls to LockOSThread, do not actually unlock the thread until the Nth call to UnlockOSThread. This would make the API a bit easier to use correctly and eliminate a potential source of subtle concurrency bugs.

@gopherbot gopherbot added this to the Proposal milestone May 22, 2017
@rsc
Copy link
Contributor

rsc commented May 22, 2017

This seems like a backwards-incompatible change and likely not worth fixing. It's easy to spawn a new goroutine instead.

/cc @aclements @randall77

@rsc rsc changed the title proposal: runtime.UnlockOSThread should be paired with LockOSThread calls proposal: runtime: pair LockOSThread, UnlockOSThread calls May 22, 2017
@bcmills
Copy link
Contributor Author

bcmills commented May 22, 2017

To be clear, I'm more concerned that UnlockOSThread is widely misused than that it is not useful on its own. I think it would probably be fine to deprecate it entirely rather than fixing it, but a quick audit of calls both within Google code and in open-source repositories seems to show this pattern:

func (r *SomeReceiver) SomeCWrapperMethod() {
  runtime.LockOSThread()
  defer runtime.UnlockOSThread()
  …
}

With the current implementation, that's a bug waiting to happen: it assumes that the caller is never running on a locked thread.

Some samples:
https://github.com/StackExchange/wmi/blob/master/wmi.go#L138
https://github.com/Microsoft/go-winio/blob/master/privilege.go#L82
https://github.com/libgit2/git2go/blob/master/walk.go#L41

In contrast, the correct uses are almost always redundant:

go func() {
  runtime.LockOSThread()
  defer runtime.UnlockOSThread()
  someCFunction()
}()

Every use I have seen is either redundant or paired. (And about the only correct, non-redundant paired uses I have been able to find are within the Go runtime itself.)

It's true that the change I propose is not strictly backward-compatible, but as far as I can tell it would bring the standard library in line with the semantics that user code is already written to expect.

@iand
Copy link
Contributor

iand commented May 23, 2017

This seems like something that could be statically checked. Although it is a correctness issue it probably isn't common enough to warrant inclusion in vet. Could be a useful addition to @dominikh's staticcheck though.

@quentinmit
Copy link
Contributor

See also #20395 - one possible solution to both issues is to make Unlock a noop.

@bcmills
Copy link
Contributor Author

bcmills commented May 23, 2017

@iand The property you would want to check statically is that the methods that call runtime.UnlockOSThread are never called from a point at which the thread is already locked. That's easy to verify through chains of static calls, but indirect calls (including variables of func type and interface methods) are tricky — especially when you include the possibility of getting at an interface method using a type-assertion.

@rsc
Copy link
Contributor

rsc commented May 23, 2017

All you need to check is that LockOSThread is only called from within goroutines created by the same package. The examples above are not "safe sometimes". They're just unsafe.

@bcmills
Copy link
Contributor Author

bcmills commented May 25, 2017

If we're going to take the "you're holding it wrong" approach, then yes, we could easily declare that LockOSThread should only be called on a goroutine created in the same package. Without some subtle "noreturn" heuristics, that check would also flag some valid uses, such as locking the current goroutine before terminating the process using tgkill (#19326 (comment)).

Even if we restrict the analysis to only UnlockOSThread, that still invalidates nearly all actual usage of UnlockOSThread outside of the Go runtime itself.

In a strict sense, that would comply with the letter of the Go 1 compatibility policy: the function works as documented, and ~everybody is using it wrong. But the spirit of the Go 1 compatibility policy is that we don't break existing code. It seems to me that if literally everyone but us expects defer runtime.UnlockOSThread() to work in pairs in library methods, we should just make it work.


Some more examples of "unsafe" paired calls:
https://github.com/docker/libnetwork/blob/master/resolver_unix.go#L30
https://github.com/crawshaw/fs/blob/11914e52f0d320746ab59d08f5a07efe038c4a4f/interrupt.go#L30 (The same @crawshaw who is a long-time Go contributor.)
https://github.com/cloudfoundry/guardian/blob/e9346b9849a33cddb037628c79057c4c80d4e3d8/kawasaki/netns/netns_execer_linux.go#L18
https://github.com/mongodb/mongo-tools/blob/488b2cfb9fa309de22fa85b29f77ad09ad4a2f6a/vendor/src/github.com/spacemonkeygo/openssl/ctx.go#L141
https://github.com/mongodb/mongo-tools/blob/57233a4ef048b7f26bd5e4a6f8393160491c4306/vendor/src/github.com/spacemonkeygo/openssl/dhparam.go#L59
https://github.com/JustinRyanH/goboy/blob/e22812bd30606dba3218e90d602cdc872d792d8e/glfw/display.go#L53
https://github.com/fuzzycow/ev32go/blob/79c018bac1be12009692e99d273e634e790b78c4/robotics/chassis/wheeled_chassis.go#L305
https://github.com/gbbr/textmate/blob/b67dc9663138156f4d9268061b5895ac4db975aa/vendor/limetext/gopy/lib/lock.go#L229
https://github.com/garethwarry/com/blob/2b5f154badb07fdcdeb7d1e81c0429360af721b4/com.go#L30
https://github.com/die-net/fotomat/blob/008fd6993024b99525448df9f095994a02d9dbd5/vips/init.go#L28
https://github.com/vishvananda/netlink/blob/master/nl/nl_linux.go#L531

That last package seems to have a lot of clones. Here's one from CoreOS that itself uses paired calls, a likely instance of the very bug I'm describing here:
https://github.com/coreos/mantle/blob/ead6c7fa80be4b7791fa4a4ab70cce359478cd7b/system/ns/enter.go#L31

Here's another noreturn example. The LockOSThread appears to be redundant, since the caller is likely from a C thread:
https://github.com/melange-app/melange/blob/8d794375e263a5caf254b0beef944e2cccfeb90c/backend/notifications/notifications_android.go#L20

I finally found one correct use of UnlockOSThread outside the standard library. It is also in a paired call, and the library it is called in is doing something iffy with threading anyway:
https://github.com/rdarder/goworkers/blob/fe279b6be1e15ebc678e3cc4d3f797da4f8848d1/src/workers/workers.go#L38

@bcmills
Copy link
Contributor Author

bcmills commented Jun 2, 2017

Here's a great illustration of a program that hit this issue in the wild:
https://www.weave.works/blog/linux-namespaces-and-go-don-t-mix

@bboreham
Copy link
Contributor

bboreham commented Jun 3, 2017

@bcmills you have misunderstood that article. The problem there is that another OS thread is started inside the locked region, which inherits the changed namespace, and when other goroutines are then scheduled onto that thread they are in the other namespace.

[edit] that issue is filed as #20676

@rsc
Copy link
Contributor

rsc commented Jun 5, 2017

It's certainly true that there's a lot of incorrect code that this would make correct, much in the same way that the monotonic time change fixed a lot of incorrect timing code.

Suppose we do change this, and we do it on day 1 of the Go 1.10 cycle and find out what breaks. Does that make sense? Objections?

/cc @ianlancetaylor @aclements

@aclements
Copy link
Member

Suppose we do change this, and we do it on day 1 of the Go 1.10 cycle and find out what breaks. Does that make sense? Objections?

We discussed this in the compiler/runtime meeting and decided this seems reasonable. I'll prep a CL.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/45752 mentions this issue.

@rsc rsc modified the milestones: Go1.10, Proposal Jun 19, 2017
@rsc rsc added the early-in-cycle A change that should be done early in the 3 month dev cycle. label Jun 19, 2017
tmm1 pushed a commit to fancybits/go that referenced this issue Nov 1, 2017
Currently, there is a single bit for LockOSThread, so two calls to
LockOSThread followed by one call to UnlockOSThread will unlock the
thread. There's evidence (golang#20458) that this is almost never what
people want or expect and it makes these APIs very hard to use
correctly or reliably.

Change this so LockOSThread/UnlockOSThread can be nested and the
calling goroutine will not be unwired until UnlockOSThread has been
called as many times as LockOSThread has. This should fix the vast
majority of incorrect uses while having no effect on the vast majority
of correct uses.

Fixes golang#20458.

Change-Id: I1464e5e9a0ea4208fbb83638ee9847f929a2bacb
Reviewed-on: https://go-review.googlesource.com/45752
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
@golang golang locked and limited conversation to collaborators Oct 6, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
early-in-cycle A change that should be done early in the 3 month dev cycle. FrozenDueToAge Proposal Proposal-Accepted
Projects
None yet
Development

No branches or pull requests

7 participants