runtime: unexpectedly large slowdown with runtime.LockOSThread #18023

ianlancetaylor · 2016-11-22T20:45:50Z

The benchmark https://play.golang.org/p/nI4LO1wu17 shows a significant slowdown when using runtime.LockOSThread.

On amd64 at current tip I see this output from go test -test.cpu=1,2,4,8 -test.bench=.:

BenchmarkUnlocked     	 3000000	       500 ns/op
BenchmarkUnlocked-2   	 3000000	       638 ns/op
BenchmarkUnlocked-4   	 2000000	       749 ns/op
BenchmarkUnlocked-8   	 2000000	       911 ns/op
BenchmarkLocked       	  200000	      7887 ns/op
BenchmarkLocked-2     	  200000	      8048 ns/op
BenchmarkLocked-4     	  200000	      8235 ns/op
BenchmarkLocked-8     	  200000	      7882 ns/op

The problem may be entirely due to the extra thread context switches required by runtime.LockOSThread. When the thread is not locked, the channel communication may be done in practice by a simple goroutine switch. When using runtime.LockOSThread a full thread context switch is required.

Still, we should make sure we have the tools to confirm that.

The text was updated successfully, but these errors were encountered:

rsc · 2016-11-22T21:29:52Z

The golang-nuts thread about this says that using 'perf stat' shows approximately 1000X more thread context switches with LockOSThread than without. Is that enough confirmation?

ianlancetaylor · 2016-11-22T21:47:31Z

Somehow I don't see that, but, yes, there is probably nothing that can be done here. Sorry for the noise.

navytux · 2017-09-08T19:33:15Z

Let me chime in a bit. On Linux the context switch can happen, if my reading of futex_wake() is correct (which is probably not), because e.g. wake_up_q() via calling wake_up_process() -> try_to_wake_up() -> select_task_rq() can select another cpu

                cpu = cpumask_any(&p->cpus_allowed);

for woken process.

The Go runtime calls futex_wake() in notewakeup() to wake up an M that was previously stopped via stopm() -> notesleep() (the latter calls futexwait()).

When LockOSThread is used an M is dedicated to G so when that G blocks, e.g. on chan send, that M, if I undestand correctly, has high chances to stop. And if it stops it goes to futexwait and then context switch happens when someone wakes it up because e.g. something was sent to the G via channel.

With this thinking the following patch:

diff --git a/src/runtime/lock_futex.go b/src/runtime/lock_futex.go
index 9d55bd129c..418fe1b845 100644
--- a/src/runtime/lock_futex.go
+++ b/src/runtime/lock_futex.go
@@ -146,7 +157,13 @@ func notesleep(n *note) {
                // Sleep for an arbitrary-but-moderate interval to poll libc interceptors.
                ns = 10e6
        }
-       for atomic.Load(key32(&n.key)) == 0 {
+       for spin := 0; atomic.Load(key32(&n.key)) == 0; spin++ {
+               // spin a bit hoping we'll get wakup soon
+               if spin < 10000 {
+                       continue
+               }
+
+               // no luck -> go to sleep heavily to kernel
                gp.m.blocked = true
                futexsleep(key32(&n.key), 0, ns)
                if *cgo_yield != nil {

makes BenchmarkLocked much faster on my computer:

name        old time/op  new time/op  delta
Unlocked-4   485ns ± 0%   483ns ± 1%     ~     (p=0.188 n=9+10)
Locked-4    5.22µs ± 1%  1.32µs ± 5%  -74.64%  (p=0.000 n=9+10)

I also looked around and found: essentially at every CGo call lockOSThread is used:

https://github.com/golang/go/blob/ab401077/src/runtime/cgocall.go#L107

With this in mind I modified the benchmark a bit so that no LockOSThread is explicitly used, but server performs 1 and 10 simple C calls for every request:

CGo-4        581ns ± 1%   556ns ± 0%   -4.27%  (p=0.000 n=10+10)
CGo10-4     2.20µs ± 6%  1.23µs ± 0%  -44.32%  (p=0.000 n=10+9)

which shows the change brings quite visible speedup.

This way I'm not saying my patch is right, but at least it shows that much can be improved. So I suggest to reopen the issue.

Thanks beforehand,
Kirill

/cc @dvyukov, @aclements, @bcmills

full benchmark source:

(tmp_test.go)

package tmp

import (
        "runtime"
        "testing"
)

type in struct {
        c   chan *out
        arg int
}

type out struct {
        ret int
}

func client(c chan *in, arg int) int {
        rc := make(chan *out)
        c <- &in{
                c:   rc,
                arg: arg,
        }
        ret := <-rc
        return ret.ret
}

func _server(c chan *in, argadjust func(int) int) {
        for r := range c {
                r.c <- &out{ret: argadjust(r.arg)}
        }
}

func server(c chan *in) {
        _server(c, func(arg int) int {
                return 3 + arg
        })
}

func lockedServer(c chan *in) {
        runtime.LockOSThread()
        server(c)
        runtime.UnlockOSThread()
}

// server with 1 C call per request
func cserver(c chan *in) {
        _server(c, cargadjust)
}

// server with 10 C calls per request
func cserver10(c chan *in) {
        _server(c, func(arg int) int {
                for i := 0; i < 10; i++ {
                        arg = cargadjust(arg)
                }
                return arg
        })
}

func benchmark(b *testing.B, srv func(chan *in)) {
        inc := make(chan *in)
        go srv(inc)
        for i := 0; i < b.N; i++ {
                client(inc, i)
        }
        close(inc)
}

func BenchmarkUnlocked(b *testing.B)    { benchmark(b, server) }
func BenchmarkLocked(b *testing.B)      { benchmark(b, lockedServer) }
func BenchmarkCGo(b *testing.B)         { benchmark(b, cserver) }
func BenchmarkCGo10(b *testing.B)       { benchmark(b, cserver10) }

(tmp.go)

package tmp

// int argadjust(int arg) { return 3 + arg; }
import "C"

// XXX here because cannot use C in tests directly
func cargadjust(arg int) int {
        return int(C.argadjust(C.int(arg)))
}

ianlancetaylor · 2017-09-08T21:42:02Z

@navytux This issue is closed. If you want to discuss your patch, please open a new issue or raise a topic on the golang-dev mailing list. Thanks.

navytux · 2017-09-10T13:01:52Z

@ianlancetaylor thanks for feedback. I've opened #21827 and posted update to original thread on golang-nuts.

ianlancetaylor added this to the Go1.9 milestone Nov 22, 2016

ianlancetaylor closed this as completed Nov 22, 2016

navytux mentioned this issue Sep 10, 2017

runtime: big performance penalty with runtime.LockOSThread #21827

Open

golang locked and limited conversation to collaborators Sep 10, 2018

gopherbot added the FrozenDueToAge label Sep 10, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

runtime: unexpectedly large slowdown with runtime.LockOSThread #18023

runtime: unexpectedly large slowdown with runtime.LockOSThread #18023

ianlancetaylor commented Nov 22, 2016

rsc commented Nov 22, 2016

ianlancetaylor commented Nov 22, 2016

navytux commented Sep 8, 2017

ianlancetaylor commented Sep 8, 2017

navytux commented Sep 10, 2017

runtime: unexpectedly large slowdown with runtime.LockOSThread #18023

runtime: unexpectedly large slowdown with runtime.LockOSThread #18023

Comments

ianlancetaylor commented Nov 22, 2016

rsc commented Nov 22, 2016

ianlancetaylor commented Nov 22, 2016

navytux commented Sep 8, 2017

ianlancetaylor commented Sep 8, 2017

navytux commented Sep 10, 2017