runtime: performance problem with many Cgo calls

### What version of Go are you using (`go version`)?

`go version go1.8 darwin/amd64`, though I've also verified this happens on Linux.

### What operating system and processor architecture are you using (`go env`)?

```
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/pmattis/Development/go"
GORACE=""
GOROOT="/Users/pmattis/Development/go-1.8"
GOTOOLDIR="/Users/pmattis/Development/go-1.8/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/qc/fpqpgdqd167c70dtc6840xxh0000gn/T/go-build085228252=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
```

### Background

CockroachDB internally uses RocksDB (a fork of LevelDB) for data storage. We access RocksDB via a small Cgo wrapper. All writes to RocksDB are performed using `WriteBatches`, which are first written to a write-ahead log and then inserted into the database. Internally, RocksDB "groups" multiple concurrent write batches together into a single append to the write-ahead log. While investigating performance, we noticed that the groups were smaller than expected.

### What did you do?

TL;DR: If write batch grouping is performed in Go, performance is good. If it is performed in C++ (either by code that is nearly identical to the Go grouping or by RocksDB), performance is bad.

https://github.com/cockroachdb/cockroach/pull/14138 started as an experiment to replace the grouping of write batches in RocksDB with our own grouping in Go, to try to understand the smaller-than-expected group sizes. In addition to fixing the group sizes, it improved performance on one experiment by 100% while reducing latencies. This was unexpected, as the Cgo call overhead is negligible in comparison to the cost of committing the batch to RocksDB. To check whether RocksDB itself was to blame, we reimplemented the grouping of write batches in C++ in our Cgo RocksDB wrapper. Performance was equivalent to RocksDB performing the grouping, that is, just as bad as before.
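For readers unfamiliar with the technique, here is a minimal sketch of what "grouping in Go" can look like. It is not the code from the PR: the `batch` and `committer` types, the `runWorkers` helper, and the 1ms sleep standing in for the write-ahead log append are all hypothetical. The first goroutine to arrive becomes the leader and commits every batch queued so far with one call; the others wait until their batch has been committed.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// batch is a hypothetical stand-in for a RocksDB write batch.
type batch struct {
	data []byte
	done chan struct{} // closed once the batch has been committed
}

// committer groups concurrent batches so that a single goroutine (the
// leader) commits a whole group with one call to commit.
type committer struct {
	mu         sync.Mutex
	pending    []*batch
	committing bool
	groups     int
	commit     func(group []*batch) // e.g. one cgo call per group
}

// Commit queues data and returns once it has been committed.
func (c *committer) Commit(data []byte) {
	b := &batch{data: data, done: make(chan struct{})}
	c.mu.Lock()
	c.pending = append(c.pending, b)
	if c.committing {
		// Follower: the current leader will pick this batch up.
		c.mu.Unlock()
		<-b.done
		return
	}
	// Leader: keep committing groups until nothing is pending.
	c.committing = true
	for len(c.pending) > 0 {
		group := c.pending
		c.pending = nil
		c.groups++
		c.mu.Unlock()
		c.commit(group)
		for _, g := range group {
			close(g.done)
		}
		c.mu.Lock()
	}
	c.committing = false
	c.mu.Unlock()
}

// runWorkers commits one batch from each of n goroutines and reports how
// many batches were committed and how many groups they formed.
func runWorkers(n int) (committed, groups int) {
	c := &committer{}
	c.commit = func(group []*batch) {
		time.Sleep(time.Millisecond) // assumed cost of the log append
		committed += len(group)
	}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			c.Commit([]byte{byte(i)})
		}(i)
	}
	wg.Wait()
	return committed, c.groups
}

func main() {
	committed, groups := runWorkers(100)
	fmt.Printf("committed %d batches in %d groups\n", committed, groups)
}
```

In this simplified version the leader keeps committing on behalf of later arrivals until the queue drains, which penalizes the leader's own latency; a production implementation would more likely hand leadership off to a waiting goroutine.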

I've provided a stripped down reproduction scenario at https://github.com/petermattis/batchtest. Running on my Mac laptop I see:

```
~ ./batchtest -t go
_elapsed____ops/sec
      1s     8029.6
      2s     8432.0
      3s     8100.1
      4s     8296.5
      5s     8171.8
      6s     8282.0
      7s     8040.7
      8s     8133.3
      9s     8240.4
     10s     8221.7
```

```
~ ./batchtest -t cgo
_elapsed____ops/sec
      1s     5036.4
      2s     2242.4
      3s     1284.3
      4s     1245.5
      5s     1254.5
      6s     1246.1
      7s     1962.5
      8s     4291.7
     10s     2036.3
```

By default, `batchtest` uses 100 concurrent worker threads writing "batches". My suspicion is that `batchtest` and CockroachDB are tickling some badness in the Go scheduler. If I set `GOMAXPROCS` to the number of concurrent workers, the cgo performance gets much closer to the Go performance:

```
~ GOMAXPROCS=100 ./batchtest -t cgo
_elapsed____ops/sec
      1s     6943.9
      2s     7252.0
      3s     7090.4
      4s     6810.8
      5s     7326.4
      6s     7758.9
      7s     7897.7
      8s     7893.2
      9s     7022.5
     10s     6875.9
```

https://github.com/cockroachdb/cockroach/pull/14138 is an acceptable workaround for committing batches, but it would be great to understand the performance discrepancy here. We're concerned about other Cgo operations in CockroachDB that don't have such easy workarounds. 
