Description
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (go version)?
go version go1.8 darwin/amd64, though I've also verified this happens on Linux.
What operating system and processor architecture are you using (go env)?
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/pmattis/Development/go"
GORACE=""
GOROOT="/Users/pmattis/Development/go-1.8"
GOTOOLDIR="/Users/pmattis/Development/go-1.8/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/qc/fpqpgdqd167c70dtc6840xxh0000gn/T/go-build085228252=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
Background
CockroachDB internally uses RocksDB (a fork of LevelDB) for data storage, which we access through a small cgo wrapper. All writes to RocksDB are performed using WriteBatches, which are first appended to a write-ahead log and then inserted into the database. Internally, RocksDB "groups" multiple concurrent write batches into a single append to the write-ahead log. While investigating performance, we noticed that the group sizes were smaller than expected.
What did you do?
TL;DR: if write-batch grouping is performed in Go, performance is good. If it is performed in C++ (either by code nearly identical to the Go grouping, or by RocksDB itself), performance is bad.
cockroachdb/cockroach#14138 started as an experiment to replace RocksDB's grouping of write batches with our own grouping in Go, to try to understand the smaller-than-expected group sizes. In addition to fixing the group sizes, it improved performance on one experiment by 100% while also reducing latencies. This was unexpected, as the cgo call overhead is negligible compared to the cost of committing a batch to RocksDB. Hoping to pin the blame on RocksDB, we reimplemented the grouping of write batches in our cgo RocksDB wrapper; performance was equivalent to letting RocksDB perform the grouping.
I've provided a stripped-down reproduction scenario at https://github.com/petermattis/batchtest. Running on my Mac laptop I see:
~ ./batchtest -t go
_elapsed____ops/sec
1s 8029.6
2s 8432.0
3s 8100.1
4s 8296.5
5s 8171.8
6s 8282.0
7s 8040.7
8s 8133.3
9s 8240.4
10s 8221.7
~ ./batchtest -t cgo
_elapsed____ops/sec
1s 5036.4
2s 2242.4
3s 1284.3
4s 1245.5
5s 1254.5
6s 1246.1
7s 1962.5
8s 4291.7
10s 2036.3
By default, batchtest uses 100 concurrent workers writing "batches". My suspicion is that batchtest and CockroachDB are tickling some badness in the Go scheduler. If I set GOMAXPROCS to the number of concurrent workers, the cgo performance gets much closer to the Go performance:
~ GOMAXPROCS=100 ./batchtest -t cgo
_elapsed____ops/sec
1s 6943.9
2s 7252.0
3s 7090.4
4s 6810.8
5s 7326.4
6s 7758.9
7s 7897.7
8s 7893.2
9s 7022.5
10s 6875.9
cockroachdb/cockroach#14138 is an acceptable workaround for committing batches, but it would be great to understand the performance discrepancy here. We're concerned about other cgo operations in CockroachDB that lack such easy workarounds.