-
Notifications
You must be signed in to change notification settings - Fork 473
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ci: investigate linux-race failures in GitHub actions #2159
Comments
The `linux-race` job is currently consistently failing in CI (both pre- and post-merge). This is being tracked in cockroachdb#2159. Temporarily skip the job while the cause is investigated. Touches cockroachdb#2159.
I did some searching, and it looks like there are some other projects that run into the same issues with jobs that eat up a lot of resources. I'm going to open a support case with GH and see if they can poke around and see if the VMs running our jobs are being killed for the same reasons. |
The memory usage for the Running: $ go test -tags 'invariants' -race -timeout 20m -count 1 -run . ./sstable Current: $ while $(pgrep sstable > /dev/null); do pmap -x $(pgrep sstable) | grep total; sleep 1; done
total kB 1561548 48580 47556
total kB 12747176 57172 56084
total kB 34971432 5513232 5512144
total kB 46083048 10562376 10561288
total kB 46092112 7531108 7530020
total kB 46104464 7953620 7942876
total kB 1899826908 7649968 7639160
total kB 1899826404 7836268 7825396
total kB 1899828180 8066388 8055516
... Before commit: $ while $(pgrep sstable > /dev/null); do pmap -x $(pgrep sstable) | grep total; sleep 1; done
total kB 1637896 78916 77808
total kB 2090584 218160 217052
total kB 2536000 236340 235232
total kB 1881287484 154604 153432
total kB 1881302720 221436 210664
total kB 1881287740 196204 185432
total kB 1881287740 157572 146800
... |
Narrowing in on the issue, it looks like On $ go test -tags '' -race -timeout 20m -count 1 -run TestReader$ -v ./sstable | grep -E '^--- PASS'
--- PASS: TestReader (4.62s) $ while $(pgrep sstable > /dev/null); do pmap -x $(pgrep sstable) | grep total; sleep 0.5; done
total kB 1705820 47684 46448
total kB 1853284 48480 37588
total kB 13039160 2255992 2245100
total kB 35262392 3500896 3490004
total kB 35262392 8288664 8277772
total kB 35262392 6127036 6116144
total kB 57485624 3711220 3700264
total kB 57485624 9722268 9711312 Running the same test without the new format version, the memory usage is much better, and the test completes in a fraction of the time: diff --git a/sstable/reader_test.go b/sstable/reader_test.go
index c94e1fdf..a768cce1 100644
--- a/sstable/reader_test.go
+++ b/sstable/reader_test.go
@@ -204,7 +204,7 @@ func TestReader(t *testing.T) {
"prefixFilter": "testdata/prefixreader",
}
- for _, format := range []TableFormat{TableFormatPebblev2, TableFormatPebblev3} {
+ for _, format := range []TableFormat{TableFormatPebblev2} {
for dName, blockSize := range blockSizes {
for iName, indexBlockSize := range blockSizes {
for lName, tableOpt := range writerOpts { $ go test -tags '' -race -timeout 20m -count 1 -run TestReader$ -v ./sstable | grep -E '^--- PASS'
--- PASS: TestReader (0.94s) $ while $(pgrep sstable > /dev/null); do pmap -x $(pgrep sstable) | grep total; sleep 0.5; done
total kB 1704404 44844 43672
total kB 1778144 46276 45104 Note that I'm also running with @sumeerbhola - mind if I send this your way to take a look? |
I poked this some more with github.com/cockroachdb/pebble/sstable.(*valueBlockWriter).addValue
/home/nickt/Development/pebble/sstable/value_block.go
Total: 3GB 3GB (flat, cum) 99.85%
463 . . if cap(w.buf.b) < blockLen {
464 . . size := w.blockSize + w.blockSize/2
465 . . if size < blockLen {
466 . . size = blockLen + blockLen/2
467 . . }
468 3GB 3GB buf := make([]byte, blockLen, size)
469 . . _ = copy(buf, w.buf.b)
470 . . w.buf.b = buf
471 . . } else {
472 . . w.buf.b = w.buf.b[:blockLen]
473 . . } @sumeerbhola - is there anything for us to dig into here around pooling some of these byte slices or making use of some of the sugar we have in |
Thanks for investigating! I will send out a fix soon. |
Tests can set a high maximum block size e.g. TestReader sets the block size to 2GB. This resulted in a 3GB byte slice being allocated. The new logic is similar to the behavior in blockWriter. Fixes cockroachdb#2159
Tests can set a high maximum block size e.g. TestReader sets the block size to 2GB. This resulted in a 3GB byte slice being allocated. The new logic is similar to the behavior in blockWriter. Fixes #2159
Currently, the
linux-race
job is consistently failing in GitHub actions (with exit code 143 - SIGTERM).For example, a CI run with verbose logging enabled fails with the following:
Initial indications point at 7d9a5b2, though the same failure mode has been observed on branches without this commit (see here).
Pure speculation: we're crossing some kind of GH Action resource limit when running
race
, which results in the platform sending a SIGTERM to themake
process, which fails the test.See this internal thread for more context.
The text was updated successfully, but these errors were encountered: