This repository has been archived by the owner on Aug 2, 2021. It is now read-only.

Create a separate CI job for race detection #741

Closed
gbalint opened this issue Jun 25, 2018 · 4 comments · Fixed by ethereum/go-ethereum#19148

Comments

gbalint commented Jun 25, 2018

We want to keep Swarm's code race free. To that end, we want to run race detection on Swarm's code at least once a day. Go's race detector is convenient, but it can increase test run time significantly (~10x), so we need a separate job for go test -race.

Note: this assumes the code base is race free so we don't just spam people with an always failing job.

Questions:

  • Could we use Travis for the task?
  • How long does it take to run only Swarm tests with race detection and without?

Note: If the run time is sufficiently low we might want to run race detection for every commit.

Later we might want to run long-running tests and benchmarks in the same way.
(benchmarks: How to keep the tests self-checking?)

@gbalint gbalint added the test label Jun 25, 2018
@acud acud self-assigned this Oct 4, 2018
@frncmx frncmx unassigned acud Jan 7, 2019
@frncmx frncmx changed the title from "Run longrunning and benchmark tests somehow" to "Create a CI job for nightly runs" Jan 11, 2019
@frncmx frncmx changed the title from "Create a CI job for nightly runs" to "Create a separate CI job for race detection" Jan 11, 2019

frncmx commented Jan 11, 2019

On Dell XPS 13 9350 (Intel i7-6560U CPU @ 2.20GHz, 16GB RAM)

Without Race Detection

time go test -count 1 github.com/ethereum/go-ethereum/swarm...
[...] PASS
real	1m9.517s
user	2m6.279s
sys	0m16.929s

Note: 40% of the time is spent with github.com/ethereum/go-ethereum/swarm/network/stream (49.277s)

With Race Detection

time go test -count 1 -race github.com/ethereum/go-ethereum/swarm...
[...] FAIL
real	11m6.384s
user	20m50.035s
sys	2m3.987s

@frncmx frncmx self-assigned this Jan 18, 2019
frncmx pushed a commit that referenced this issue Jan 22, 2019
Test fails unreliably when run with `-race` flag. As waitGC() does not
work as expected, it might just continue (not block) if GC is not
running already. We might end up with or without a chunk, depending
on goroutine scheduling.

As the localstore rewrite is nearly finished and this test blocks
#741 => Skip.

frncmx commented Feb 5, 2019

Adding the following snippet to our .travis.yml makes it possible to have a separate job that runs race detection on the Swarm code base only, as a daily Cron job.

Note: The Cron job has to be added manually in the Travis UI, in the settings menu.

    - name: Race Detector for Swarm # helps the job to stand out
      if: type = cron AND branch = master AND repo = ethersphere/go-ethereum
      os: linux # this is actually the default, but every job mentions it explicitly
      dist: trusty # this is actually the default, but every job mentions it explicitly
      go: 1.11.x
      git:
        submodules: false # avoid cloning ethereum/tests
      script: go test -v -race ./swarm...

However, the job hits Travis' memory limits, as the race detector can increase memory consumption by 10x.

Questions

  • Would it be possible for us to fit into 7.5 GB of memory? (See the Travis env specs.)
  • Is the memory cap per job? If so, splitting the race detection into matrix jobs can help. Otherwise, using stages is possible, but I think that would require a large change to .travis.yml.

frncmx commented Feb 5, 2019

TestOverlaySim tries to start 16 nodes. Is that too much for Travis? The test always fails the same way:

=== RUN   TestOverlaySim
panic: Could not startup node network for mocker
goroutine 14 [running]:
github.com/ethereum/go-ethereum/p2p/simulations.startStop(0xc00020af00, 0xc000043620, 0x10)
	/home/travis/gopath/src/github.com/ethereum/go-ethereum/p2p/simulations/mocker.go:66 +0x8a0
created by github.com/ethereum/go-ethereum/p2p/simulations.(*Server).StartMocker
	/home/travis/gopath/src/github.com/ethereum/go-ethereum/p2p/simulations/http.go:348 +0x260

frncmx commented Feb 8, 2019

TestOverlaySim was fixed by reducing the number of nodes.
The other memory issues were solved by ethereum/go-ethereum#19004.

See #1205 for the remaining issues.

frncmx pushed a commit that referenced this issue Feb 20, 2019
We had more than a dozen data races in our code base. Go has
a nice built-in race detector, so we should have a CI job
that at least notifies us promptly if we introduce a new race.

First, let's just have a daily Cron job triggered on
ethersphere/go-ethereum. Later, once the job is stable, we can start
using it for ethersphere PRs.

resolves: #741
@adamschmideg adamschmideg added this to the 0.3.12 milestone Feb 21, 2019