Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

eth: fix race condition in eth/sync tests #522

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

meowsbits
Copy link
Contributor

@meowsbits meowsbits commented Jan 30, 2023

Fixes #521

The race was discoverable with
go test -race ./eth
or under 'slow' machine conditions, eg. GOMAXPROCS=1 and/or a 'slower' machine.

The race was around the minArtificialFinalityPeers value, which was both

  • assigned in (during) the tests, and
  • checked in the nextSyncOp function

and the order of those uses was nondeterministic.

This patch resolves the issue by moving
the adhoc and temporary re-assignment of that
package-wide value to the start of the tests,
before any go routines for peer handling
or sync protocol are fired off.

Note that the real-clock timeouts in these
tests can still (invariantly) cause spurious test failures
under 'slow' conditions (see their 250ms sleeps).

This invariant limitation (seen in the upstream-identical tests TestSnapSyncDisabling6[6|7]) can be reproduced on my computer with:

for i in 1 0
do env GOMAXPROCS=$i go test -count 1 -run SnapSyncDisabling ./eth >/dev/null 2>&1 && echo ok || echo failed
done
failed
ok

Reviewers can test the efficacy of this patch with

$ go test -race ./eth
ok      github.com/ethereum/go-ethereum/eth     38.158s

@meowsbits meowsbits requested a review from ziogaschr January 30, 2023 21:50
@meowsbits meowsbits changed the title eth: fix race condition in tests eth: fix race condition in eth/sync tests Jan 30, 2023
@ziogaschr
Copy link
Member

I tested this in a super slow machine twice. First time I got a race but test finished. Second time I got a race and test hasn't finished because of timeout after 10 minutes.

WARNING: DATA RACE
Write at 0x00c0005941d0 by goroutine 119:
  github.com/ethereum/go-ethereum/eth.TestArtificialFinalityFeatureEnablingDisabling_StaleHead()
      /Users/ziogaschr/Sites/go/src/github.com/etclabscore/core-geth/eth/sync_test.go:396 +0xa00
  testing.tRunner()
      /opt/homebrew/Cellar/go@1.18/1.18.8/libexec/src/testing/testing.go:1439 +0x18c
  testing.(*T).Run.func1()
      /opt/homebrew/Cellar/go@1.18/1.18.8/libexec/src/testing/testing.go:1486 +0x44

Previous read at 0x00c0005941d0 by goroutine 173:
  github.com/ethereum/go-ethereum/eth.(*chainSyncer).nextSyncOp()
      /Users/ziogaschr/Sites/go/src/github.com/etclabscore/core-geth/eth/sync.go:213 +0xe4
  github.com/ethereum/go-ethereum/eth.(*chainSyncer).loop()
      /Users/ziogaschr/Sites/go/src/github.com/etclabscore/core-geth/eth/sync.go:160 +0x2f0
  github.com/ethereum/go-ethereum/eth.(*handler).Start.func3()
      /Users/ziogaschr/Sites/go/src/github.com/etclabscore/core-geth/eth/handler.go:559 +0x38

Goroutine 119 (running) created at:
  testing.(*T).Run()
      /opt/homebrew/Cellar/go@1.18/1.18.8/libexec/src/testing/testing.go:1486 +0x560
  testing.runTests.func1()
      /opt/homebrew/Cellar/go@1.18/1.18.8/libexec/src/testing/testing.go:1839 +0x94
  testing.tRunner()
      /opt/homebrew/Cellar/go@1.18/1.18.8/libexec/src/testing/testing.go:1439 +0x18c
  testing.runTests()
      /opt/homebrew/Cellar/go@1.18/1.18.8/libexec/src/testing/testing.go:1837 +0x6c8
  testing.(*M).Run()
      /opt/homebrew/Cellar/go@1.18/1.18.8/libexec/src/testing/testing.go:1719 +0x878
  main.main()
      _testmain.go:101 +0x2fc

Goroutine 173 (running) created at:
  github.com/ethereum/go-ethereum/eth.(*handler).Start()
      /Users/ziogaschr/Sites/go/src/github.com/etclabscore/core-geth/eth/handler.go:559 +0x304
  github.com/ethereum/go-ethereum/eth.newTestHandlerWithBlocksWithOpts()
      /Users/ziogaschr/Sites/go/src/github.com/etclabscore/core-geth/eth/sync_test.go:83 +0x670
  github.com/ethereum/go-ethereum/eth.TestArtificialFinalityFeatureEnablingDisabling_StaleHead()
      /Users/ziogaschr/Sites/go/src/github.com/etclabscore/core-geth/eth/sync_test.go:368 +0x140
  testing.tRunner()
      /opt/homebrew/Cellar/go@1.18/1.18.8/libexec/src/testing/testing.go:1439 +0x18c
  testing.(*T).Run.func1()
      /opt/homebrew/Cellar/go@1.18/1.18.8/libexec/src/testing/testing.go:1486 +0x44

@meowsbits
Copy link
Contributor Author

Hm, thanks for the report. I'll come back with another.

The race was discoverable with
go test -race ./eth
or under 'slow' machine conditions, eg. GOMAXPROCS=1
and/or a 'slower' machine.

The race was around the minArtificialFinalityPeers
value, which was both
- assigned in (during) the tests, and
- checked in the nextSyncOp function

and the order of those uses was nondeterministic.

This patch resolves the issue by moving
the adhoc and temporary re-assignment of that
package-wide value to the start of the tests,
before any go routines for peer handling
or sync protocol are fired off.

Note that the real-clock timeouts in these
tests can still cause spurious test failures
under 'slow' conditions (see their 250ms sleeps).

Fixes #521

Date: 2023-01-30 13:39:38-08:00
Signed-off-by: meows <b5c6@protonmail.com>
@ziogaschr found a(nother) race condition
in the tests, which I have had a hard time dupliating
with
env GOMAXPROCS=1 go test -count 1 -race ./eth

But this should, hopefully, fix the issue.

Date: 2023-02-07 08:47:44-08:00
Signed-off-by: meows <b5c6@protonmail.com>
@meowsbits meowsbits force-pushed the fix/sync-test-race-af branch from 891b12b to ec625f6 Compare February 7, 2023 17:21
@ziogaschr
Copy link
Member

Test failed with the following race condition:

INFO [02-07|23:29:44.678] Imported new chain segment               blocks=2    txs=0 mgas=0.000 elapsed=13.807ms  mgasps=0.000 number=1024 hash=40efb8..a6dd3b dirty=56.98KiB
INFO [02-07|23:29:44.680] Enabled artificial finality features     reason=synced peers=1
==================
WARNING: DATA RACE
Write at 0x000101d184e0 by goroutine 44:
  github.com/ethereum/go-ethereum/eth.TestArtificialFinalityFeatureEnablingDisabling_AbleDisable()
      /Users/ziogaschr/Sites/go/src/github.com/etclabscore/core-geth/eth/sync_test.go:236 +0xe68
  testing.tRunner()
      /opt/homebrew/Cellar/go@1.18/1.18.8/libexec/src/testing/testing.go:1439 +0x18c
  testing.(*T).Run.func1()
      /opt/homebrew/Cellar/go@1.18/1.18.8/libexec/src/testing/testing.go:1486 +0x44

Previous read at 0x000101d184e0 by goroutine 109:
  github.com/ethereum/go-ethereum/eth.(*chainSyncer).nextSyncOp()
      /Users/ziogaschr/Sites/go/src/github.com/etclabscore/core-geth/eth/sync.go:218 +0x194
  github.com/ethereum/go-ethereum/eth.(*chainSyncer).loop()
      /Users/ziogaschr/Sites/go/src/github.com/etclabscore/core-geth/eth/sync.go:160 +0x2f0
  github.com/ethereum/go-ethereum/eth.(*handler).Start.func3()
      /Users/ziogaschr/Sites/go/src/github.com/etclabscore/core-geth/eth/handler.go:559 +0x38

Goroutine 44 (running) created at:
  testing.(*T).Run()
      /opt/homebrew/Cellar/go@1.18/1.18.8/libexec/src/testing/testing.go:1486 +0x560
  testing.runTests.func1()
      /opt/homebrew/Cellar/go@1.18/1.18.8/libexec/src/testing/testing.go:1839 +0x94
  testing.tRunner()
      /opt/homebrew/Cellar/go@1.18/1.18.8/libexec/src/testing/testing.go:1439 +0x18c
  testing.runTests()
      /opt/homebrew/Cellar/go@1.18/1.18.8/libexec/src/testing/testing.go:1837 +0x6c8
  testing.(*M).Run()
      /opt/homebrew/Cellar/go@1.18/1.18.8/libexec/src/testing/testing.go:1719 +0x878
  main.main()
      _testmain.go:101 +0x2fc

Goroutine 109 (running) created at:
  github.com/ethereum/go-ethereum/eth.(*handler).Start()
      /Users/ziogaschr/Sites/go/src/github.com/etclabscore/core-geth/eth/handler.go:559 +0x304
  github.com/ethereum/go-ethereum/eth.newTestHandlerWithBlocksWithOpts()
      /Users/ziogaschr/Sites/go/src/github.com/etclabscore/core-geth/eth/sync_test.go:83 +0x670
  github.com/ethereum/go-ethereum/eth.TestArtificialFinalityFeatureEnablingDisabling_AbleDisable()
      /Users/ziogaschr/Sites/go/src/github.com/etclabscore/core-geth/eth/sync_test.go:180 +0xac
  testing.tRunner()
      /opt/homebrew/Cellar/go@1.18/1.18.8/libexec/src/testing/testing.go:1439 +0x18c
  testing.(*T).Run.func1()
      /opt/homebrew/Cellar/go@1.18/1.18.8/libexec/src/testing/testing.go:1486 +0x44
==================
WARN [02-07|23:29:44.680] Disabled artificial finality features    reason="low peers" peers=1

The TestArtificialFinality.._xxx tests are racey
because of this value; the tests want to manhandle
it while the handler goroutines (and chainsyncop calls)
are running.
This might fix it, though it feels like overkill.

Date: 2023-02-07 13:59:16-08:00
Signed-off-by: meows <b5c6@protonmail.com>
@meowsbits
Copy link
Contributor Author

I'm still struggling to reproduce the issue, but above is another commit-swing at it. Thanks to @ziogaschr for being my canary in the coal mine.

To test, I'm using:

cpulimit --quiet --limit=50 -- env GOMEMLIMIT=500MiB go test -count 2 -race -run ArtificialFinality -v ./eth |& tee test.out

https://github.com/opsengine/cpulimit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Race Condition Testing eth/ package
2 participants