Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: race condition updating last updated scorebook timestamp #7838

Merged
merged 1 commit into from
Oct 26, 2023

Conversation

qu0b
Copy link

@qu0b qu0b commented Oct 25, 2023

In collaboration with antithesis and quantstamp we have setup the bedrock testnet v1.1.1 for testing and found this bug.

Describe the bug
The data race happens in the p2p code, which is generally responsible for distributing unsafe blocks to replicas. There is also a component that scores and bans peers. The record bookkeeping for p2p stats is updated by the p2p library and the op-node seems to only read these records to determine whether peers should be banned. The race specifically happens due to reading the LastUpdate timestamp while it is being updated at the same time.

Expected behavior
No data race

==================
[service_op-node]                  [I] WARNING: DATA RACE
[service_op-node]                  [I] Write at 0x00c000049e40 by goroutine 518:
[service_op-node]                  [I] github.com/ethereum-optimism/optimism/op-node/p2p/store.(*scoreRecord).SetLastUpdated()
[service_op-node]                  [I] /opt/optimism_instrumented/customer/op-node/p2p/store/scorebook.go:29 +0xc4
[service_op-node]                  [I] github.com/ethereum-optimism/optimism/op-node/p2p/store.(*recordsBook[...]).SetRecord()
[service_op-node]                  [I] /opt/optimism_instrumented/customer/op-node/p2p/store/records_book.go:186 +0x30d
[service_op-node]                  [I] github.com/ethereum-optimism/optimism/op-node/p2p/store.(*scoreBook).SetScore()
[service_op-node]                  [I] /opt/optimism_instrumented/customer/op-node/p2p/store/scorebook.go:123 +0xec
[service_op-node]                  [I] github.com/ethereum-optimism/optimism/op-node/p2p/store.(*extendedStore).SetScore()
[service_op-node]                  [I] <autogenerated>:1 +0xc6
[service_op-node]                  [I] github.com/ethereum-optimism/optimism/op-node/p2p.(*scorer).SnapshotHook.func1()
[service_op-node]                  [I] /opt/optimism_instrumented/customer/op-node/p2p/peer_scorer.go:80 +0x4b7
[service_op-node]                  [I] github.com/libp2p/go-libp2p-pubsub.(*peerScore).inspectScoresExtended.func1()
[service_op-node]                  [I] /go/pkg/mod/github.com/libp2p/go-libp2p-pubsub@v0.9.3/score.go:499 +0x47
[service_op-node]                  [I] 
[service_op-node]                  [I] Previous read at 0x00c000049e40 by goroutine 187:
[service_op-node]                  [I] github.com/ethereum-optimism/optimism/op-node/p2p/store.(*scoreRecord).LastUpdated()
[service_op-node]                  [I] /opt/optimism_instrumented/customer/op-node/p2p/store/scorebook.go:34 +0x44
[service_op-node]                  [I] github.com/ethereum-optimism/optimism/op-node/p2p/store.(*recordsBook[...]).hasExpired()
[service_op-node]                  [I] /opt/optimism_instrumented/customer/op-node/p2p/store/records_book.go:296 +0x55
[service_op-node]                  [I] github.com/ethereum-optimism/optimism/op-node/p2p/store.(*recordsBook[...]).getRecord()
[service_op-node]                  [I] /opt/optimism_instrumented/customer/op-node/p2p/store/records_book.go:123 +0x124
[service_op-node]                  [I] github.com/ethereum-optimism/optimism/op-node/p2p/store.(*scoreBook).GetPeerScores()
[service_op-node]                  [I] /opt/optimism_instrumented/customer/op-node/p2p/store/scorebook.go:90 +0x9a
[service_op-node]                  [I] github.com/ethereum-optimism/optimism/op-node/p2p/store.(*scoreBook).GetPeerScore()
[service_op-node]                  [I] /opt/optimism_instrumented/customer/op-node/p2p/store/scorebook.go:110 +0x72
[service_op-node]                  [I] github.com/ethereum-optimism/optimism/op-node/p2p/store.(*extendedStore).GetPeerScore()
[service_op-node]                  [I] <autogenerated>:1 +0x6f
[service_op-node]                  [I] github.com/ethereum-optimism/optimism/op-node/p2p.(*NodeP2P).GetPeerScore()
[service_op-node]                  [I] /opt/optimism_instrumented/customer/op-node/p2p/node.go:238 +0x71
[service_op-node]                  [I] github.com/ethereum-optimism/optimism/op-node/p2p/monitor.(*PeerMonitor).checkNextPeer()
[service_op-node]                  [I] /opt/optimism_instrumented/customer/op-node/p2p/monitor/peer_monitor.go:90 +0x2c5
[service_op-node]                  [I] github.com/ethereum-optimism/optimism/op-node/p2p/monitor.(*PeerMonitor).checkNextPeer-fm()
[service_op-node]                  [I] <autogenerated>:1 +0x39
[service_op-node]                  [I] github.com/ethereum-optimism/optimism/op-node/p2p/monitor.(*PeerMonitor).background()
[service_op-node]                  [I] /opt/optimism_instrumented/customer/op-node/p2p/monitor/peer_monitor.go:136 +0x251
[service_op-node]                  [I] github.com/ethereum-optimism/optimism/op-node/p2p/monitor.(*PeerMonitor).Start.func1()
[service_op-node]                  [I] /opt/optimism_instrumented/customer/op-node/p2p/monitor/peer_monitor.go:60 +0x47
[service_op-node]                  [I] 
[service_op-node]                  [I] Goroutine 518 (running) created at:
[service_op-node]                  [I] github.com/libp2p/go-libp2p-pubsub.(*peerScore).inspectScoresExtended()
[service_op-node]                  [I] /go/pkg/mod/github.com/libp2p/go-libp2p-pubsub@v0.9.3/score.go:499 +0x564
[service_op-node]                  [I] github.com/libp2p/go-libp2p-pubsub.(*peerScore).inspectScores()
[service_op-node]                  [I] /go/pkg/mod/github.com/libp2p/go-libp2p-pubsub@v0.9.3/score.go:453 +0x6a
[service_op-node]                  [I] github.com/libp2p/go-libp2p-pubsub.(*peerScore).background()
[service_op-node]                  [I] /go/pkg/mod/github.com/libp2p/go-libp2p-pubsub@v0.9.3/score.go:439 +0x51a
[service_op-node]                  [I] github.com/libp2p/go-libp2p-pubsub.(*peerScore).Start.func1()
[service_op-node]                  [I] /go/pkg/mod/github.com/libp2p/go-libp2p-pubsub@v0.9.3/score.go:251 +0x58
[service_op-node]                  [I] 
[service_op-node]                  [I] Goroutine 187 (running) created at:
[service_op-node]                  [I] github.com/ethereum-optimism/optimism/op-node/p2p/monitor.(*PeerMonitor).Start()
[service_op-node]                  [I] /opt/optimism_instrumented/customer/op-node/p2p/monitor/peer_monitor.go:60 +0x156
[service_op-node]                  [I] github.com/ethereum-optimism/optimism/op-node/p2p.(*NodeP2P).init()
[service_op-node]                  [I] /opt/optimism_instrumented/customer/op-node/p2p/node.go:173 +0x1b84
[service_op-node]                  [I] github.com/ethereum-optimism/optimism/op-node/p2p.NewNodeP2P()
[service_op-node]                  [I] /opt/optimism_instrumented/customer/op-node/p2p/node.go:61 +0x1d4
[service_op-node]                  [I] github.com/ethereum-optimism/optimism/op-node/node.(*OpNode).initP2P()
[service_op-node]                  [I] /opt/optimism_instrumented/customer/op-node/node/node.go:363 +0x22b
[service_op-node]                  [I] github.com/ethereum-optimism/optimism/op-node/node.(*OpNode).init()
[service_op-node]                  [I] /opt/optimism_instrumented/customer/op-node/node/node.go:108 +0x1bb
[service_op-node]                  [I] github.com/ethereum-optimism/optimism/op-node/node.New()
[service_op-node]                  [I] /opt/optimism_instrumented/customer/op-node/node/node.go:69 +0x2ed
[service_op-node]                  [I] ==================

System Specs:

  • OS: x86 bedrock docker containerized system
  • Package Version (or commit hash): release v1.1.1

Additional context
the running setup:
ran a local devnet:
running the nodes with custom code coverage instrumentation and race detection
l1, l2, op-node, op-batcher, op-proposer,
with some network disruptions and service disruptions
network disruptions examples: latency, packet dropping, partitions
service disruptions examples: pausing, restarting replicated services

@qu0b qu0b requested a review from a team as a code owner October 25, 2023 16:39
@qu0b qu0b requested a review from tynes October 25, 2023 16:39
Copy link
Contributor

@protolambda protolambda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find, one small bug in the fix though, and I'd like to use the sync/atomic type to avoid any future issues.

op-node/p2p/store/scorebook.go Outdated Show resolved Hide resolved
@qu0b qu0b force-pushed the fix/race branch 2 times, most recently from c9ad0e7 to 5fd1e8f Compare October 25, 2023 18:49
@tynes
Copy link
Contributor

tynes commented Oct 25, 2023

Looks like tests need updating as well

golangci-lint run -E goimports,sqlclosecheck,bodyclose,asciicheck,misspell,errorlint --timeout 5m -e "errors.As" -e "errors.Is" ./...
op-node/p2p/store/extended.go:1: : # github.com/ethereum-optimism/optimism/op-node/p2p/store [github.com/ethereum-optimism/optimism/op-node/p2p/store.test]
op-node/p2p/store/serialize_test.go:14:15: cannot use 1923841 (untyped int constant) as "sync/atomic".Int64 value in struct literal
op-node/p2p/store/serialize_test.go:50:17: cannot use 1923841 (untyped int constant) as "sync/atomic".Int64 value in struct literal
op-node/p2p/store/serialize_test.go:74:17: cannot use 1923841 (untyped int constant) as "sync/atomic".Int64 value in struct literal (typecheck)
package store
make: *** [Makefile:13: lint-go] Error 1

Exited with code exit status 2

@qu0b
Copy link
Author

qu0b commented Oct 25, 2023

It seems that serializing atomic values is not recommended. Should we leave the struct containing a atomic value type or should we add a wrapper to the scoreRecord struct and update the tests accordingly?

@qu0b qu0b changed the title Race condition updating last updated scorebook timestamp fix: race condition updating last updated scorebook timestamp Oct 25, 2023
@protolambda
Copy link
Contributor

@qu0b ah interesting, I forgot about the json usage there, then we need to use a regular int64. Let's put the field as first field in the struct, so it's nicely aligned for atomic access, and add a comment about requiring atomic updates.

fix: import formatting; LastUpdate as atomic.Int64

fix ci/cd
@protolambda protolambda added this pull request to the merge queue Oct 25, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Oct 25, 2023
@tynes tynes added this pull request to the merge queue Oct 25, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Oct 25, 2023
@ajsutton ajsutton added this pull request to the merge queue Oct 25, 2023
@ajsutton
Copy link
Contributor

Flaky test is already on my list to dig into for today. Re-adding to merge queue.

@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Oct 26, 2023
@ajsutton ajsutton added this pull request to the merge queue Oct 26, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Oct 26, 2023
@tynes tynes merged commit fd19665 into ethereum-optimism:develop Oct 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants