fix: race condition updating last updated scorebook timestamp #7838
Nice find, one small bug in the fix though, and I'd like to use the sync/atomic
type to avoid any future issues.
c9ad0e7 to 5fd1e8f
Looks like tests need updating as well
It seems that serializing atomic values is not recommended. Should we leave the atomic value type directly in the struct, or should we add a wrapper to the scoreRecord struct and update the tests accordingly?
@qu0b ah interesting, I forgot about the |
fix: import formatting; LastUpdate as atomic.Int64 fix ci/cd
Flaky test is already on my list to dig into for today. Re-adding to merge queue.
In collaboration with Antithesis and Quantstamp, we set up the Bedrock testnet v1.1.1 for testing and found this bug.
Describe the bug
The data race happens in the p2p code, which is generally responsible for distributing unsafe blocks to replicas. There is also a component that scores and bans peers. The record bookkeeping for p2p stats is updated by the p2p library, and the op-node seems to only read these records to determine whether peers should be banned. The race occurs when the LastUpdate timestamp is read while it is being updated.
Expected behavior
No data race
System Specs:
Additional context
The running setup:
Ran a local devnet, running the nodes (l1, l2, op-node, op-batcher, op-proposer) with custom code coverage instrumentation and race detection, with some network disruptions and service disruptions.
Network disruption examples: latency, packet dropping, partitions.
Service disruption examples: pausing, restarting replicated services.