Fix deadlock in network-devp2p
#11013
It looks like @AtkinsChang signed our Contributor License Agreement. 👍 Many thanks, Parity Technologies CLA Bot
Deadlocks for me in less than a minute.
Amazing catches, thanks a lot!
@@ -722,12 +722,13 @@ impl Host {
let session_result = session.lock().readable(io, &self.info.read());
match session_result {
Err(e) => {
let reserved_nodes = self.reserved_nodes.read();
Are there any other occurrences where we need both `reserved_nodes` and `session.lock()`?
As a rule of thumb, we try to acquire locks in the same order as the fields are declared. Here we lock something that is not part of the original struct, but since it is retrieved from `self.sessions`, I'd say we should keep that lock ordering here as well.
Otherwise, let's put a comment on top of `sessions:` noting the order in which `session.lock()` should be acquired.
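One way to make the ordering rule above harder to violate is to document it on the struct and route every two-lock access through a single helper. The following is only a sketch under assumptions: `Host` and `Session` here are simplified hypothetical stand-ins, `with_reserved_and_session` is a made-up helper name, and `std::sync` locks are used (hence the `.unwrap()` calls) instead of whatever lock types the crate actually uses.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex, RwLock};

// Hypothetical stand-in for the real `Session`; only the peer id matters here.
struct Session {
    peer_id: u64,
}

// Hypothetical, simplified stand-in for `Host`.
struct Host {
    // Lock order: fields are locked top to bottom. An individual session mutex
    // is reached through `sessions`, so it is always acquired last.
    reserved_nodes: RwLock<HashSet<u64>>,
    sessions: RwLock<Vec<Arc<Mutex<Session>>>>,
}

impl Host {
    // Hypothetical helper: the only place where both locks are taken, so the
    // documented order cannot be reversed at a call site.
    fn with_reserved_and_session<R>(
        &self,
        index: usize,
        f: impl FnOnce(&HashSet<u64>, &mut Session) -> R,
    ) -> Option<R> {
        // 1. `reserved_nodes` read lock first.
        let reserved = self.reserved_nodes.read().unwrap();
        // 2. Clone the session handle; the `sessions` read lock is released
        //    at the end of this statement.
        let session = self.sessions.read().unwrap().get(index).cloned()?;
        // 3. The session mutex last.
        let mut guard = session.lock().unwrap();
        Some(f(&*reserved, &mut *guard))
    }
}
```

A call such as `host.with_reserved_and_session(0, |reserved, s| reserved.contains(&s.peer_id))` returns `None` when no session exists at that index, and otherwise runs the closure with both locks held in the documented order.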
From `sessions`:
https://github.com/paritytech/parity-ethereum/blob/00124b5a4bf8a38ad6c660b462a5e5addf13cab3/util/network-devp2p/src/host.rs#L383-L386
https://github.com/paritytech/parity-ethereum/blob/00124b5a4bf8a38ad6c660b462a5e5addf13cab3/util/network-devp2p/src/host.rs#L845-L849
From `NetworkContext::new`:
https://github.com/paritytech/parity-ethereum/blob/00124b5a4bf8a38ad6c660b462a5e5addf13cab3/util/network-devp2p/src/host.rs#L856-L858
https://github.com/paritytech/parity-ethereum/blob/00124b5a4bf8a38ad6c660b462a5e5addf13cab3/util/network-devp2p/src/host.rs#L957-L959
This is great stuff. I just tested the branch against kovan and cannot make it deadlock. Let's wait for @ngotchac's review and then merge.
Good catch!
* master: EIP 1108: Reduce alt_bn128 precompile gas costs (openethereum#11008) Fix deadlock in `network-devp2p` (openethereum#11013)
…he-right-place * master: Extract snapshot to own crate (#11010) Edit publish-onchain.sh to use https (#11016) EIP 1108: Reduce alt_bn128 precompile gas costs (#11008) Fix deadlock in `network-devp2p` (#11013) Implement EIP-1283 reenable transition, EIP-1706 and EIP-2200 (#10191) EIP 1884 Re-pricing of trie-size dependent operations (#10992)
* Fix deadlock in `network-devp2p` * Add note for locking order in `network-devp2p`
* add more tx tests (#11038) * Fix parallel transactions race-condition (#10995) * Add blake2_f precompile (#11017) * [trace] introduce trace failed to Ext (#11019) * Edit publish-onchain.sh to use https (#11016) * Fix deadlock in network-devp2p (#11013) * EIP 1108: Reduce alt_bn128 precompile gas costs (#11008) * xDai chain support and nodes list update (#10989) * EIP 2028: transaction gas lowered from 68 to 16 (#10987) * EIP-1344 Add CHAINID op-code (#10983) * manual publish jobs for releases, no changes for nightlies (#10977) * [blooms-db] Fix benchmarks (#10974) * Verify transaction against its block during import (#10954) * Better error message for rpc gas price errors (#10931) * Fix fork choice (#10837) * Fix compilation on recent nightlies (#10991)
* add more tx tests (#11038) * Fix parallel transactions race-condition (#10995) * Add blake2_f precompile (#11017) * [trace] introduce trace failed to Ext (#11019) * Edit publish-onchain.sh to use https (#11016) * Fix deadlock in network-devp2p (#11013) * EIP 1108: Reduce alt_bn128 precompile gas costs (#11008) * xDai chain support and nodes list update (#10989) * EIP 2028: transaction gas lowered from 68 to 16 (#10987) * EIP-1344 Add CHAINID op-code (#10983) * manual publish jobs for releases, no changes for nightlies (#10977) * [blooms-db] Fix benchmarks (#10974) * Verify transaction against its block during import (#10954) * Better error message for rpc gas price errors (#10931) * tx-pool: accept local tx with higher gas price when pool full (#10901) * Fix fork choice (#10837) * Cleanup unused vm dependencies (#10787) * Fix compilation on recent nightlies (#10991)
Fix two deadlocks in the devp2p module.
Reproduce
A deadlock will occur within 5 minutes with this docker-compose.yaml.
Explanation
In `fn connect_peers(&self, io: &IoContext<NetworkIoMessage>)`, the code tries to acquire the mutex for `Session` while holding the r-lock for `reserved_nodes`:
https://github.com/paritytech/parity-ethereum/blob/00124b5a4bf8a38ad6c660b462a5e5addf13cab3/util/network-devp2p/src/host.rs#L576
https://github.com/paritytech/parity-ethereum/blob/00124b5a4bf8a38ad6c660b462a5e5addf13cab3/util/network-devp2p/src/host.rs#L603-L604
In `fn session_readable(&self, token: StreamToken, io: &IoContext<NetworkIoMessage>)`, there are two code paths that try to acquire the r-lock for `reserved_nodes` while holding the mutex for `Session`:
https://github.com/paritytech/parity-ethereum/blob/00124b5a4bf8a38ad6c660b462a5e5addf13cab3/util/network-devp2p/src/host.rs#L768
Consider the situation where:
* thread A, in `connect_peers`, holds the r-lock for `reserved_nodes` and is waiting for the `session` mutex
* thread B, in `session_readable`, holds the mutex for `session` and is trying to acquire the r-lock for `reserved_nodes`
If another thread C is then trying to acquire the w-lock for `reserved_nodes`, thread B's read request is blocked until no writer holds or is waiting for the lock. Thread C waits for thread A to release its r-lock, thread A waits for thread B to release the `session` mutex, and thread B waits behind thread C, so all three threads deadlock.
Another deadlock is caused by `reserved_nodes` and `node` in `fn update_nodes(&self, _io: &IoContext<NetworkIoMessage>, node_changes: TableUpdates)` and `fn connect_peers(&self, io: &IoContext<NetworkIoMessage>)`.