Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential Deadlock detected with cs_vSend cs_vRecvMsg #71

Closed
dimxy opened this issue Sep 8, 2021 · 7 comments
Closed

Potential Deadlock detected with cs_vSend cs_vRecvMsg #71

dimxy opened this issue Sep 8, 2021 · 7 comments

Comments

@dimxy
Copy link
Owner

dimxy commented Sep 8, 2021

Received an abortion of komodod with DEBUG_LOCKORDER on, with a print on console:

POTENTIAL DEADLOCK DETECTED
Previous lock order was:
 (1) pnode->cs_vSend  net.cpp:1629 (TRY)
 cs_main  main.cpp:8649 (TRY)
 cs_wallet  wallet/wallet.cpp:3222
 (2) cs_mapRelay  net.cpp:1924
Current lock order is:
 pnode->cs_vRecvMsg  net.cpp:1610 (TRY)
 cs_main  main.cpp:7355
 (2) cs_mapRelay  main.cpp:7449
 (1) cs_vSend  net.cpp:2218
Assertion failed: (onlyMaybeDeadlock), function potential_deadlock_detected, file sync.cpp, line 127.

Env:
two node chain with tokel default params (adaptivepow etc), started as tokeld -ac_name=TKL02, only one node was mining.
the error occurred on the non mining node.

@jmjatlanta
Copy link

In which branch did this take place?

@dimxy
Copy link
Owner Author

dimxy commented Sep 9, 2021

In which branch did this take place?

it was found originally in this branch:

bool pushed = false;

(this branch is for tokel project, generates tokeld/tokel-cli executables with the tokel burned params as defaults, instead of komodod/komodo-cli)

@jmjatlanta
Copy link

jmjatlanta commented Sep 9, 2021

Yes, this certainly has the potential for a deadlock. IMO the easy fix is to gather the data from mapRelay, unlock, then call pfrom->PushMessage at https://github.com/dimxy/komodo/blob/746c982bade26193c21e25ec4384975699bf831d/src/main.cpp#L7449:L7454

A possible strategy to avoid this in the future (architecture change): When it comes to sending things over the wire, figure out a way to unlock almost everything and then send. What I've seen elsewhere is all network messages derive from the same class, and anything to be sent is placed in a queue. Once processing is complete, and methods "bubble back up the stack" (and thereby release their mutexes), the near last step is to lock that queue and send everything in it.

I haven't examined the comms layer to see if that is a viable strategy. I'm just throwing it out there in case it gets your creative juices flowing.

@jmjatlanta
Copy link

jmjatlanta commented Sep 9, 2021

In taking a second look, my suggested fix may not work if the CDataStream is expected to be protected by cs_mapRelay.

If cs_mapRelay only protects the collection, and copies of CDataStream are expected to handle errors, we're good. Otherwise, releasing the lock could allow CDataStream to become invalid before the PushMessage is complete.

Note that CDataStream is a copy, so we don't have to worry about lifetime issues. We only need to worry if having cs_mapRelay locked means something to CDataStream.

@dimxy
Copy link
Owner Author

dimxy commented Sep 11, 2021

As looked into calls stack this occurs when:
On receiving a "getdata" message for a tx in memory, relaying of this tx is triggered (so cs_mapRelay is locked and cs_vSend is locked in PushMessage)
another thread, SendMessages (which already has cs_vSend locked) calls Broadcast() and it calls RelayWalletTransactions() and eventually RelayTransaction() which by the way tries to remove expired txns from mapRelay (locking cs_mapRelay) ,
So we could have a deadlock here.

Seems only copying CDataStream under cs_mapRelay lock and doing PushMessage after the lock released, not locking cs_mayRelay and cs_vSend at the same time, is the correct solution.
Is it something like that: c6e017e ?
And I dont think we need to protect CDataStream, it looks it is just a serialised tx in a dedicated variable

@jmjatlanta
Copy link

Yes, your solution gives the desired results. The call to PushMessage will lock cs_vSend, but mapRelay has already been read and the cs_mapRelay released. Deadlock averted.

@dimxy
Copy link
Owner Author

dimxy commented Dec 5, 2021

91dfbd6 fixed in the tokel branch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants