You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The current codebase makes heavy use of mutexes. Most mutexes are recursive. As this comment in the codebase hints at, and many online discussions talk about, recursive mutexes should be avoided.
Switching from a recursive mutex in some parts of the codebase will be relatively straightforward and easy. Other areas will require some thought and changes to method signatures.
TL;DR> With a bit of forethought and a good dose of effort, we can modify the existing codebase to make locking safer and performant.
I have included here an example of a typical (not easy, not hard) use case. As is obvious, each case will be different.
The method CheckProofOfWork is passed the CBlockHeader to check, a pubkey33, the height, and the consensus params. From there, it grabs some notary information and the current tip time. To grab that information requires a (read) lock on cs_main. Sometimes, the caller already has the lock, and other times it does not. Using a recursive lock, we just lock it, and get the information we need, making sure to "unlock" it on the way out.
But as we read in the links above, recursive locks should be avoided. And they can be avoided in this case fairly easily. CheckProofOfWork will not be harmed if "stale" data is used. Even a chain reorg in another process shouldn't cause a problem within this method. So the basic responsibility of CheckProofOfWork should not require that cs_main be locked.
However, the current implementation of this method requires that lock to safely get information from chainActive. So we have a few choices here.
Use recursive locking as we do now.
Require that the caller always has the lock held before calling this method. We will get the information from chainActive without locking it.
Require that the caller never has the lock held before calling this method. We will lock chainActive when we need to.
Require that the caller pass us a boolean as to whether cs_main is locked or not, or have two methods that do the same thing.
Require that the caller gather the information from chainActive that we need and pass it to the method, so CheckProofOfWork is lock-free.
These are my opinions, numbered to match the options above:
Recursive locking should be avoided.
Having the caller lock it means some processing will be done with the lock needlessly held, slowing down other threads.
This is a viable option IF we can guarantee that this method will not be part of a process that IS sensitive to stale data.
Now we are doing conditional locking. In my mind, this is more prone to errors than option 1.
Although CheckProofOfWork does not seem to be affected by stale data, it could be involved in a process that is sensitive. Having the caller gather the information using proper locking, and then passing that info to our method means stale-sensitive processes can hold the lock, and performance-sensitive processes don't need to.
We have weapons that will help us in our fight.
There are existing runtime checks (incompletely implemented). Running the code in a test environment with the right compile flags will help diagnose problems.
There are existing static checks (incompletely implemented). Warnings at compile time will help prevent problems.
To move forward, I believe we need to agree that such modifications are worth the effort. Then I believe we should assign a priority to it. Migrating away from recursive locks can be done in steps, and IMO is a safer way to handle the task.
The text was updated successfully, but these errors were encountered:
who-biz
pushed a commit
to who-biz/komodo
that referenced
this issue
Mar 24, 2023
The current codebase makes heavy use of mutexes. Most mutexes are recursive. As this comment in the codebase hints at, and many online discussions talk about, recursive mutexes should be avoided.
Switching from a recursive mutex in some parts of the codebase will be relatively straightforward and easy. Other areas will require some thought and changes to method signatures.
TL;DR> With a bit of forethought and a good dose of effort, we can modify the existing codebase to make locking safer and performant.
I have included here an example of a typical (not easy, not hard) use case. As is obvious, each case will be different.
The method
CheckProofOfWork
is passed theCBlockHeader
to check, apubkey33
, the height, and the consensus params. From there, it grabs some notary information and the current tip time. To grab that information requires a (read) lock on cs_main. Sometimes, the caller already has the lock, and other times it does not. Using a recursive lock, we just lock it, and get the information we need, making sure to "unlock" it on the way out.But as we read in the links above, recursive locks should be avoided. And they can be avoided in this case fairly easily.
CheckProofOfWork
will not be harmed if "stale" data is used. Even a chain reorg in another process shouldn't cause a problem within this method. So the basic responsibility ofCheckProofOfWork
should not require that cs_main be locked.However, the current implementation of this method requires that lock to safely get information from
chainActive
. So we have a few choices here.chainActive
that we need and pass it to the method, soCheckProofOfWork
is lock-free.These are my opinions, numbered to match the options above:
CheckProofOfWork
does not seem to be affected by stale data, it could be involved in a process that is sensitive. Having the caller gather the information using proper locking, and then passing that info to our method means stale-sensitive processes can hold the lock, and performance-sensitive processes don't need to.We have weapons that will help us in our fight.
To move forward, I believe we need to agree that such modifications are worth the effort. Then I believe we should assign a priority to it. Migrating away from recursive locks can be done in steps, and IMO is a safer way to handle the task.
The text was updated successfully, but these errors were encountered: