Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock elimination for GetEstimatedTimetoStake() #1084

Merged
merged 1 commit into from
May 8, 2018

Conversation

jamescowens
Copy link
Member

The new GetEstimatedTimetoStake() function has a rare
deadlock situation between the cs_main and cs_wallet locks taken for
a very short time in it and the MinerStatus locked in GetGlobalStatus()
and CreateCoinStake(). ifoggz suggested called GetEstimatedTimetoStake() and
assigning to a local variable right before the critical section in
GetGlobalStatus() to correct the lock order. This appears to work very
well. Wallet against testnet with the change has been running >24 hours without
deadlock.

The new GetEstimatedTimetoStake() function has a rare
deadlock situation between the cs_main and cs_wallet locks taken for
a very short time in it and the `MinerStatus` locked in `GetGlobalStatus()`
and CreateCoinStake(). ifoggz suggested called GetEstimatedTimetoStake() and
assigning to a local variable right before the critical section in
GetGlobalStatus() to correct the lock order. This appears to work very
well. Wallet against testnet with the change has been running >24 hours without
deadlock.
@iFoggz
Copy link
Member

iFoggz commented May 7, 2018

MinerStatus gets locked without any previous locks and/or after cs_main and cs_wallet. due to the need for the cs_main and cs_wallet locks in GetEstimiatedTimetoStake() we essentially reversed the order before. This prevents that this fix will make sure that the data is received first before locking the MinerStatus

@denravonska
Copy link
Member

I think we have an overall problem when you have to know the lock strategy for the entire call chain. This makes these kind of problems unavoidable and hard to debug. We're just lucky you caught this one.

Perhaps instead of manipulating the status object directly we do it on a temporary and assign at the end. That way we can keep the locks to a minimum.

globalStatusType status;
status.blocks = ToString(nBestHeight);
status.ETTS = RoundToString(GetEstimatedTimetoStake()/86400.0,3);
...
..

LOCK(GlobalStatusStruct.lock);
GlobalStatusStruct = status;

We can do it outside of this PR though.

@jamescowens
Copy link
Member Author

@denravonska I really like that. Can you merge this one into dev/staging. I will begin working on another one to implement this approach...

@denravonska denravonska merged commit c66fb84 into gridcoin-community:staging May 8, 2018
@tomasbrod
Copy link
Member

  • Just do not put the lock into the variable you are overwriting.
  • consider using atomic pointer

denravonska added a commit that referenced this pull request May 25, 2018
Fixed
 - Fixes for displaying on high DPI displays, #517 (@skcin).
 - Re-enable unit tests, add unit test to Travis, #769, #808 (@TheCharlatan).
 - Fix empty string in sendalert2 (@tomasbrod).

Added
 - Neural Report RPC command, #1063 (@tomasbrod).
 - GUI wallet redign with new icons and purple native style (@skcin).

Changed
 - Switch to autotools and Depends from Bitcoin, #487 (@TheCharlatan).
 - Clean and update docs for new build system, remove outdated, #828 (@TheCharlatan).
 - Change estimated time to stake calculations to be more accurate, #1084 (@jamescowens).
 - Move logging to tinyformat, #1009 (@TheCharlatan).
 - Improve appcache performance, #734 (@denravonska).
 - Improve block index memory access performance, #679 (@denravonska).
 - NN fixes: clean logging, explain mag single response, move contract to ndata_nresp (@denravonska)
 - Updated translations:
    - Turkish, #771 (@confuest).
    - Chinese, #1012 (@linnaea).
 - RPC refactor: Cleaner locks, better error handling, move execute calls to straght rpc calls, #1024 (@Foggyx420).
 - Change locking primitives from Boost to STL, #1029 (@Foggyx420).

Removed
 - gridcoindiagnostic RPC call (@denravonska).
 - Galaza, #945 (@barton2526).
 - Assertion in SignSignature, #998 (@TheCharlatan).
 - Upgrade menu, #1094 (@jamescowens).
 - Acid test functions, #871 (@tomasbrod).
 - Qt4 support, #801 (@denravonska).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants