-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: resolve database deadlock: #4989
Conversation
The `rotateWithLock` function holds a lock while it calls a callback function that's passed in by the caller. This is a problematic design that needs to be used very carefully. In this case, at least one caller passed in a callback that eventually relocks the mutex on the same thread, causing UB (a deadlock was observed). The caller was from SHAMapStoreImpl, and it called `clearCaches`. This `clearCaches` can potentially call `fetchNodeObject`, which tried to relock the mutex. This patch resolves the issue by changing the mutex type to a `recursive_mutex`. Ideally, the code should be rewritten so it doesn't hold the mutex during the callback and the mutex should be changed back to a regular mutex.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #4989 +/- ##
=======================================
Coverage 70.9% 71.0%
=======================================
Files 796 796
Lines 66727 66727
Branches 10981 10973 -8
=======================================
+ Hits 47333 47345 +12
+ Misses 19394 19382 -12
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish that the lock handling did not require a recursive_mutex
. But we are where we are, and this is a pragmatic solution. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see a low-risk alternative at this time.
The `rotateWithLock` function holds a lock while it calls a callback function that's passed in by the caller. This is a problematic design that needs to be used very carefully. In this case, at least one caller passed in a callback that eventually relocks the mutex on the same thread, causing UB (a deadlock was observed). The caller was from SHAMapStoreImpl, and it called `clearCaches`. This `clearCaches` can potentially call `fetchNodeObject`, which tried to relock the mutex. This patch resolves the issue by changing the mutex type to a `recursive_mutex`. Ideally, the code should be rewritten so it doesn't hold the mutex during the callback and the mutex should be changed back to a regular mutex. Co-authored-by: Ed Hennis <ed@ripple.com>
* Follow-up to XRPLF#4989
* Follow-up to XRPLF#4989
* Follow-up to XRPLF#4989
* Follow-up to XRPLF#4989
* Follow-up to XRPLF#4989
* Follow-up to XRPLF#4989
* Follow-up to XRPLF#4989
* Follow-up to XRPLF#4989
* Follow-up to XRPLF#4989
* Follow-up to XRPLF#4989
* Follow-up to XRPLF#4989
* Follow-up to XRPLF#4989
* Follow-up to XRPLF#4989
* Follow-up to XRPLF#4989
* Follow-up to XRPLF#4989
* Follow-up to XRPLF#4989
* Follow-up to XRPLF#4989
* Follow-up to XRPLF#4989
* Follow-up to XRPLF#4989
* Follow-up to XRPLF#4989
The
rotateWithLock
function holds a lock while it calls a callback function that's passed in by the caller. This is a problematic design that needs to be used very carefully. In this case, at least one caller passed in a callback that eventually relocks the mutex on the same thread, causing UB (a deadlock was observed). The caller was from SHAMapStoreImpl, and it calledclearCaches
. ThisclearCaches
can potentially callfetchNodeObject
, which tried to relock the mutex.This patch resolves the issue by changing the mutex type to a
recursive_mutex
. Ideally, the code should be rewritten so it doesn't hold the mutex during the callback and the mutex should be changed back to a regular mutex.Type of Change