A deadlock was found during a PRS. The root cause was an earlier fix that changed the replmanager to take the action lock; without the lock, it could race and conflict with other actions. But taking the lock led to a deadlock, because `PromoteReplica` also waits for the replmanager to finish its fix.

We could have spot-fixed this for the specific use case. But in the interest of preventing other corner cases, the better fix was to change replmanager to not wait if it couldn't obtain the lock.

However, the implementation of `lock` with a context timeout was flawed, because it wouldn't actually time out when the context expired. So I implemented a new AcquireContext in sync2.Semaphore, which also prompted me to fix the flaky tests there.

Using the semaphore allowed me to implement a real `tryLock`, which replManager can now use.

Since this was a race condition, I tested it manually. The test that failed previously now passes.