CASSANDRA-20996 Use LWTs for all auto-repair history mutations #4456
base: trunk
@jaydeepkumar1984 can I have a review on this please?
test/unit/org/apache/cassandra/repair/autorepair/AutoRepairUtilsTest.java
@@ -538,4 +538,42 @@ public void testSkipSystemTraces()
    {
        assertFalse(AutoRepairUtils.shouldConsiderKeyspace(Keyspace.open(SchemaConstants.TRACE_KEYSPACE_NAME)));
    }

    @Test
    public void testAutoRepairHistoryOutOfOrderDeleteRaceCondition()
This test passes even without any of the changes because ADD_HOST_ID_TO_DELETE_HOSTS already had IF EXISTS; however, the changes in this PR are still necessary.
Please clarify in the description that the PR includes the test case and covers certain cases we missed earlier.
Good point, I've extended this test to include cases that make this test fail without the changes in this PR.
I've also updated the description.
For issue https://issues.apache.org/jira/browse/CASSANDRA-20996
The auto-repair history management mechanism does not use LWTs for all of its history table queries. As a result, it is possible to run into a few edge cases where a deleted node's repair history gets resurrected. For example:
This can lead to the following race condition:
This PR introduces a unit test to simulate these out-of-order deletions/upserts and updates the auto-repair history mutations to use LWTs in order to prevent the race conditions from happening.
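To illustrate why conditional writes matter here, the following is a minimal in-memory sketch, not code from this PR or from Cassandra. The class `RepairHistorySketch` and all of its method names are hypothetical; a `ConcurrentHashMap` stands in for the history table, and its atomic operations stand in for the `IF EXISTS` / `IF NOT EXISTS` conditions that LWTs provide. It assumes the problematic ordering is a history update that arrives after the host's row has already been deleted.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for the repair history table; not Cassandra code.
final class RepairHistorySketch
{
    // host id -> last repair finish timestamp (assumed shape, for illustration only)
    private final Map<String, Long> rows = new ConcurrentHashMap<>();

    // Plain upsert: always writes, so it recreates a row that was already deleted.
    void unconditionalUpdate(String hostId, long finishedAt)
    {
        rows.put(hostId, finishedAt);
    }

    // Models UPDATE ... IF EXISTS: applied only while the row is still present.
    boolean updateIfExists(String hostId, long finishedAt)
    {
        return rows.computeIfPresent(hostId, (id, old) -> finishedAt) != null;
    }

    // Models INSERT ... IF NOT EXISTS: applied only when no row exists yet.
    boolean insertIfNotExists(String hostId, long finishedAt)
    {
        return rows.putIfAbsent(hostId, finishedAt) == null;
    }

    // Models DELETE ... IF EXISTS: reports whether a row was actually removed.
    boolean deleteIfExists(String hostId)
    {
        return rows.remove(hostId) != null;
    }

    boolean contains(String hostId)
    {
        return rows.containsKey(hostId);
    }
}
```

In this sketch, calling `unconditionalUpdate` after `deleteIfExists` brings the deleted host's row back, whereas `updateIfExists` returns `false` and leaves the table unchanged.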