[CCR] Add more unit tests for shard follow task #32121
Conversation
The added tests are based on specific scenarios described in the test plan. Before this change, ShardFollowNodeTaskTests contained more random-style tests; these have been removed, and in a follow-up PR better randomized tests will be added in a new test class, as described in the test plan.
Pinging @elastic/es-distributed
Thanks @martijnvg for adding many great tests. I left some comments.
task.coordinateReads();
assertThat(shardChangesRequests.size(), equalTo(8));
assertThat(shardChangesRequests.get(0)[0], equalTo(0L));
assertThat(shardChangesRequests.get(0)[1], equalTo(8L));
Can we fold all these assertions into a single one? I think this should cover enough.
assertThat(shardChangesRequests, contains(new long[][]{
{0L, 8L}, {9L, 8L}, {18L, 8L}, {27L, 8L}, {36L, 8L}, {45L, 8L}, {54L, 8L}, {63L, 8L}
}));
Moreover, the leader should not return more operations than the requested batch size. Here, we request 8 operations, but it returns 9 operations.
That is not the end offset but the maxBatchOperationCount: https://github.com/elastic/elasticsearch/pull/32121/files#diff-e143edbe5e6171625fe27ec29039aabdR622
Since that is always the same, perhaps it is not worth asserting. Should we just assert the from offset?
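For example, something along these lines (an untested sketch; it assumes shardChangesRequests is a List of long[] as in the current test class):

long[] expectedFromOffsets = {0L, 9L, 18L, 27L, 36L, 45L, 54L, 63L};
// only assert the from offsets; the batch size is always the same
assertThat(shardChangesRequests.size(), equalTo(expectedFromOffsets.length));
for (int i = 0; i < expectedFromOffsets.length; i++) {
    assertThat(shardChangesRequests.get(i)[0], equalTo(expectedFromOffsets[i]));
}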
assertThat(status.getLeaderGlobalCheckpoint(), equalTo(128L));
}

public void testCoordinateReads_maxConcurrentReads() {
Can we avoid using camel_case here? I think testMaxConcurrentReads should be good.
@@ -61,13 +61,14 @@
     private final BiConsumer<TimeValue, Runnable> scheduler;

     private volatile long lastRequestedSeqno;
-    private volatile long leaderGlobalCheckpoint;
+    // package-protected visibility for testing only:
+    volatile long leaderGlobalCheckpoint;
I am not sure that we should increase the visibility just to allow changing this volatile variable.
To be honest, I'm not happy with it either, but otherwise testing the isolated steps is much trickier.
isn't this exposed in the status object? can't we use that maybe? just an idea
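For instance, something like this could be used for the assertions instead of touching the field (a sketch only; it assumes the status can be obtained via task.getStatus() and carries the leader global checkpoint, as the existing getLeaderGlobalCheckpoint() assertion suggests):

// read the checkpoint through the status object instead of the package-private field
assertThat(task.getStatus().getLeaderGlobalCheckpoint(), equalTo(128L));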
This will make testing easier without changing the leaderGlobalCheckpoint field's visibility; it is based on a from value + maxOperationCount, with a - 1 applied.
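In other words, roughly (a hedged sketch; the variable names are only illustrative):

// both ends of a batch are inclusive, hence the - 1
long toSeqNo = fromSeqNo + maxOperationCount - 1;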
I left some points to discuss.
void startTask(ShardFollowNodeTask task, long leaderGlobalCheckpoint, long followerGlobalCheckpoint) {
    task.start(followerGlobalCheckpoint);
    // Shortcut to just set leaderGlobalCheckpoint, calling for example handleReadResponse() has side effects that
    // complicates testing in isolation.
I think we can have a ShardFollowNodeTask#start method that accepts two params: leaderGCP and followerGCP. Then we can remove updateLeaderGlobalCheckpoint and this helper method startTask. WDYT?
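Roughly something like this (just a sketch of the idea, not the actual implementation; the body is assumed from the surrounding discussion):

void start(long leaderGlobalCheckpoint, long followerGlobalCheckpoint) {
    // set the leader checkpoint up front, so updateLeaderGlobalCheckpoint and the startTask helper can go away
    this.leaderGlobalCheckpoint = leaderGlobalCheckpoint;
    // ... followed by the existing start(followerGlobalCheckpoint) logic ...
}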
Right, I think that in the case of this test this would be better. The only downside is that in production code we will always set the leaderGCP and followerGCP to the same value (the ShardFollowTasksExecutor only fetches the followerGCP). But I think this is better than separating out updating the leaderGCP into a separate method.
shardChangesRequests.clear();
synchronized (task) {
    task.updateLeaderGlobalCheckpoint(128L);
We can set up other scenarios to test the cancellation if we remove updateLeaderGlobalCheckpoint. For example, make the read limits reached, then cancel, then verify that we won't issue any read request.
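Something along these lines, for example (an untested sketch; using markAsCompleted() as the cancellation trigger is an assumption, and the helpers mirror the existing test class):

// with the read limits already saturated, cancel the task and verify no new read goes out
shardChangesRequests.clear();
task.markAsCompleted(); // assumed cancellation trigger
task.coordinateReads();
assertThat(shardChangesRequests.size(), equalTo(0));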
shardChangesRequests.clear();
// Also invokes the coordinatesReads() method:
task.innerHandleReadResponse(0L, 63L, generateShardChangesResponse(0, 63, 0L, 128L));
assertThat(shardChangesRequests.size(), equalTo(0)); // no more reads, because write buffer is full
I am wondering if we should add the buffer (size or operations) to the Status object? We can do it in a follow-up if you are okay.
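A follow-up could then assert on it directly, something like (the getter name is hypothetical, only to illustrate the idea):

// hypothetical getter exposing the number of buffered write operations in the status
assertThat(task.getStatus().getNumberOfQueuedWrites(), greaterThan(0));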
I think we should add this too, but in a follow-up. When Jason works on the CCR stats, this is a stat that is very likely to be added.
Yes, indeed.
…per method" This reverts commit 47bc37c.
…od for testing purposes
Thanks @martijnvg. LGTM