-
Notifications
You must be signed in to change notification settings - Fork 25.8k
Refactor SnapshotResiliencyTests #140152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Refactor SnapshotResiliencyTests #140152
Conversation
Also expand DeterministicTaskQueue
…iency-tests-to-stateless
…iency-tests-to-stateless
Also support waiting for a condition before starting the restart node.
…iency-tests-to-stateless
…iency-tests-to-stateless
…-resiliency-tests-to-stateless
|
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination) |
DaveCTurner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks broadly good to me, just a few questions/nits.
| } | ||
| return true; | ||
| }).sorted(Comparator.comparing(n -> n.node.getName())).toList(); | ||
| return dataNodes.isEmpty() ? Optional.empty() : Optional.ofNullable(ESTestCase.randomFrom(dataNodes)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: slight preference for qualified imports rather than qualifying the method calls
| return dataNodes.isEmpty() ? Optional.empty() : Optional.ofNullable(ESTestCase.randomFrom(dataNodes)); | |
| return dataNodes.isEmpty() ? Optional.empty() : Optional.ofNullable(randomFrom(dataNodes)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure pushed 94a9744
| } | ||
| } | ||
|
|
||
| public static class TestClusterNode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why raise this to a first-level inner static class rather than remaining an inner class of TestClusterNodes? I'm guessing it's so that we can subclass it later? But if you're in a subclass of TestClusterNodes can you not subclass TestClusterNode? Haven't tried it, just curious really.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I sorta generally prefer top level classes. But it is also ok to keep it as an inner class of TestClusterNodes. We can still subclass it as you suggested which is what happened for DeterministicQueue#DeterministicThreadPool. So I changed it accordingly in 9d6806d
Since it touches the constructor, the change also moved both nodeSettings() and createPluginsService() methods to TestClusterNodes so that we can remove the this-escape.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if you want TestClusterNodes to be put back into SnapshotResiliencyTests. I'd like to still make it a static class in that case. That said, I find it overall more manageable with two separate files since both of them have quite large sizes and we could potentially use the test cluster for other things in future.
| actions.put( | ||
| TransportClusterHealthAction.TYPE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to register this action now? Seems like a little more than a pure refactor if so. If not, can we delay it until it's needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this action registration along with the doInit method in 1dbc330
| ) | ||
| ); | ||
|
|
||
| doInit(actions, actionFilters); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise here, this hook doesn't do anything yet and adding it will be relatively noise-free later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above
|
|
||
| private Coordinator coordinator; | ||
|
|
||
| @SuppressWarnings("this-escape") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this really allow a this to escape, given that we now have a separate init() method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I see now, I thought it picked this up earlier in the precommit run than it did. The issue is the calls to the instance methods nodeSettings() and createPluginsService(), which do not leak this here but might do so in a subclass I guess. But I wonder, do we need these to be overridable by subclasses or could we pass them in as contructor parameters instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As commented above, it is removed in 9d6806d
|
|
||
| runUntil(documentCountVerified::get, TimeUnit.MINUTES.toMillis(5L)); | ||
| assertNotNull(safeResult(createSnapshotResponseListener)); | ||
| assertNotNull(safeResult(restoreSnapshotResponseListener)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm do we need to drop this assertion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was a mistake. Restored in 8970408 Thanks for catching it.
| .filter(ShardRouting::primary) | ||
| .findFirst() | ||
| .orElseThrow(() -> new AssertionError("no primary shard found")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slight preference for deferring this change in behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, restored in 266bd8b
| null, | ||
| emptySet() | ||
| ); | ||
| // TODO: The indexNameExpressionResolver does not use the same threadContext and projectResolver |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this important? It sounds important...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Practically it is not important since the tests do not rely on either multi-project or system indices. I didn't touch it because there are other inconsistencies with projectResolver and I was not sure whether it was the right PR to fix them. Since we now have these changes separately. I pushed 4b5d66a to fix them.
Also removed maybeExecuteTransportRunnable
Removed this-escape and some adjustments for constructor
|
@DaveCTurner This PR is ready for another look when you have time. Thanks! |
Refactor for
SnapshotResiliencyTeststo make it easier to extend. The main changes are (1) extract TestCluster related code into its own file and (2) add a separate init phase to the test node so that it can be customized by future subclasses. It also extracts serveral protected methods from existing code so they can be customized in future as well.Relates: #138333