Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance SnapshotResiliencyTests (#49514) #49541

Merged
merged 1 commit into from
Nov 25, 2019

Conversation

original-brownbear
Copy link
Member

A few enhancements to SnapshotResiliencyTests:

  1. Test running requests from random nodes in more spots to enhance coverage (this is particularly motivated by Use ClusterState as Consistency Source for Snapshot Repositories #49060 where the additional number of cluster state updates makes it more interesting to fully cover all kinds of network failures)
  2. Fix issue with restarting only master node in one test (doing so breaks the test at an incredibly low frequency, that becomes not so low in Use ClusterState as Consistency Source for Snapshot Repositories #49060 with the additional cluster state updates between request and response)
  3. Improved cluster formation checks (now properly checks the term as well when forming cluster) + makes sure all nodes are connected to all other nodes (previously the data nodes would at times not be connected to other data nodes, which was shaken out now by adding the client() method
  4. Make sure the cluster left behind by the test makes sense by running the repo cleanup action on it (this also increases coverage of the repository cleanup action obviously and adds the basis of making it part of more resiliency tests)

backport of #49514

A few enhancements to `SnapshotResiliencyTests`:
1. Test running requests from random nodes in more spots to enhance coverage (this is particularly motivated by elastic#49060 where the additional number of cluster state updates makes it more interesting to fully cover all kinds of network failures)
2. Fix issue with restarting only master node in one test (doing so breaks the test at an incredibly low frequency, that becomes not so low in elastic#49060 with the additional cluster state updates between request and response)
3. Improved cluster formation checks (now properly checks the term as well when forming cluster) + makes sure all nodes are connected to all other nodes (previously the data nodes would at times not be connected to other data nodes, which was shaken out now by adding the `client()` method
4. Make sure the cluster left behind by the test makes sense by running the repo cleanup action on it (this also increases coverage of the repository cleanup action obviously and adds the basis of making it part of more resiliency tests)
@original-brownbear original-brownbear added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs backport labels Nov 25, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@original-brownbear original-brownbear merged commit 2502ff3 into elastic:7.x Nov 25, 2019
@original-brownbear original-brownbear deleted the 49514-7.x branch November 25, 2019 12:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants