SNAPSHOT: Improve Resilience SnapshotShardService #36113

original-brownbear
Member

* Resolve the index in the snapshotting thread
* Added test for routing table - snapshot state mismatch
@original-brownbear added the >bug, :Distributed Coordination/Snapshot/Restore, v7.0.0, and v6.6.0 labels on Nov 30, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@original-brownbear
Member Author

@ywelsch I realized this fixes the issue in #32265 without really changing behaviour in other situations.
What do you think about using this as a short-cut to fixing the issue while we figure out the best way to test this more granularly on the cluster-state level?

Contributor

@ywelsch ywelsch left a comment

OK, let's do this (and also backport to 6.5).

executor.execute(new AbstractRunnable() {

    final SetOnce<Exception> failure = new SetOnce<>();

    @Override
    public void doRun() {
        final IndexShard indexShard = indicesService.indexServiceSafe(shardId.getIndex()).getShardOrNull(shardId.id());
        final IndexId indexId = indicesMap.get(shardId.getIndexName());
Contributor

this can and should still be resolved outside the executor.
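
For readers following along, here is a minimal sketch of the split this comment seems to ask for. It assumes the concern is that the index/shard lookup can throw when the index is no longer on the node, while the IndexId lookup is a plain map read. Every name below (SnapshotSketch, shardSafe, snapshotAll, the String stand-ins for shards and index ids) is hypothetical and is not Elasticsearch code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the snapshotting code path; not an Elasticsearch class.
final class SnapshotSketch {

    // Simulates a "safe" lookup that throws when the index is gone from this node.
    static String shardSafe(Map<String, String> localShards, String indexName) {
        String shard = localShards.get(indexName);
        if (shard == null) {
            throw new IllegalStateException("no such index [" + indexName + "]");
        }
        return shard;
    }

    static List<String> snapshotAll(Map<String, String> indexIds,
                                    Map<String, String> localShards,
                                    List<String> indexNames) {
        List<String> results = Collections.synchronizedList(new ArrayList<>());
        // Single worker thread, so tasks run in submission order.
        ExecutorService executor = Executors.newSingleThreadExecutor();
        for (String indexName : indexNames) {
            // Plain map read: resolved on the calling thread, outside the executor.
            final String indexId = indexIds.get(indexName);
            executor.execute(() -> {
                try {
                    // May throw: resolved inside the task so the failure is
                    // recorded for this shard instead of escaping to the caller.
                    String shard = shardSafe(localShards, indexName);
                    results.add("SUCCESS " + indexId + " " + shard);
                } catch (Exception e) {
                    results.add("FAILED " + indexId + ": " + e.getMessage());
                }
            });
        }
        executor.shutdown();
        try {
            executor.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for snapshot tasks", e);
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> results = snapshotAll(
            Map.of("a", "id-a", "b", "id-b"), // index name -> index id
            Map.of("a", "a[0]"),              // index "b" is missing locally
            List.of("a", "b"));
        results.forEach(System.out::println);
    }
}
```

In this sketch the missing index produces one failed entry and the loop keeps submitting the remaining shards, which is the behaviour the per-task failure handling is meant to give.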

@original-brownbear original-brownbear merged commit 433a506 into elastic:master Dec 3, 2018
@original-brownbear original-brownbear deleted the minimal-fix-snapshot-stability branch December 3, 2018 15:39
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Dec 3, 2018
* Resolve the index in the snapshotting thread
* Added test for routing table - snapshot state mismatch
original-brownbear added a commit that referenced this pull request Dec 3, 2018
* Resolve the index in the snapshotting thread
* Added test for routing table - snapshot state mismatch
@original-brownbear
Member Author

Backported in #36166 and #36164 :)

kovrus added a commit to crate/crate that referenced this pull request Apr 24, 2019
- Fix two races condition that lead to stuck snapshots (elastic/elasticsearch#37686)
- Improve resilience SnapshotShardService (elastic/elasticsearch#36113)
- Fix concurrent snapshot ending and stabilize snapshot finalization (elastic/elasticsearch#38368)
mergify bot pushed a commit to crate/crate that referenced this pull request Apr 26, 2019
- Fix two races condition that lead to stuck snapshots (elastic/elasticsearch#37686)
- Improve resilience SnapshotShardService (elastic/elasticsearch#36113)
- Fix concurrent snapshot ending and stabilize snapshot finalization (elastic/elasticsearch#38368)