Track Repository Gen. in BlobStoreRepository #48944

Merged

Conversation

@original-brownbear (Member) commented Nov 11, 2019

This is intended as a stop-gap solution/improvement to #38941 that
prevents repo modifications without an intervening master failover
from causing inconsistent (outdated due to an inconsistent listing of index-N blobs)
`RepositoryData` to be written.

Tracking the latest repository generation will move to the cluster state in a
separate pull request. This is intended as a low-risk change that can be backported as
far as possible, motivated by the recently increased chance of #38941
causing trouble via SLM (see #47520).

Closes #47834
Closes #49048
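
For readers skimming the diff, the gist of the stop-gap is to keep the highest repository generation seen so far in memory and prefer it over whatever a (possibly stale) blob listing returns. A minimal sketch of that idea, as a simplified standalone tracker rather than the real `BlobStoreRepository` internals (class and method names here are made up for illustration):

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustration only: tracks the highest index-N generation observed for a repository.
class RepoGenTracker {
    private final AtomicLong latestKnownRepoGen = new AtomicLong(-1L);

    // Pick the generation to load: the maximum of what the listing returned and
    // what we already know, so a stale listing cannot roll the repository back.
    long generationToLoad(long listedGeneration) {
        return latestKnownRepoGen.updateAndGet(known -> Math.max(known, listedGeneration));
    }

    // After successfully writing a new index-N blob, remember its generation.
    void onGenerationWritten(long newGeneration) {
        latestKnownRepoGen.updateAndGet(known -> Math.max(known, newGeneration));
    }
}
```

The `updateAndGet(known -> Math.max(known, generation))` pattern mirrors the snippet quoted in the review below.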

@elasticmachine (Collaborator)

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@original-brownbear (Member Author)

Jenkins run elasticsearch-ci/packaging-sample-matrix (seems to hang on uploading build result)

@original-brownbear (Member Author)

I adjusted this PR to gracefully/automatically handle concurrent repository modifications as discussed earlier today. See c540d39 (in particular, it reverts the test changes I initially added here to make the change work with tests that clear out repositories; those changes are now unnecessary).

This also automatically resolves #47834, since gracefully retrying after an external delete of the index-N blob is functionally equivalent to handling a concurrent modification of the repository.
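
Roughly speaking (my reading of the change, not the literal code in c540d39), the graceful handling amounts to: if loading the index-N blob for the expected generation fails because it was deleted or replaced concurrently, re-resolve the latest generation and try again instead of failing the whole operation. A hand-wavy sketch, where `GenerationSource`, its methods, and the retry bound are all invented names:

```java
import java.io.IOException;

// Illustration only: retry loading repository metadata when the expected index-N
// blob disappears underneath us (external delete or concurrent modification).
final class RetryingRepoDataLoader {
    private static final int MAX_RETRIES = 5; // hypothetical bound

    interface GenerationSource {
        long resolveLatestGeneration() throws IOException;  // e.g. listing combined with latestKnownRepoGen
        byte[] loadGeneration(long generation) throws IOException;  // read the index-N blob
    }

    byte[] loadRepositoryData(GenerationSource source) throws IOException {
        IOException lastFailure = null;
        for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
            final long generation = source.resolveLatestGeneration();
            try {
                return source.loadGeneration(generation);
            } catch (IOException e) {
                // The blob for this generation may have been deleted or superseded
                // by a concurrent writer; re-resolve the generation and retry.
                lastFailure = e;
            }
        }
        throw lastFailure;
    }
}
```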

@original-brownbear (Member Author)

As discussed with Yannick on another channel, I'm adding a test for eventually consistent listing here as well. Will re-request reviews once that's in.

// It's always a possibility to not see the latest index-N in the listing here on an eventually consistent blob store, just
// debug log it. Any blobs leaked as a result of an inconsistent listing here will be cleaned up in a subsequent cleanup or
// snapshot delete run anyway.
logger.debug("Determined repository's generation from its contents to [" + generation + "] but " +
original-brownbear (Member Author)

This may be a little controversial:

By tracking the latest gen in the field, we can now identify out-of-sync listings that we would previously have missed and that would just have failed in a subsequent step where the repo gen is compared. With this change, if we fail to list the latest index-N, we can still complete a delete or cleanup just fine (assuming the value in latestKnownRepoGen is correct).

I think it's a better user experience to not do a perfect cleanup in this edge case but to proceed with the delete/cleanup as if nothing happened. On an eventually consistent repo, the fact that we list the correct index-N does not guarantee that we didn't miss any other root blobs in the listing anyway.
Also, apart from maybe missing some stale blobs, the delete will work out perfectly fine otherwise.


// Randomly filter out the latest /index-N blob from a listing to test that tracking of it in latestKnownRepoGen
// overrides an inconsistent listing
private Map<String, BlobMetaData> maybeMissLatestIndexN(Map<String, BlobMetaData> listing) {
original-brownbear (Member Author)

I am aware that this does not cover all possible inconsistent-listing scenarios, only the scenario of missing an index-N that is already known via the latestKnownRepoGen field, but correctly handling that scenario is the only thing fixed here for now. In my opinion it is also the most likely scenario in practice (an inconsistent listing after back-to-back operations without a master failover).
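
For illustration, a randomized filter along the lines of the quoted helper could look roughly like this (simplified to a plain `Map<String, Long>` of blob name to length instead of the real `BlobMetaData`, and using `ThreadLocalRandom` instead of the test framework's randomness):

```java
import java.util.Comparator;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.Collectors;

final class ListingFilter {
    private static final String INDEX_PREFIX = "index-";

    // Randomly drop the highest-numbered index-N entry from a listing to simulate
    // an eventually consistent listing that lags behind the most recent write.
    static Map<String, Long> maybeMissLatestIndexN(Map<String, Long> listing) {
        if (ThreadLocalRandom.current().nextBoolean()) {
            return listing;
        }
        return listing.keySet().stream()
            .filter(name -> name.startsWith(INDEX_PREFIX))
            .max(Comparator.comparingLong((String name) -> Long.parseLong(name.substring(INDEX_PREFIX.length()))))
            .map(latest -> listing.entrySet().stream()
                .filter(entry -> entry.getKey().equals(latest) == false)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue)))
            .orElse(listing);
    }
}
```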

Member

Looks sufficient to me

@original-brownbear (Member Author)

This should be good for review now :)

@ywelsch (Contributor) left a comment

I've left some comments

}
final long genToLoad = latestKnownRepoGen.updateAndGet(known -> Math.max(known, generation));
if (genToLoad != generation) {
logger.warn("Determined repository generation [" + generation
Contributor

should this be warn level? In safeRepositoryData you've just logged this as debug.

Also, this warning is confusing to a user. Perhaps we could talk about eventually consistent repositories here.

original-brownbear (Member Author)

You're right. Let's just make this debug. I wouldn't necessarily start talking about eventual consistency here. It's not the only thing that might lead to this warning, concurrent modifications of the repo will have the same result.

Contributor

In hindsight, I wonder if we should log this at info level, just so that we get some stats on how often this logic saves the day on Cloud

@original-brownbear (Member Author) Nov 14, 2019

Right now I'd assume/hope the answer here is "never" :D (with standard snapshotting ... other functionality/manual action/... may trigger this obviously) but yea. Let's do info and verify :)

@original-brownbear (Member Author) left a comment

Thanks @ywelsch, all addressed I think :)


@@ -920,6 +963,12 @@ private RepositoryData getRepositoryData(long indexGen) {
return RepositoryData.snapshotsFromXContent(parser, indexGen);
}
} catch (IOException ioe) {
// If we fail to load the generation we tracked in latestKnownRepoGen we reset it.
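
The reset mentioned in the quoted comment is, in spirit, something like the following (a simplified, self-contained illustration; the sentinel value, the `BlobReader` interface, and the method names are assumptions, not the actual `BlobStoreRepository` code):

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

// Illustration only: if the generation we tracked can no longer be read, fall back
// to "unknown" so the next operation re-discovers the generation from a fresh listing.
final class TrackedGenLoader {
    private static final long UNKNOWN_REPO_GEN = -1L; // assumed sentinel, not the real constant

    interface BlobReader {
        byte[] read(long generation) throws IOException; // reads the index-N blob for a generation
    }

    private final AtomicLong latestKnownRepoGen = new AtomicLong(UNKNOWN_REPO_GEN);

    byte[] load(BlobReader reader) throws IOException {
        final long tracked = latestKnownRepoGen.get();
        try {
            return reader.read(tracked);
        } catch (IOException e) {
            // Blob deleted, bucket moved, permissions changed, ...: stop trusting the
            // tracked generation rather than repeatedly failing on a value that is gone.
            latestKnownRepoGen.compareAndSet(tracked, UNKNOWN_REPO_GEN);
            throw e;
        }
    }
}
```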
Member

I'm wondering if resetting is the right thing to do here. If the content of the repo has been deleted (or bucket/folder moved, or permissions changed etc) maybe we should keep the last generation seen around, and let the user sort the issue and re-register the repository?

original-brownbear (Member Author)

We talked about that yesterday and I figured we had decided not to do that (yet). I'm of the same opinion, but it's quite a change in behavior if we want to ship this as a short-term fix.
Maybe we should move to that kind of stricter approach in 7.x once we start tracking the repo generation in the cluster state permanently, but not do any big experiments for now? :)

Member

Rah, I'd already forgotten about this discussion, sorry. But I'm good with the plan.

@ywelsch (Contributor) left a comment

LGTM (left one comment about logging)

@original-brownbear (Member Author)

Jenkins run elasticsearch-ci/2 (random X-pack failure)

@original-brownbear (Member Author)

Thanks Yannick & Tanguy!

@original-brownbear original-brownbear merged commit 37c58ca into elastic:master Nov 14, 2019
@original-brownbear original-brownbear deleted the stopgap-repo-gen-solution branch November 14, 2019 21:30
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Nov 14, 2019
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Nov 14, 2019
original-brownbear added a commit that referenced this pull request Nov 15, 2019
original-brownbear added a commit that referenced this pull request Nov 15, 2019