Handle Concurrent Repo Modification to Fix Test #48433
Conversation
Just like elastic#48329 (and using the changes from that PR), we can run into a concurrent repo modification that we will throw on, and we must retry until consistent handling of this situation is implemented. Closes elastic#47384
Pinging @elastic/es-core-features (:Core/Features/ILM+SLM)
```java
@@ -215,6 +215,10 @@ public void testRetentionWhileSnapshotInProgress() throws Exception {
             SnapshotsStatusResponse s =
                 client().admin().cluster().prepareSnapshotStatus(REPO).setSnapshots(completedSnapshotName).get();
             assertNull("expected no snapshot but one was returned", s.getSnapshots().get(0));
+        } catch (RepositoryException e) {
+            // Concurrent status calls and write operations may lead to failures in determining the current repository generation
+            // TODO: Remove this hack once tracking the current repository generation has been made consistent
```
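The catch-and-retry workaround above can be generalized as a small helper that retries an action while a specific, known-transient exception is thrown. This is only a sketch of the pattern, not the actual Elasticsearch implementation; the `Retry.retryOn` name and signature are illustrative.

```java
import java.util.function.Supplier;

public class Retry {
    /**
     * Runs {@code action}, retrying up to {@code maxAttempts} times while it
     * throws the given retriable exception type (e.g. a concurrent repository
     * modification), and propagating any other failure immediately.
     * A hypothetical helper illustrating the pattern used in the test above.
     */
    public static <T> T retryOn(Class<? extends RuntimeException> retriable,
                                int maxAttempts, Supplier<T> action) {
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                if (retriable.isInstance(e) == false) {
                    throw e; // unexpected failure: propagate
                }
                last = e; // transient concurrent modification: try again
            }
        }
        throw last; // budget exhausted: surface the last transient failure
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulate an API call that fails twice with a transient error,
        // then succeeds once the repository state is consistent again.
        String result = retryOn(IllegalStateException.class, 5, () -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("repo generation changed concurrently");
            }
            return "consistent";
        });
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In a test, the same effect can be had with the test framework's own retry utilities (e.g. wrapping the status call in `assertBusy`), which is why the inline `try`/`catch` above is explicitly marked as a temporary hack.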
Is there an issue we could link to here so that we have something we can reference to see if it's safe to remove the hack?
Not yet I'm afraid. We have #38941 but the fix to the corruption issue may not fully resolve this situation yet.
Not even sure we have to solve this one with any kind of priority outside of tests.

What's basically happening here is that the snapshot status API breaks for a tiny window during snapshot delete and create. In practice that window is a little more than the latency of one API request, so it's really hard to actually run into it, and you'll probably hit other IO issues more often than this. As it turns out, though, the SLM tests are the only tests running these APIs in hot loops, and they are shaking these kinds of issues out. Maybe this is even an OK long-term solution? (I'll gather some feedback on that tomorrow and will create an issue accordingly :))
We discussed this today during snapshot resiliency sync and I'll code up a fix for this shortly and will remove the todos :)
Is that really the issue you want to close?
@original-brownbear This test was muted in #48441. When you do the backports for this fix, can you unmute the test in the other branches if necessary? (I'll take care of unmuting in master.)
Was this ever backported?
Yeah, my bad for not linking things properly; this change was pulled into 7.6 by Gordon in 5021410.
Just like #48329 (and using the changes from that PR),
we can run into a concurrent repo modification that we
will throw on, and we must retry until consistent handling
of this situation is implemented.
Closes #47834