[CI] SharedClusterSnapshotRestoreIT.testAbortedSnapshotDuringInitDoesNotStart Fails #38489

original-brownbear · 2019-02-06T06:29:23Z

Happened here:

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+internalClusterTest/606/

  2> REPRODUCE WITH: ./gradlew :server:integTest -Dtests.seed=D7E1CF66C40F4360 -Dtests.class=org.elasticsearch.snapshots.SharedClusterSnapshotRestoreIT -Dtests.method="testAbortedSnapshotDuringInitDoesNotStart" -Dtests.security.manager=true -Dtests.locale=es-CL -Dtests.timezone=America/Punta_Arenas -Dcompiler.java=11 -Druntime.java=8

fails with:

ERROR   0.87s J1 | SharedClusterSnapshotRestoreIT.testAbortedSnapshotDuringInitDoesNotStart <<< FAILURES!
   > Throwable #1: UncategorizedExecutionException[Failed execution]; nested: ExecutionException[java.nio.file.NoSuchFileException: /var/lib/jenkins/workspace/elastic+elasticsearch+master+internalClusterTest/server/build/testrun/integTest/J1/temp/org.elasticsearch.snapshots.SharedClusterSnapshotRestoreIT_D7E1CF66C40F4360-001/tempDir-002/repos/dJWCuVKIEy/indices/KbMMazL8QxmxzkgdNZRFeQ/meta-hJC9954EReKInUiOrlhLfg.dat]; nested: NoSuchFileException[/var/lib/jenkins/workspace/elastic+elasticsearch+master+internalClusterTest/server/build/testrun/integTest/J1/temp/org.elasticsearch.snapshots.SharedClusterSnapshotRestoreIT_D7E1CF66C40F4360-001/tempDir-002/repos/dJWCuVKIEy/indices/KbMMazL8QxmxzkgdNZRFeQ/meta-hJC9954EReKInUiOrlhLfg.dat];
   > 	at __randomizedtesting.SeedInfo.seed([D7E1CF66C40F4360:AF3C0690F8AC786E]:0)
   > 	at org.elasticsearch.common.util.concurrent.FutureUtils.rethrowExecutionException(FutureUtils.java:101)
   > 	at org.elasticsearch.common.util.concurrent.FutureUtils.get(FutureUtils.java:62)
   > 	at org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:34)
   > 	at org.elasticsearch.action.ActionRequestBuilder.get(ActionRequestBuilder.java:52)
   > 	at org.elasticsearch.snapshots.SharedClusterSnapshotRestoreIT.lambda$testAbortedSnapshotDuringInitDoesNotStart$24(SharedClusterSnapshotRestoreIT.java:3687)
   > 	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:846)
   > 	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:832)
   > 	at org.elasticsearch.snapshots.SharedClusterSnapshotRestoreIT.testAbortedSnapshotDuringInitDoesNotStart(SharedClusterSnapshotRestoreIT.java:3685)
   > 	at java.lang.Thread.run(Thread.java:748)
   > Caused by: java.util.concurrent.ExecutionException: java.nio.file.NoSuchFileException: /var/lib/jenkins/workspace/elastic+elasticsearch+master+internalClusterTest/server/build/testrun/integTest/J1/temp/org.elasticsearch.snapshots.SharedClusterSnapshotRestoreIT_D7E1CF66C40F4360-001/tempDir-002/repos/dJWCuVKIEy/indices/KbMMazL8QxmxzkgdNZRFeQ/meta-hJC9954EReKInUiOrlhLfg.dat
   > 	at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.getValue(BaseFuture.java:266)
   > 	at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:253)
   > 	at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:87)
   > 	at org.elasticsearch.common.util.concurrent.FutureUtils.get(FutureUtils.java:57)
   > 	... 43 more

Relates to #38368, #38226

Seems to be the same failure that was supposed to be fixed in #38368, but now occurring at a much lower rate.


elasticmachine · 2019-02-06T06:29:24Z

Pinging @elastic/es-distributed

original-brownbear · 2019-02-06T08:45:42Z

Seems like a problem with the mock repository block.

If I put a sleep before the busy assert:

            TimeUnit.SECONDS.sleep(2L);
            // The deletion must set the snapshot in the ABORTED state
            assertBusy(() -> {
                SnapshotsStatusResponse status =
                    client.admin().cluster().prepareSnapshotStatus("repository").setSnapshots("snap").get();
                assertThat(status.getSnapshots().iterator().next().getState(), equalTo(State.ABORTED));
            });

it fails every time. It seems the snapshot has already been deleted, even before the master node is unblocked in the following lines:

            // Now unblock the repository
            unblockNode("repository", internalCluster().getMasterName());
            blocked = false;

            assertAcked(delete.get());
            expectThrows(SnapshotMissingException.class, () ->
                client.admin().cluster().prepareGetSnapshots("repository").setSnapshots("snap").get());


cbuescher · 2019-02-08T12:17:38Z

Failed again today on 7.0: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.0+internalClusterTest/139/console

* Fix Issue with Concurrent Snapshot Init + Delete by ensuring that we're not finalizing a snapshot in the repository while it is initializing on another thread
* Closes #38489
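
For readers following along, the idea in the commit message (do not finalize a snapshot in the repository while it is still initializing on another thread) can be sketched roughly as below. This is only an illustration under assumptions: the class `SnapshotInitGuardSketch` and its methods `beginInit`, `tryFinalize`, and `endInit` are hypothetical names made up for this example and are not the code merged in #38518.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch, NOT the actual Elasticsearch implementation.
 * Tracks snapshots that are still initializing so that a concurrent
 * delete does not finalize them in the repository too early.
 */
final class SnapshotInitGuardSketch {
    // Snapshots whose initialization is still in progress.
    private final Set<String> initializing = new HashSet<>();
    // Snapshots for which a delete arrived while they were initializing.
    private final Set<String> abortedDuringInit = new HashSet<>();

    /** Called by the thread that starts initializing a snapshot. */
    synchronized void beginInit(String snapshot) {
        initializing.add(snapshot);
    }

    /**
     * Called by the delete path. Returns true if the caller may finalize
     * the snapshot in the repository now, false if finalization has to be
     * deferred because initialization is still running elsewhere.
     */
    synchronized boolean tryFinalize(String snapshot) {
        if (initializing.contains(snapshot)) {
            abortedDuringInit.add(snapshot);
            return false;
        }
        return true;
    }

    /**
     * Called by the initializing thread when it is done. Returns true if
     * a delete was requested in the meantime, in which case this thread
     * is responsible for cleaning up the aborted snapshot.
     */
    synchronized boolean endInit(String snapshot) {
        initializing.remove(snapshot);
        return abortedDuringInit.remove(snapshot);
    }
}
```

In this sketch a delete that races with initialization gets `false` from `tryFinalize`, and the initializing thread learns about the pending abort from the return value of `endInit`, so only one thread ever touches the repository files for that snapshot at a time.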

talevy · 2019-02-13T22:03:03Z

@original-brownbear do we want to port this over to 7.0? looks like it is continuing to fail in 7.0 CI

original-brownbear · 2019-02-13T22:05:03Z

@talevy sorry didn't get to backporting this yet, will do shortly.


original-brownbear added the :Distributed Coordination/Snapshot/Restore, >test-failure, and v7.0.0 labels Feb 6, 2019

original-brownbear self-assigned this Feb 6, 2019

jasontedor added v8.0.0 and removed v7.0.0 labels Feb 6, 2019

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Feb 6, 2019

Fix Issue with Concurrent Snapshot Init + Delete

6387014

* Closes elastic#38489

original-brownbear mentioned this issue Feb 6, 2019

Fix Issue with Concurrent Snapshot Init + Delete #38518

Merged

Tim-Brooks mentioned this issue Feb 7, 2019

Add 7.1 version constant to 7.x branch #38513

Merged

Tim-Brooks added the v7.2.0 label Feb 7, 2019

original-brownbear closed this as completed in #38518 Feb 8, 2019

talevy mentioned this issue Feb 15, 2019

Fix Issue with Concurrent Snapshot Init + Delete (#38518) #38969

Merged

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
