[CI] SharedClusterSnapshotRestoreIT.testAbortedSnapshotDuringInitDoesNotStart Fails #38489

Closed
original-brownbear opened this issue Feb 6, 2019 · 5 comments · Fixed by #38518
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >test-failure Triaged test failures from CI v7.2.0 v8.0.0-alpha1

Comments

@original-brownbear
Member

original-brownbear commented Feb 6, 2019

Happened here:

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+internalClusterTest/606/

  2> REPRODUCE WITH: ./gradlew :server:integTest -Dtests.seed=D7E1CF66C40F4360 -Dtests.class=org.elasticsearch.snapshots.SharedClusterSnapshotRestoreIT -Dtests.method="testAbortedSnapshotDuringInitDoesNotStart" -Dtests.security.manager=true -Dtests.locale=es-CL -Dtests.timezone=America/Punta_Arenas -Dcompiler.java=11 -Druntime.java=8

fails with:

ERROR   0.87s J1 | SharedClusterSnapshotRestoreIT.testAbortedSnapshotDuringInitDoesNotStart <<< FAILURES!
   > Throwable #1: UncategorizedExecutionException[Failed execution]; nested: ExecutionException[java.nio.file.NoSuchFileException: /var/lib/jenkins/workspace/elastic+elasticsearch+master+internalClusterTest/server/build/testrun/integTest/J1/temp/org.elasticsearch.snapshots.SharedClusterSnapshotRestoreIT_D7E1CF66C40F4360-001/tempDir-002/repos/dJWCuVKIEy/indices/KbMMazL8QxmxzkgdNZRFeQ/meta-hJC9954EReKInUiOrlhLfg.dat]; nested: NoSuchFileException[/var/lib/jenkins/workspace/elastic+elasticsearch+master+internalClusterTest/server/build/testrun/integTest/J1/temp/org.elasticsearch.snapshots.SharedClusterSnapshotRestoreIT_D7E1CF66C40F4360-001/tempDir-002/repos/dJWCuVKIEy/indices/KbMMazL8QxmxzkgdNZRFeQ/meta-hJC9954EReKInUiOrlhLfg.dat];
   > 	at __randomizedtesting.SeedInfo.seed([D7E1CF66C40F4360:AF3C0690F8AC786E]:0)
   > 	at org.elasticsearch.common.util.concurrent.FutureUtils.rethrowExecutionException(FutureUtils.java:101)
   > 	at org.elasticsearch.common.util.concurrent.FutureUtils.get(FutureUtils.java:62)
   > 	at org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:34)
   > 	at org.elasticsearch.action.ActionRequestBuilder.get(ActionRequestBuilder.java:52)
   > 	at org.elasticsearch.snapshots.SharedClusterSnapshotRestoreIT.lambda$testAbortedSnapshotDuringInitDoesNotStart$24(SharedClusterSnapshotRestoreIT.java:3687)
   > 	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:846)
   > 	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:832)
   > 	at org.elasticsearch.snapshots.SharedClusterSnapshotRestoreIT.testAbortedSnapshotDuringInitDoesNotStart(SharedClusterSnapshotRestoreIT.java:3685)
   > 	at java.lang.Thread.run(Thread.java:748)
   > Caused by: java.util.concurrent.ExecutionException: java.nio.file.NoSuchFileException: /var/lib/jenkins/workspace/elastic+elasticsearch+master+internalClusterTest/server/build/testrun/integTest/J1/temp/org.elasticsearch.snapshots.SharedClusterSnapshotRestoreIT_D7E1CF66C40F4360-001/tempDir-002/repos/dJWCuVKIEy/indices/KbMMazL8QxmxzkgdNZRFeQ/meta-hJC9954EReKInUiOrlhLfg.dat
   > 	at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.getValue(BaseFuture.java:266)
   > 	at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:253)
   > 	at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:87)
   > 	at org.elasticsearch.common.util.concurrent.FutureUtils.get(FutureUtils.java:57)
   > 	... 43 more

Relates to #38368, #38226

Seems like it's the same failure that was supposed to be fixed in #38368, but now occurring at a much lower rate.

@original-brownbear original-brownbear added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >test-failure Triaged test failures from CI v7.0.0 labels Feb 6, 2019
@original-brownbear original-brownbear self-assigned this Feb 6, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@original-brownbear
Member Author

original-brownbear commented Feb 6, 2019

Seems like a problem with the mock repository block.

If I put a sleep before the busy assert:

            TimeUnit.SECONDS.sleep(2L);
            // The deletion must set the snapshot in the ABORTED state
            assertBusy(() -> {
                SnapshotsStatusResponse status =
                    client.admin().cluster().prepareSnapshotStatus("repository").setSnapshots("snap").get();
                assertThat(status.getSnapshots().iterator().next().getState(), equalTo(State.ABORTED));
            });

it fails every time. It seems the snapshot is already deleted even before the master node is unblocked in the next lines:

            // Now unblock the repository
            unblockNode("repository", internalCluster().getMasterName());
            blocked = false;

            assertAcked(delete.get());
            expectThrows(SnapshotMissingException.class, () ->
                client.admin().cluster().prepareGetSnapshots("repository").setSnapshots("snap").get());
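
One possible reading of the observation above: a busy assert only needs a single poll to succeed, so it can pass by catching a short-lived ABORTED state and fail once that window has closed. The sketch below models this with a fake clock so it runs deterministically. It is not the Elasticsearch test code: `BusyAssertSketch`, `FakeClock`, `stateAt`, `busyCheck`, and the 500 ms window are all hypothetical, and the real failure surfaced as a `NoSuchFileException` rather than a plain assertion failure.

```java
import java.util.function.BooleanSupplier;

// Hypothetical model: a snapshot that is ABORTED only for a short window and
// then disappears, polled by a simplified stand-in for a busy assertion.
final class BusyAssertSketch {
    enum State { ABORTED, MISSING }

    // Fake clock so the example needs no real sleeping or threads.
    static final class FakeClock {
        long nowMillis;
        void sleep(long millis) { nowMillis += millis; }
    }

    // Assumption: the snapshot stays ABORTED for abortedWindowMillis, then is gone.
    static State stateAt(long nowMillis, long abortedWindowMillis) {
        return nowMillis < abortedWindowMillis ? State.ABORTED : State.MISSING;
    }

    // Polls the check with capped exponential backoff until it passes or the
    // timeout elapses. Returns true as soon as one poll succeeds.
    static boolean busyCheck(FakeClock clock, long timeoutMillis, BooleanSupplier check) {
        long deadline = clock.nowMillis + timeoutMillis;
        long backoff = 1;
        while (true) {
            if (check.getAsBoolean()) {
                return true;
            }
            if (clock.nowMillis >= deadline) {
                return false;
            }
            clock.sleep(backoff);
            backoff = Math.min(backoff * 2, 100);
        }
    }

    // Sleeps first (like the added TimeUnit.SECONDS.sleep), then polls for ABORTED.
    static boolean observesAborted(long initialSleepMillis) {
        FakeClock clock = new FakeClock();
        clock.sleep(initialSleepMillis);
        return busyCheck(clock, 10_000, () -> stateAt(clock.nowMillis, 500) == State.ABORTED);
    }

    public static void main(String[] args) {
        System.out.println(observesAborted(0));     // first poll lands inside the window
        System.out.println(observesAborted(2_000)); // window already closed before polling
    }
}
```

Under these assumptions, polling immediately sees ABORTED, while polling after a 2 second delay never does, which matches the "fails every time" behaviour described above.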

@cbuescher
Member

original-brownbear added a commit that referenced this issue Feb 8, 2019
* Fix Issue with Concurrent Snapshot Init + Delete by ensuring that we're not finalizing a snapshot in the repository while it is initializing on another thread

* Closes #38489
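
The commit message describes the idea of not finalizing a snapshot in the repository while it is still initializing on another thread. A minimal sketch of that idea follows, assuming a delete that arrives during initialization is recorded and the repository finalization is deferred until initialization ends. This is not the change made in #38518: `SnapshotInitGuard` and every method name in it are hypothetical, and repository operations are modelled as strings in a list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical guard: finalization never runs while a snapshot is INITIALIZING.
final class SnapshotInitGuard {
    private enum Phase { INITIALIZING, STARTED }

    private final Map<String, Phase> phases = new HashMap<>();
    private final Set<String> abortedDuringInit = new HashSet<>();
    // Stand-in for writes to the repository, in the order they happen.
    final List<String> repositoryOps = new ArrayList<>();

    synchronized void beginInit(String snapshot) {
        phases.put(snapshot, Phase.INITIALIZING);
        repositoryOps.add("init:" + snapshot);
    }

    // Returns true if finalization ran immediately, false if it was deferred
    // because the snapshot is still initializing.
    synchronized boolean requestDelete(String snapshot) {
        Phase phase = phases.get(snapshot);
        if (phase == null) {
            throw new IllegalArgumentException("unknown snapshot: " + snapshot);
        }
        if (phase == Phase.INITIALIZING) {
            abortedDuringInit.add(snapshot); // remember the abort, touch nothing yet
            return false;
        }
        finalizeInRepository(snapshot);
        return true;
    }

    // Called by the initializing thread once it is done writing.
    synchronized void endInit(String snapshot) {
        if (abortedDuringInit.remove(snapshot)) {
            finalizeInRepository(snapshot); // deferred finalization runs now
        } else {
            phases.put(snapshot, Phase.STARTED);
        }
    }

    private void finalizeInRepository(String snapshot) {
        phases.remove(snapshot);
        repositoryOps.add("finalize:" + snapshot);
    }
}
```

With this ordering, a delete issued between `beginInit` and `endInit` cannot remove files that the initializing thread is still writing, because the finalization step only runs after `endInit`.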
@talevy
Contributor

talevy commented Feb 13, 2019

@original-brownbear do we want to port this over to 7.0? looks like it is continuing to fail in 7.0 CI

@original-brownbear
Member Author

@talevy sorry didn't get to backporting this yet, will do shortly.

talevy pushed a commit to talevy/elasticsearch that referenced this issue Feb 15, 2019
* Fix Issue with Concurrent Snapshot Init + Delete by ensuring that we're not finalizing a snapshot in the repository while it is initializing on another thread

* Closes elastic#38489
talevy added a commit that referenced this issue Feb 16, 2019
* Fix Issue with Concurrent Snapshot Init + Delete by ensuring that we're not finalizing a snapshot in the repository while it is initializing on another thread

* Closes #38489
talevy pushed a commit that referenced this issue Feb 16, 2019
* Fix Issue with Concurrent Snapshot Init + Delete by ensuring that we're not finalizing a snapshot in the repository while it is initializing on another thread

* Closes #38489
7 participants