Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] RepositoryS3ClientYamlTestSuiteIT.test failed to restore snapshot #41882

Closed
matriv opened this issue May 7, 2019 · 3 comments · Fixed by #41898
Closed

[CI] RepositoryS3ClientYamlTestSuiteIT.test failed to restore snapshot #41882

matriv opened this issue May 7, 2019 · 3 comments · Fixed by #41898
Assignees
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >test-failure Triaged test failures from CI

Comments

@matriv
Copy link
Contributor

matriv commented May 7, 2019

Failed for master: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+third-party-tests-s3/345/console

Reproduction line

(does not) reproduce locally

REPRODUCE WITH: ./gradlew :plugins:repository-s3:integTestRunner --tests "org.elasticsearch.repositories.s3.RepositoryS3ClientYamlTestSuiteIT.test {yaml=repository_s3/20_repository_permanent_credentials/Snapshot and Restore with repository-s3 using permanent credentials}" \
  -Dtests.seed=9663B0FDBBFB410C \
  -Dtests.security.manager=true \
  -Dtests.locale=es-419 \
  -Dtests.timezone=Europe/Berlin \
  -Dcompiler.java=12 \
  -Druntime.java=11 \
  -Dtests.rest.blacklist=repository_s3/30_repository_temporary_credentials/*,repository_s3/40_repository_ec2_credentials/*,repository_s3/50_repository_ecs_credentials/*

Example relevant log:

10:48:30 Suite: Test class org.elasticsearch.repositories.s3.RepositoryS3ClientYamlTestSuiteIT
10:48:30   1> [2019-05-07T10:47:55,406][INFO ][o.e.r.s.RepositoryS3ClientYamlTestSuiteIT] [test] [yaml=repository_s3/20_repository_permanent_credentials/teardown] before test
10:48:30   1> [2019-05-07T10:47:55,561][INFO ][o.e.r.s.RepositoryS3ClientYamlTestSuiteIT] [test] initializing REST clients against [http://[::1]:33993, http://127.0.0.1:33472]
10:48:30   1> [2019-05-07T10:47:56,079][INFO ][o.e.r.s.RepositoryS3ClientYamlTestSuiteIT] [test] initializing client, minimum es version [8.0.0], master version, [8.0.0], hosts [http://[::1]:33993, http://127.0.0.1:33472]
10:48:30   1> [2019-05-07T10:47:58,677][INFO ][o.e.r.s.RepositoryS3ClientYamlTestSuiteIT] [test] [yaml=repository_s3/20_repository_permanent_credentials/teardown] after test
10:48:30   1> [2019-05-07T10:47:58,710][INFO ][o.e.r.s.RepositoryS3ClientYamlTestSuiteIT] [test] [yaml=repository_s3/40_repository_ec2_credentials/teardown] before test
10:48:30   1> [2019-05-07T10:47:58,745][INFO ][o.e.r.s.RepositoryS3ClientYamlTestSuiteIT] [test] [yaml=repository_s3/40_repository_ec2_credentials/teardown] after test
10:48:30   1> [2019-05-07T10:47:58,764][INFO ][o.e.r.s.RepositoryS3ClientYamlTestSuiteIT] [test] [yaml=repository_s3/50_repository_ecs_credentials/Register a repository with a non existing client] before test
10:48:30   1> [2019-05-07T10:47:58,793][INFO ][o.e.r.s.RepositoryS3ClientYamlTestSuiteIT] [test] [yaml=repository_s3/50_repository_ecs_credentials/Register a repository with a non existing client] after test
10:48:30   1> [2019-05-07T10:47:58,809][INFO ][o.e.r.s.RepositoryS3ClientYamlTestSuiteIT] [test] [yaml=repository_s3/30_repository_temporary_credentials/Register a repository with a non existing bucket] before test
10:48:30   1> [2019-05-07T10:47:58,839][INFO ][o.e.r.s.RepositoryS3ClientYamlTestSuiteIT] [test] [yaml=repository_s3/30_repository_temporary_credentials/Register a repository with a non existing bucket] after test
10:48:30   1> [2019-05-07T10:47:58,856][INFO ][o.e.r.s.RepositoryS3ClientYamlTestSuiteIT] [test] [yaml=repository_s3/40_repository_ec2_credentials/Register a read-only repository with a non existing client] before test
10:48:30   1> [2019-05-07T10:47:58,883][INFO ][o.e.r.s.RepositoryS3ClientYamlTestSuiteIT] [test] [yaml=repository_s3/40_repository_ec2_credentials/Register a read-only repository with a non existing client] after test
10:48:30   1> [2019-05-07T10:47:58,897][INFO ][o.e.r.s.RepositoryS3ClientYamlTestSuiteIT] [test] [yaml=repository_s3/20_repository_permanent_credentials/Snapshot and Restore with repository-s3 using permanent credentials] before test
10:48:30   1> [2019-05-07T10:48:01,168][WARN ][o.e.c.RestClient         ] [test] request [PUT http://[::1]:33993/_bulk?refresh=true&error_trace=true] returned 1 warnings: [299 Elasticsearch-8.0.0-SNAPSHOT-8d01747 "[types removal] Specifying types in bulk requests is deprecated."]
10:48:30   1> [2019-05-07T10:48:07,517][INFO ][o.e.r.s.RepositoryS3ClientYamlTestSuiteIT] [test] Stash dump on test failure [{
10:48:30   1>   "stash" : {
10:48:30   1>     "body" : {
10:48:30   1>       "error" : {
10:48:30   1>         "root_cause" : [
10:48:30   1>           {
10:48:30   1>             "type" : "snapshot_missing_exception",
10:48:30   1>             "reason" : "[repository_permanent:snapshot-two/pYvfikZ7QwavHHyIACcXMA] is missing",
10:48:30   1>             "stack_trace" : "SnapshotMissingException[[repository_permanent:snapshot-two/pYvfikZ7QwavHHyIACcXMA] is missing]; nested: NoSuchFileException[Blob object [snap-pYvfikZ7QwavHHyIACcXMA.dat] not found: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: 813EA991EA6571BE; S3 Extended Request ID: 4jLVEk5kpDsSQXmNCsuiw26MzMKOj4VdbsQwX1ginUXk/4mklCkzwBtdcxiTYRbkKHHOliCOX7s=)];
10:48:30   1> 	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.getSnapshotInfo(BlobStoreRepository.java:553)
10:48:30   1> 	at org.elasticsearch.snapshots.SnapshotsService.snapshots(SnapshotsService.java:207)
10:48:30   1> 	at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:135)
10:48:30   1> 	at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:54)
10:48:30   1> 	at org.elasticsearch.action.support.master.TransportMasterNodeAction.masterOperation(TransportMasterNodeAction.java:127)
10:48:30   1> 	at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$1.doRun(TransportMasterNodeAction.java:200)
10:48:30   1> 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:757)
10:48:30   1> 	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
10:48:30   1> 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
10:48:30   1> 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
10:48:30   1> 	at java.base/java.lang.Thread.run(Thread.java:834)
10:48:30   1> Caused by: java.nio.file.NoSuchFileException: Blob object [snap-pYvfikZ7QwavHHyIACcXMA.dat] not found: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: 813EA991EA6571BE; S3 Extended Request ID: 4jLVEk5kpDsSQXmNCsuiw26MzMKOj4VdbsQwX1ginUXk/4mklCkzwBtdcxiTYRbkKHHOliCOX7s=)
10:48:30   1> 	at org.elasticsearch.repositories.s3.S3BlobContainer.readBlob(S3BlobContainer.java:98)
10:48:30   1> 	at org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.readBlob(ChecksumBlobStoreFormat.java:101)
10:48:30   1> 	at org.elasticsearch.repositories.blobstore.BlobStoreFormat.read(BlobStoreFormat.java:90)
10:48:30   1> 	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.getSnapshotInfo(BlobStoreRepository.java:551)
10:48:30   1> 	... 10 more
10:48:30   [2019-05-07T08:48:22,239][WARN ][o.e.s.RestoreService     ] [integTest-1] [repository_permanent:missing] failed to restore snapshot
10:48:30   org.elasticsearch.snapshots.SnapshotRestoreException: [repository_permanent:missing] snapshot does not exist
10:48:30   	at org.elasticsearch.snapshots.RestoreService.restoreSnapshot(RestoreService.java:190) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
10:48:30   	at org.elasticsearch.action.admin.cluster.snapshots.restore.TransportRestoreSnapshotAction.masterOperation(TransportRestoreSnapshotAction.java:77) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
10:48:30   	at org.elasticsearch.action.admin.cluster.snapshots.restore.TransportRestoreSnapshotAction.masterOperation(TransportRestoreSnapshotAction.java:38) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
10:48:30   	at org.elasticsearch.action.support.master.TransportMasterNodeAction.masterOperation(TransportMasterNodeAction.java:127) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
10:48:30   	at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$1.doRun(TransportMasterNodeAction.java:200) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
10:48:30   	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:757) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
10:48:30   	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
10:48:30   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
10:48:30   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
10:48:30   	at java.lang.Thread.run(Thread.java:834) [?:?]

Frequency

1st failure

@matriv matriv added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >test-failure Triaged test failures from CI labels May 7, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@original-brownbear
Copy link
Member

Just for the record: I talked to infra about this one an they don't know of anything special this morning that could've caused this (this is a 3rd party S3 test).
Rerunning the test made it green again.

=> I'm investigating where this could've come from still, doesn't look like it was an outside issue with the S3 bucket.

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue May 7, 2019
* First off, this exists check is somewhat pointless in that it checks
for the existence of a blob that contains a UUID in its name so we don't
expect collisions here ever. Just checking for no name collision is
completely sufficient.
* More importantly though, this check introduces an issue on S3 because
we will run it against a non existing snap-{uuid}.dat blob!
  * This leads to the fact that subsequent reads of this blob after it
was written can still return a 404 because AWS S3 only guarantees first
read after write consistency. If a read of a key is made before a write
is made this guarantee goes out the window. (see
https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel
for more)
* Closes elastic#41882
@original-brownbear
Copy link
Member

This is an actual bug in our code that together with S3's eventual consistency caused trouble. Fix in #41898

original-brownbear added a commit that referenced this issue May 7, 2019
* First off, this exists check is somewhat pointless in that it checks
for the existence of a blob that contains a UUID in its name so we don't
expect collisions here ever. Just checking for no name collision is
completely sufficient.
   * Since this check was the only user of the blob store format method I removed that now dead method
* More importantly though, this check introduces an issue on S3 because
we will run it against a non existing snap-{uuid}.dat blob
  * This leads to the fact that subsequent reads of this blob after it
was written can still return a 404 because AWS S3 only guarantees first
read after write consistency. If a read of a key is made before a write
is made this guarantee goes out the window. (see
https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel
for more)
* Closes #41882
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this issue May 27, 2019
* First off, this exists check is somewhat pointless in that it checks
for the existence of a blob that contains a UUID in its name so we don't
expect collisions here ever. Just checking for no name collision is
completely sufficient.
   * Since this check was the only user of the blob store format method I removed that now dead method
* More importantly though, this check introduces an issue on S3 because
we will run it against a non existing snap-{uuid}.dat blob
  * This leads to the fact that subsequent reads of this blob after it
was written can still return a 404 because AWS S3 only guarantees first
read after write consistency. If a read of a key is made before a write
is made this guarantee goes out the window. (see
https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel
for more)
* Closes elastic#41882
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >test-failure Triaged test failures from CI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants