Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AbstractSnapshotRepoTestKitRestTestCase#testRepositoryAnalysis fails with "Unable to verify integrity of data upload" #72358

Closed
DaveCTurner opened this issue Apr 28, 2021 · 1 comment · Fixed by #72378
Assignees
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. >test-failure Triaged test failures from CI

Comments

@DaveCTurner
Copy link
Contributor

Build scan:

Repro line: e.g. REPRODUCE WITH: ./gradlew ':x-pack:plugin:snapshot-repo-test-kit:qa:s3:integTest' --tests "org.elasticsearch.repositories.blobstore.testkit.S3SnapshotRepoTestKitIT.testRepositoryAnalysis" -Dtests.seed=987C4B3C71034593 -Dtests.locale=fr-CA -Dtests.timezone=Atlantic/Faeroe -Druntime.java=11

Reproduces locally?: No

Applicable branches: 7.12, master (probably 7.13 and 7.x too)

Failure history: https://build-stats.elastic.co/goto/ec1e3eb777247512d6089a2fe118056c

Failure excerpt:

org.elasticsearch.repositories.blobstore.testkit.S3SnapshotRepoTestKitIT > testRepositoryAnalysis FAILED
            org.elasticsearch.client.ResponseException: method [POST], host [http://127.0.0.1:36341], URI [/_snapshot/repository/_analyze?blob_count=10&max_blob_size=1mb&timeout=120s&concurrency=4], status line [HTTP/1.1 500 Internal Server Error]
{
  "error": {
    "root_cause": [
      {
        "type": "uncategorized_execution_exception",
        "reason": "Failed execution"
      }
    ],
    "type": "repository_verification_exception",
    "reason": "[repository] analysis failed, you may need to manually remove [temp-analysis-DRAYv4byTmeQ9WrF8qKuzQ]",
    "caused_by": {
      "type": "repository_verification_exception",
      "reason": "[repository] failure processing [blob analysis [repository:temp-analysis-DRAYv4byTmeQ9WrF8qKuzQ/test-blob-7-6Sn4wUgZQXuaBegKSwPCjA, length=262144, seed=-3175868635373160859, readEarly=false, writeAndOverwrite=false]]",
      "caused_by": {
        "type": "uncategorized_execution_exception",
        "reason": "Failed execution",
        "caused_by": {
          "type": "execution_exception",
          "reason": "java.io.IOException: Unable to upload object [base_path/temp-analysis-DRAYv4byTmeQ9WrF8qKuzQ/test-blob-7-6Sn4wUgZQXuaBegKSwPCjA] using a single upload",
          "caused_by": {
            "type": "i_o_exception",
            "reason": "Unable to upload object [base_path/temp-analysis-DRAYv4byTmeQ9WrF8qKuzQ/test-blob-7-6Sn4wUgZQXuaBegKSwPCjA] using a single upload",
            "caused_by": {
              "type": "sdk_client_exception",
              "reason": "Unable to verify integrity of data upload. Client calculated content hash (contentMD5: 4J5vz4CfLgJcHZYO3qGtsw== in base 64) didn't match hash (etag: 93131e5ff888533a740cc7c0913276eb in hex) calculated by Amazon S3.  You may need to delete the data stored in Amazon S3. (metadata.conte
ntMD5: null, md5DigestStream: com.amazonaws.services.s3.internal.MD5DigestCalculatingInputStream@75d4ac01, bucketName: bucket, key: base_path/temp-analysis-DRAYv4byTmeQ9WrF8qKuzQ/test-blob-7-6Sn4wUgZQXuaBegKSwPCjA)"
            }
          }
        }
      }
    }
  },
  "status": 500
}

There is a mismatch between the MD5 checksum computed by the client and by the server, which in these cases seems to be our own internal S3Fixture. The server-side checksum computation is a little tricky because of all the different ways the data might be encoded and chunked -- it contains no obvious bugs but nor does it obviously contain no bugs.

I opened #72321 to try and improve the reproducibility of these failures.

@DaveCTurner DaveCTurner added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >test-failure Triaged test failures from CI labels Apr 28, 2021
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Apr 28, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner DaveCTurner self-assigned this Apr 28, 2021
DaveCTurner added a commit that referenced this issue Apr 28, 2021
Today we do not specify the `?seed=` parameter when running the
repository analyzer in REST tests, so we cannot reproduce the set
of operations that led to a failure. This commit introduces a
deterministic value for this parameter.

Relates #72358 which seems to indicate some kind of bug in how certain
checksums are calculated in the test fixtures.
DaveCTurner added a commit that referenced this issue Apr 28, 2021
Today we do not specify the `?seed=` parameter when running the
repository analyzer in REST tests, so we cannot reproduce the set
of operations that led to a failure. This commit introduces a
deterministic value for this parameter.

Relates #72358 which seems to indicate some kind of bug in how certain
checksums are calculated in the test fixtures.
DaveCTurner added a commit that referenced this issue Apr 28, 2021
Today we do not specify the `?seed=` parameter when running the
repository analyzer in REST tests, so we cannot reproduce the set
of operations that led to a failure. This commit introduces a
deterministic value for this parameter.

Relates #72358 which seems to indicate some kind of bug in how certain
checksums are calculated in the test fixtures.
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Apr 28, 2021
The `S3HttpHandler` reads the contents of the uploaded blob, but if the
upload used chunked encoding then the reader would skip one or more
`\r\n` sequences if they appeared at the start of a chunk.

This commit reworks the reader to be stricter about its interpretation
of chunks, and removes some indirection via streams since we can work
pretty much entirely on the underlying `BytesReference` instead.

Closes elastic#72358
DaveCTurner added a commit that referenced this issue Apr 28, 2021
The `S3HttpHandler` reads the contents of the uploaded blob, but if the
upload used chunked encoding then the reader would skip one or more
`\r\n` sequences if they appeared at the start of a chunk.

This commit reworks the reader to be stricter about its interpretation
of chunks, and removes some indirection via streams since we can work
pretty much entirely on the underlying `BytesReference` instead.

Closes #72358
DaveCTurner added a commit that referenced this issue Apr 28, 2021
The `S3HttpHandler` reads the contents of the uploaded blob, but if the
upload used chunked encoding then the reader would skip one or more
`\r\n` sequences if they appeared at the start of a chunk.

This commit reworks the reader to be stricter about its interpretation
of chunks, and removes some indirection via streams since we can work
pretty much entirely on the underlying `BytesReference` instead.

Closes #72358
DaveCTurner added a commit that referenced this issue Apr 28, 2021
The `S3HttpHandler` reads the contents of the uploaded blob, but if the
upload used chunked encoding then the reader would skip one or more
`\r\n` sequences if they appeared at the start of a chunk.

This commit reworks the reader to be stricter about its interpretation
of chunks, and removes some indirection via streams since we can work
pretty much entirely on the underlying `BytesReference` instead.

Closes #72358
DaveCTurner added a commit that referenced this issue Apr 28, 2021
The `S3HttpHandler` reads the contents of the uploaded blob, but if the
upload used chunked encoding then the reader would skip one or more
`\r\n` sequences if they appeared at the start of a chunk.

This commit reworks the reader to be stricter about its interpretation
of chunks, and removes some indirection via streams since we can work
pretty much entirely on the underlying `BytesReference` instead.

Closes #72358
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. >test-failure Triaged test failures from CI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants