[CI] AbstractSnapshotRepoTestKitRestTestCase testRepositoryAnalysis failures #72229
Comments
Pinging @elastic/es-distributed (Team:Distributed)
In elastic#72229 a test run was observed to exceed this 5-second timeout. This commit increases it to 20 seconds.
Yes there's actually quite a number of different failures here. Going through them one-by-one:
https://gradle-enterprise.elastic.co/s/gervfulv6lrku
I'm surprised that we exceeded the 5-second timeout here but maybe it's possible if this CI worker was running quite slowly. It completed shortly after the timeout, but I don't see an obvious deadlock that could otherwise explain this, so I opened #72314 to just give this a more generous timeout.
https://gradle-enterprise.elastic.co/s/m2kldzfhmictm
These two were just processing cluster state updates excruciatingly slowly, even before the test started running:
We require the repo registration to complete in <30s but it didn't because of other cluster state updates still not having completed. Not sure what to do about this, there's definitely something wrong with how we're running these clusters if they take ≥3s for each cluster state update. TBD.
https://gradle-enterprise.elastic.co/s/u56mg6brfudq6
This one fails because the MD5 checksum we compute on upload doesn't match the checksum the server computes. I opened #72358 to track this separately.
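For readers unfamiliar with the kind of check behind the MD5 failure, here is a minimal sketch of comparing a client-side MD5 with a server-reported one. It is not the code from the test fixtures: the function names `md5_hex` and `checksums_match` are hypothetical, and treating the server value as a possibly quoted hex string is an assumption made for illustration.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the lowercase hex MD5 digest of the uploaded bytes."""
    return hashlib.md5(data).hexdigest()


def checksums_match(uploaded: bytes, server_md5_hex: str) -> bool:
    """Compare the client-side MD5 with the value reported by the server.

    Assumption: the server value is a hex string that may be wrapped in
    double quotes and may differ in case, so it is normalized first.
    """
    normalized = server_md5_hex.strip().strip('"').lower()
    return md5_hex(uploaded) == normalized
```

A mismatch from a check like this means either the bytes changed between the two computations or one side hashed a different range of the data, which is why it is tracked separately from the timeout failures.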
Today we do not specify the `?seed=` parameter when running the repository analyzer in REST tests, so we cannot reproduce the set of operations that led to a failure. This commit introduces a deterministic value for this parameter. Relates elastic#72229 which seems to indicate some kind of bug in how certain checksums are calculated in the test fixtures.
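As an illustration of what a deterministic `?seed=` could look like, the sketch below derives a stable seed from a label and builds the analysis request path. The commit does not say how its seed is chosen, so `deterministic_seed`, `analyze_path`, and the hashing scheme are hypothetical; the 63-bit mask assumes the parameter is parsed as a signed 64-bit integer.

```python
import hashlib
from urllib.parse import quote, urlencode


def deterministic_seed(label: str) -> int:
    """Derive a stable, non-negative seed from a label such as a test name.

    hashlib is used rather than the built-in hash(), whose value for
    strings changes between interpreter runs.
    """
    digest = hashlib.sha256(label.encode("utf-8")).digest()
    # Assumption: keep the value within the non-negative signed 64-bit range.
    return int.from_bytes(digest[:8], "big") & 0x7FFFFFFFFFFFFFFF


def analyze_path(repository: str, seed: int) -> str:
    """Build the repository analysis path with an explicit seed parameter."""
    query = urlencode({"seed": seed})
    return f"/_snapshot/{quote(repository, safe='')}/_analyze?{query}"
```

With a fixed seed, a failing run can be repeated with the same value so that the analyzer performs the same set of operations.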
I opened #72358 specifically for the checksum failures because they're definitely unrelated to the other failures.
I opened #72404 to discuss the cases where cluster state updates were just desperately slow, as I think this needs help from the delivery folks. Since these failures are all either addressed or tracked elsewhere, I am closing this.
Build scan:
https://gradle-enterprise.elastic.co/s/gervfulv6lrku
https://gradle-enterprise.elastic.co/s/o4qzst6pglg36
https://gradle-enterprise.elastic.co/s/u56mg6brfudq6
https://gradle-enterprise.elastic.co/s/m2kldzfhmictm
Repro line:
Reproduces locally?:
NO
Applicable branches:
master, 7.x, 7.13
Failure history:
Regularly, although I cannot say the cause is the same for all of these.
https://build-stats.elastic.co/goto/5c313f0165b5e7db32c26bec167ef280
Failure excerpt:
The actual tests and errors vary:
https://gradle-enterprise.elastic.co/s/gervfulv6lrku/tests/:x-pack:plugin:snapshot-repo-test-kit:internalClusterTest/org.elasticsearch.repositories.blobstore.testkit.RepositoryAnalysisSuccessIT/testRepositoryAnalysis?expanded-stacktrace=WyIwLTEiXQ
https://gradle-enterprise.elastic.co/s/o4qzst6pglg36/tests/:x-pack:plugin:snapshot-repo-test-kit:qa:s3:integTest/org.elasticsearch.repositories.blobstore.testkit.S3SnapshotRepoTestKitIT/testRepositoryAnalysis?top-execution=1
https://gradle-enterprise.elastic.co/s/u56mg6brfudq6/tests/:x-pack:plugin:snapshot-repo-test-kit:qa:s3:integTest/org.elasticsearch.repositories.blobstore.testkit.S3SnapshotRepoTestKitIT/testRepositoryAnalysis?top-execution=1