Improve reaction to blob store corruptions #111954
Today there are a couple of assertions that can trip if the contents of a snapshot repository are corrupted. It makes sense to assert the integrity of snapshots in most tests, but we must also (a) protect against these corruptions in production and (b) allow some tests to verify the behaviour of the system when the repository is corrupted. This commit introduces a flag to disable certain assertions, converts the relevant assertions into production failures too, and introduces a high-level test to verify that we do detect all relevant corruptions without tripping any other assertions. Extracted from elastic#93735 as this change makes sense in its own right. Relates elastic#52622.
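The combination described above (an assertion that tests can switch off, backed by a check that still fails in production) can be sketched as follows. This is a minimal, hypothetical illustration and not the code from this PR: `IntegrityCheckSketch`, `IntegritySuppressor`, `CorruptedSnapshotException` and `checkFileCount` are invented names, loosely modelled on the `BlobStoreIndexShardSnapshotsIntegritySuppressor` used in the test excerpt in the review thread, and a negative file count stands in for a real repository corruption.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: an integrity check that trips an assertion in ordinary
// tests, but can be suppressed by tests that corrupt the repository on purpose.
// Either way, the corruption still surfaces as a real exception.
class IntegrityCheckSketch {

    // Number of open suppressors; the assertion is skipped while this is non-zero.
    private static final AtomicInteger SUPPRESSORS = new AtomicInteger();

    // Hypothetical exception type signalling a corrupted repository.
    static class CorruptedSnapshotException extends RuntimeException {
        CorruptedSnapshotException(String message) {
            super(message);
        }
    }

    // try-with-resources guard that disables the assertion for its scope.
    static final class IntegritySuppressor implements AutoCloseable {
        IntegritySuppressor() {
            SUPPRESSORS.incrementAndGet();
        }

        @Override
        public void close() {
            SUPPRESSORS.decrementAndGet();
        }
    }

    static boolean assertionsSuppressed() {
        return SUPPRESSORS.get() > 0;
    }

    // Validates a (hypothetical) file count read from the repository.
    static int checkFileCount(int fileCount) {
        boolean valid = fileCount >= 0;
        // With -ea and no suppressor, a corruption trips this assertion so tests fail loudly.
        assert valid || assertionsSuppressed() : "corrupted repository: negative file count " + fileCount;
        // Production path (and suppressed tests): report the corruption as an ordinary exception.
        if (valid == false) {
            throw new CorruptedSnapshotException("negative file count [" + fileCount + "]");
        }
        return fileCount;
    }

    public static void main(String[] args) {
        // A test that corrupts the repository deliberately opens a suppressor first.
        try (var ignored = new IntegritySuppressor()) {
            checkFileCount(-1);
        } catch (CorruptedSnapshotException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Keeping the `if` check separate from the `assert` matters because Java assertions are disabled by default in production JVMs; without the explicit check, a corrupted value would simply be accepted there.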
Pinging @elastic/es-distributed (Team:Distributed)
```java
try (var ignored = new BlobStoreIndexShardSnapshotsIntegritySuppressor()) {
    final var exception = safeAwait(randomFrom(corruptionDetectors));
    logger.info(Strings.format("--> corrupted [%s] and caught exception", corruptedFile), exception);
    // ... (remainder of the block not shown in the review excerpt)
}
```
Could this potentially be considered an example of using randomised testing for coverage?
It seems we've enumerated the ways in which we expect to detect each type of corruption, and then we randomly pick one to execute. Would it not be better to test them all? Or does executing one have side effects that preclude running the others?
More a question than a comment.
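The trade-off raised in this question can be sketched as below. This is a hypothetical illustration, not the PR's test: detectors are modelled as plain `Supplier<Exception>` values (the real test awaits one of several `corruptionDetectors` via `safeAwait(randomFrom(...))`), and `DetectorSelectionSketch`, `runOneAtRandom` and `runAll` are invented names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Supplier;

// Hypothetical sketch contrasting the two strategies discussed in the review.
// Each "detector" reads the corrupted repository in some way and returns the
// exception it observed.
class DetectorSelectionSketch {

    // Randomised strategy: one detector per run. Different seeds exercise
    // different detectors over many runs; a fixed seed repeats the same choice.
    static Exception runOneAtRandom(List<Supplier<Exception>> detectors, Random random) {
        Supplier<Exception> chosen = detectors.get(random.nextInt(detectors.size()));
        return chosen.get();
    }

    // Exhaustive strategy: every detector on every run. Only sound if running
    // one detector does not change the state that the others depend on.
    static List<Exception> runAll(List<Supplier<Exception>> detectors) {
        List<Exception> results = new ArrayList<>();
        for (Supplier<Exception> detector : detectors) {
            results.add(detector.get());
        }
        return results;
    }

    public static void main(String[] args) {
        List<Supplier<Exception>> detectors = List.of(
            () -> new IllegalStateException("detected while listing snapshots"),
            () -> new IllegalStateException("detected while restoring")
        );
        System.out.println(runOneAtRandom(detectors, new Random(42)).getMessage());
        System.out.println(runAll(detectors).size());
    }
}
```

Because the randomised variant takes an explicit seeded `Random`, a failing run can be replayed with the same seed, which is what makes per-run random selection acceptable in a randomised test suite.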
Ehh, kinda, though the coverage would be extremely low even if we checked all these paths on every test run. The assertions we were previously hitting would only trip if you corrupted a very specific byte in exactly the right way.
Really, this is just to verify that a corruption doesn't trip assertions, so that we can proceed with #93735, which will introduce a way to catch all these problems at once.
LGTM
* upstream/main:
  - Fail `indexDocs()` on rejection (elastic#111962)
  - Move repo analyzer to its own package (elastic#111963)
  - Add generated evaluators for DateNanos conversion functions (elastic#111961)
  - Clean the last traces from global retention in templates (elastic#111669)
  - Fix known issue docs for elastic#111866 (elastic#111956)
  - x-pack/plugin/otel: introduce x-pack-otel plugin (elastic#111091)
  - Improve reaction to blob store corruptions (elastic#111954)
  - Introduce `StreamingXContentResponse` (elastic#111933)
  - Revert "Add 8.15.0 known issue for memory locking in Windows (elastic#111949)"
  - Test get-snapshots API with missing details (elastic#111903)
  - Add 8.15.0 known issue for memory locking in Windows (elastic#111949)

  Conflicts: server/src/main/java/org/elasticsearch/TransportVersions.java
Makes these utility methods available to other test suites (to be added in future PRs). Relates elastic#111954