KAFKA-14535: Fix flaky EndToEndAuthorization tests which were sensitive to ACL change reordering #13086
Merged
omkreddy merged 1 commit into apache:trunk on Jan 7, 2023
Contributor (Author):
The last people working in this area were @pprovenzano, @omkreddy, and @soarez. If any of you have some time, I'd appreciate a review of this test, which flakes ~10-20% of the time. Thanks!
omkreddy approved these changes on Jan 7, 2023
omkreddy (Contributor) left a comment:
Thanks for the PR. LGTM
Contributor:
Test failures are unrelated, merging the PR.
rajinisivaram added a commit to confluentinc/kafka that referenced this pull request on Jan 9, 2023:
…9-jan-2023
* apache/trunk: (16 commits)
  KAFKA-14570: Fix parenthesis in verifyFullFetchResponsePartitions output (apache#13072)
  MINOR: Remove public mutable fields from ProducerAppendInfo (apache#13091)
  KAFKA-14558: Move/Rewrite LastRecord, TxnMetadata, BatchMetadata, ProducerStateEntry, and ProducerAppendInfo to the storage module (apache#13043)
  KAFKA-14535: Fix flaky EndToEndAuthorization tests which were sensitive to ACL change reordering (apache#13086)
  KAFKA-9087 Replace log high watermark by future log high watermark wh… (apache#13075)
  MINOR: add error reason when controller failed to handle events (apache#13050)
  MINOR: doc: note how JDK-8136913 can affect client SASL (apache#13071)
  2023 (apache#13083)
  KAFKA-14279; Add 3.3.x to core compatibility tests (apache#13076)
  MINOR Fixed doc generation for LogConfig class in genTopicConfigDocs. (apache#13079)
  ...
guozhangwang pushed a commit to guozhangwang/kafka that referenced this pull request on Jan 25, 2023:
…ve to ACL change reordering (apache#13086)
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
The ACL change methods (create, delete) are eventually consistent across a Kafka cluster. As a result, changes to the same resource made through different brokers may be applied in a different order than they were issued. In this test, a delete operation initiated in noConsumeWithoutDescribeAclSetup was being reordered to an unexpected point later in the test, so an ACL that the test expected to be present was missing.
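To make the failure mode concrete, here is a toy Java model (not Kafka code) of a delete that is issued first but applied last. It assumes each change takes effect in the order it reaches the shared ACL state rather than the order it was issued; `AclReorderDemo`, `AclChange`, the arrival ticks, and the ACL string are all hypothetical illustrations.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: not how Kafka stores or propagates ACLs.
final class AclReorderDemo {
    // Hypothetical ACL used only for illustration.
    static final String ACL = "User:client ALLOW READ Topic:test";

    /** One ACL change, tagged with the tick at which it reaches the shared ACL state. */
    static final class AclChange {
        final boolean add;
        final String acl;
        final int arrivalTick;

        AclChange(boolean add, String acl, int arrivalTick) {
            this.add = add;
            this.acl = acl;
            this.arrivalTick = arrivalTick;
        }
    }

    /** Applies changes in arrival order (not issue order) and returns the final ACL set. */
    static Set<String> applyInArrivalOrder(List<AclChange> changes) {
        List<AclChange> sorted = new ArrayList<>(changes);
        sorted.sort(Comparator.comparingInt((AclChange c) -> c.arrivalTick));
        Set<String> acls = new LinkedHashSet<>();
        for (AclChange c : sorted) {
            if (c.add) {
                acls.add(c.acl);
            } else {
                acls.remove(c.acl);
            }
        }
        return acls;
    }

    /**
     * Setup issues a delete (via one broker); the test body later issues a create
     * of the same ACL (via another broker). Only the arrival ticks differ between runs.
     */
    static Set<String> run(int deleteArrival, int createArrival) {
        List<AclChange> issuedInOrder = List.of(
                new AclChange(false, ACL, deleteArrival), // issued first, in setup
                new AclChange(true, ACL, createArrival)   // issued second, in the test body
        );
        return applyInArrivalOrder(issuedInOrder);
    }
}
```

With `run(1, 2)` the delete lands first and the ACL survives; with `run(3, 2)` the setup's delete lands after the test's create and the expected ACL is gone, which matches the symptom described above.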
To fix this, add a wait condition that asserts that the delete operations initiated in noConsumeWithoutDescribeAclSetup are completely applied before returning from the method.
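A minimal sketch of that fix pattern, assuming an ACL view in which deletes become visible asynchronously: issue the deletes, then poll until none of the deleted ACLs is still visible (or time out) before the setup method returns. `waitForCondition`, `EventuallyConsistentAcls`, `setupWithDeletes`, and the 50 ms propagation delay are hypothetical stand-ins; they are not Kafka's test utilities or the actual change in this PR.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

final class WaitForAclDelete {

    /** Polls condition every pollMs until it is true; fails with AssertionError after timeoutMs. */
    static void waitForCondition(BooleanSupplier condition, long timeoutMs, long pollMs, String message) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new AssertionError("Condition not met within " + timeoutMs + " ms: " + message);
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                throw new IllegalStateException("Interrupted while waiting: " + message, e);
            }
        }
    }

    /** Hypothetical ACL view in which deletes become visible only after a delay. */
    static final class EventuallyConsistentAcls {
        private final Set<String> acls = ConcurrentHashMap.newKeySet();
        private final ScheduledExecutorService propagator =
                Executors.newSingleThreadScheduledExecutor();

        void add(String acl) {
            acls.add(acl);
        }

        /** Returns immediately; the removal is applied later on a background thread. */
        void deleteAsync(String acl, long propagationDelayMs) {
            propagator.schedule(() -> { acls.remove(acl); }, propagationDelayMs, TimeUnit.MILLISECONDS);
        }

        boolean contains(String acl) {
            return acls.contains(acl);
        }

        void shutdown() {
            propagator.shutdownNow();
        }
    }

    /** Setup step: issue the deletes, then block until all of them are visible before returning. */
    static void setupWithDeletes(EventuallyConsistentAcls store, Set<String> toDelete) {
        for (String acl : toDelete) {
            store.deleteAsync(acl, 50);
        }
        waitForCondition(() -> toDelete.stream().noneMatch(store::contains),
                5_000, 10, "ACL deletes were not fully applied");
    }
}
```

Polling with a timeout keeps later test steps from racing the deletes without assuming any particular propagation delay; the cost is a slightly slower setup when propagation is slow.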
Note: test failures were present for both the ZK and KRaft iterations of the test, but the ZK iteration hit them much more often and was used to diagnose the reordering.
This is a typo in the original implementation of this method (https://github.com/apache/kafka/pull/1908/files#diff-c38532e2fcf5cce4cf30a824bc6bef9953c05d9472db51c3778a0286f6f01296R311), and it was made worse when the CREATE ACL was added but never deleted (https://github.com/apache/kafka/pull/4795/files#diff-c38532e2fcf5cce4cf30a824bc6bef9953c05d9472db51c3778a0286f6f01296R293-R308).
However, I don't think the test became flaky until #12843, which replaced the AclCommand with the AdminClient and removed the synchronization that waited for the ACL operation to resolve before the test proceeded.
This does not require a change to the ACL mechanisms, as I believe the reordering is an expected part of the consistency algorithm, which includes a CAS operation to ZK. It is also not a security risk: the reordering only affects ACL change operations that succeed, and should not allow unsuccessful ACL change operations to insert invalid ACLs.
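For readers unfamiliar with the CAS mentioned above, this in-memory Java sketch shows the general compare-and-set pattern: a write only succeeds if the stored version still matches the version the writer read, and a writer that loses the race re-reads and retries. An `AtomicReference` stands in for a versioned ZooKeeper znode (ZooKeeper's `setData` accepts an expected version and rejects the write on a mismatch); `VersionedAclStore` and its methods are hypothetical and are not Kafka's authorizer code.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Hypothetical in-memory stand-in for a versioned znode; not Kafka's authorizer code.
final class VersionedAclStore {

    /** Immutable snapshot of the ACL set and the version it was written at. */
    static final class Versioned {
        final Set<String> acls;
        final int version;

        Versioned(Set<String> acls, int version) {
            this.acls = Set.copyOf(acls);
            this.version = version;
        }
    }

    private final AtomicReference<Versioned> current =
            new AtomicReference<>(new Versioned(Set.of(), 0));

    Versioned read() {
        return current.get();
    }

    /** Conditional write: succeeds only if the stored version still equals expectedVersion. */
    boolean conditionalSet(Set<String> newAcls, int expectedVersion) {
        Versioned seen = current.get();
        if (seen.version != expectedVersion) {
            return false; // another writer committed first; the caller must re-read
        }
        // compareAndSet also fails if another writer swapped in a snapshot after get()
        return current.compareAndSet(seen, new Versioned(newAcls, expectedVersion + 1));
    }

    /** Read-modify-write with retries; returns how many attempts were needed. */
    int update(UnaryOperator<Set<String>> change) {
        int attempts = 0;
        while (true) {
            attempts++;
            Versioned snapshot = current.get();
            Set<String> next = change.apply(new HashSet<>(snapshot.acls));
            if (conditionalSet(next, snapshot.version)) {
                return attempts;
            }
        }
    }
}
```

Concurrent changes are therefore committed in whichever order they win the conditional write, which need not be the order in which they were issued, while a stale write is rejected rather than silently overwriting newer ACLs.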
Signed-off-by: Greg Harris <greg.harris@aiven.io>