KAFKA-15119: Support incremental syncTopicAcls in MirrorSourceConnector #13913
hudeqi wants to merge 5 commits into apache:trunk from
Conversation
If this improvement is reasonable, I will add a related unit test.
Hi! Please help review this PR when you are free, thank you! @C0urante
Hi, I have updated the PR; the changes are shown in the unresolved conversations. @C0urante
Ping again @C0urante
I have added a unit test for the related "createAcl failure" case, thanks for the review! @C0urante
By the way, I would like to ask a question: why do we not synchronize the write permission of
        }
    }));
    knownTopicAclBindings = new HashSet<>(bindings);
    knownTopicAclBindings.removeAll(failedBindings);
Isn't failedBindings likely to be empty at this point unless the admin client is able to perform the request to create the ACL bindings exceptionally fast (which also happens to be the scenario we're covering in the unit test)?
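The timing concern in this comment can be illustrated with a minimal sketch. It assumes plain `CompletableFuture`s in place of the admin client's per-binding futures and string keys in place of `AclBinding`; `AwaitAclResultsSketch` and `succeeded` are hypothetical names and are not taken from the PR. The idea is that the set of failures only means something once every future has completed, so the sketch waits before subtracting it.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration; not code from MirrorSourceConnector.
class AwaitAclResultsSketch {

    // Returns the keys whose future completed without an exception.
    // Blocks until every future has completed, so late failures are not missed.
    static Set<String> succeeded(Map<String, CompletableFuture<Void>> results) {
        Set<String> failed = ConcurrentHashMap.newKeySet();

        // Record failures in a callback; handle() yields a future that
        // always completes normally, so join() below does not throw.
        CompletableFuture<?>[] tracked = results.entrySet().stream()
                .map(e -> e.getValue().handle((ignored, error) -> {
                    if (error != null) {
                        failed.add(e.getKey());
                    }
                    return null;
                }))
                .toArray(CompletableFuture[]::new);

        // Reading 'failed' before this point could see an empty set
        // simply because no callback has run yet.
        CompletableFuture.allOf(tracked).join();

        Set<String> ok = new HashSet<>(results.keySet());
        ok.removeAll(failed);
        return ok;
    }
}
```

Whether blocking like this is acceptable on the connector's scheduler thread is a separate question; updating the cache from inside the completion callback would be another option.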
C0urante left a comment
Hmmm... the more I think about it, the more hesitant I am to include this change.
There are some issues with the proposed implementation, but even if those are addressed (and I believe they can be), I don't know if it's safe at all to cache our view of the target cluster's ACLs. The current behavior is responsible not only for creating initial ACL bindings, but also for continually re-applying them if they are changed. I'm worried that we may break existing users' setups if we change this logic.
Users can also increase the value for the sync.topic.acls.interval.seconds property if the existing ACL syncing logic is placing too high a workload on their Kafka cluster.
Separately, to answer your question about total mirroring of ACL bindings, replicated topics were originally intended to be read-only for everything except MM2 (see the relevant section in KIP-382). There's a Jira ticket to add support for mirroring of more than just READ ACLs, but it's currently unassigned and will require a KIP to address.
Thanks especially for your separate reply, @C0urante! I have tracked it in the Jira ticket. Going back to this PR, please correct me if I'm wrong, thanks! Although it is named incremental ACL synchronization, it does not change the existing behavior. The reason is: In addition, the problem I mentioned, that it will "affect the accumulation of the request queue of the target cluster and further affect the processing delay of other type requests", is not groundless. It is a serious problem found in a production environment. Full synchronization and incremental synchronization differ significantly in their impact on target cluster producer latency:
@hudeqi Sorry, I think there's a misunderstanding here. I'm not claiming that MM2 would be incapable of detecting changes in source cluster ACLs with this change; I'm worried that it would be unable to detect (or really, just overwrite) changes in target cluster ACLs, if they were made by a separate user/process from the MM2 cluster.
Yes, the logic does differ before and after the change (to reconfirm that there is no misunderstanding: if the ACLs of the target cluster are updated but the ACLs of the source cluster are not, the source cluster's ACLs will no longer be used to overwrite the target cluster's). But looking at this document, it seems that the goal of this ACL synchronization is only to synchronize changes to the source cluster's ACLs? I don't know if I understand it correctly.
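The difference discussed in this exchange can be made concrete with a small model. This is only a sketch under simplifying assumptions: ACL bindings are plain strings, each cluster is a `Set`, and `TargetDriftSketch`, `fullSync`, and `cachedSync` are hypothetical names that do not appear in the PR.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model: ACL bindings are plain strings and each cluster is a Set.
class TargetDriftSketch {

    // Current behavior as described in the thread: every source binding
    // is sent to the target on every round.
    static void fullSync(Set<String> source, Set<String> target) {
        target.addAll(source);
    }

    // Cached behavior: only bindings not seen in an earlier round are sent.
    static void cachedSync(Set<String> source, Set<String> known, Set<String> target) {
        Set<String> added = new HashSet<>(source);
        added.removeAll(known);
        target.addAll(added);
        known.clear();
        known.addAll(source);
    }

    public static void main(String[] args) {
        Set<String> source = Set.of("User:alice READ topic-a");
        Set<String> target = new HashSet<>();
        Set<String> known = new HashSet<>();

        cachedSync(source, known, target);   // first round creates the binding
        target.clear();                      // removed on the target by another process
        cachedSync(source, known, target);   // nothing is new, so nothing is re-sent
        System.out.println("after cached sync: " + target);

        fullSync(source, target);            // full sync sends the binding again
        System.out.println("after full sync:   " + target);
    }
}
```

In this model the cached variant leaves the target without the binding after an out-of-band removal, while the full variant restores it on the next round.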
The original MM2 KIPs use very few words to describe very large parts of its functionality, often leaving things very under-specified, which I think is the case here. I don't think that the original proposal gives us enough to decide for or against this change. Personally, I think that if a user can get themselves into a situation where they:
They would be reasonable to conclude that the system is misbehaving, either because the source or target system is unhealthy, or MM2 is unhealthy, or MM2 has a bug in it. If caching causes the above situation to occur, I don't think that caching is a viable solution. I'd be interested in trying other strategies such as:
@hudeqi In your environment, are you noticing the load on the source system from the ACL reads? Do you have more MM2s connected to the target cluster or the source cluster? I'm wondering if (2) would actually be helpful, or if reads and writes have approximately the same cost.
This PR is being marked as stale since it has not had any activity in 90 days. If you would like to keep this PR alive, please ask a committer for review. If the PR has merge conflicts, please update it with the latest from trunk (or appropriate release branch). If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed.


Motivation
In the "syncTopicAcls" thread of MirrorSourceConnector, the full set of "TopicAclBindings" for the replicated topics of the source cluster is listed periodically and then sent in full to the target cluster. As a result, a large number of unchanged "TopicAclBindings" are sent again on every round through "targetAdminClient", which is redundant. In addition, if too many "TopicAclBindings" are updated at once, the target cluster may take a long time to process the "createAcls" request, which can build up its request queue and in turn increase the processing delay of other request types.
Solution
"TopicAclBindings" can be tracked in the same way as the variable "knownConsumerGroups" in MirrorCheckpointConnector, so that only the newly added "TopicAclBindings" are sent on each round, which addresses both of the problems above.
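A minimal sketch of the idea, assuming a generic element type `T` in place of Kafka's `AclBinding`; the class `IncrementalSyncSketch` and the method `newlyAdded` are hypothetical names used only for illustration and are not the code in this PR.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; T stands in for Kafka's AclBinding.
class IncrementalSyncSketch<T> {
    // Listing seen in the previous sync round.
    private Set<T> known = new HashSet<>();

    // Returns the bindings in the latest listing that were absent from the
    // previous one, then remembers the latest listing for the next round.
    Set<T> newlyAdded(Collection<T> latest) {
        Set<T> added = new HashSet<>(latest);
        added.removeAll(known);
        known = new HashSet<>(latest);
        return added;
    }
}
```

Only the returned set would need to be passed to the target admin client, so a round in which the source listing is unchanged would send nothing.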