KAFKA-15139: Optimize the performance of Set.removeAll(List) in MirrorCheckpointConnector #13946
Merged
gharris1727 merged 1 commit into apache:trunk on Jul 10, 2023
…rorCheckpointConnector`
Contributor
Author
This PR is inspired by your suggestion. Please help review it when you have time, thank you. @C0urante
Contributor
Author
And this, thanks. @C0urante
Contributor
@hudeqi I think @gharris1727 can merge this now :)
Contributor
Flaky test failures in the Mirror suite appear unrelated, and the tests pass locally. Merging.
Cerchie pushed a commit to Cerchie/kafka that referenced this pull request on Jul 25, 2023
…tor (apache#13946) Reviewed-by: Greg Harris <greg.harris@aiven.io>
Motivation
This PR is inspired by #13913 (comment)
This is the implementation note on the `removeAll` method of `Set` (from `AbstractSet`):

"This implementation determines which is the smaller of this set and the specified collection, by invoking the size method on each. If this set has fewer elements, then the implementation iterates over this set, checking each element returned by the iterator in turn to see if it is contained in the specified collection. If it is so contained, it is removed from this set with the iterator's remove method. If the specified collection has fewer elements, then the implementation iterates over the specified collection, removing from this set each element returned by the iterator, using this set's remove method."

That said, let M be the number of elements in the set and N the number of elements in the list. If the specified collection is a `List` and M <= N, the time complexity of `removeAll` is O(MN), because each `contains` lookup in a `List` is O(N). Conversely, if N < M, the implementation iterates over the list and removes each element from the `Set`, so the time complexity is O(N).

In `MirrorCheckpointConnector`, the `refreshConsumerGroups` method is called repeatedly in a daemon thread, and it contains two `removeAll` calls. Logically, within one round of this method, when the number of groups in the source cluster simply increases or decreases, the two `removeAll` execution strategies will always hit the O(MN) case described above.

Solution
Therefore, it is better to change all the variables involved here to the `Set` type to avoid this performance problem.
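As a rough illustration of the idea (not the actual connector code), the sketch below computes "new" and "dead" groups with both operands held as `Set`. The class name `GroupDiffSketch`, the method names, and the group IDs are hypothetical and chosen only for this example. With a hash-based `Set` on both sides, each `contains` or `remove` lookup is expected O(1), so whichever branch `removeAll` takes, the cost stays linear in the smaller collection.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: names are illustrative and not taken from the Kafka code base.
class GroupDiffSketch {

    // Groups present in `found` but not in `known`.
    static Set<String> newGroups(Set<String> known, Set<String> found) {
        Set<String> result = new HashSet<>(found);
        // `known` is a Set, so lookups are hash-based rather than a linear List scan.
        result.removeAll(known);
        return result;
    }

    // Groups present in `known` but no longer in `found`.
    static Set<String> deadGroups(Set<String> known, Set<String> found) {
        Set<String> result = new HashSet<>(known);
        result.removeAll(found);
        return result;
    }

    public static void main(String[] args) {
        // Assumed sample data; a real caller would obtain these from the cluster.
        Set<String> known = new HashSet<>(Arrays.asList("g1", "g2", "g3"));
        Set<String> found = new HashSet<>(Arrays.asList("g2", "g3", "g4"));

        System.out.println("new:  " + newGroups(known, found));
        System.out.println("dead: " + deadGroups(known, found));
    }
}
```

If the group lookup returns a `List`, copying it once into a `HashSet` before the two `removeAll` calls costs O(N) and gives the same effect.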
This PR has passed the unit tests.