Skip to content

KAFKA-15139:Optimize the performance of Set.removeAll(List) in MirrorCheckpointConnector#13946

Merged
gharris1727 merged 1 commit intoapache:trunkfrom
hudeqi:list_to_set
Jul 10, 2023
Merged

KAFKA-15139:Optimize the performance of Set.removeAll(List) in MirrorCheckpointConnector#13946
gharris1727 merged 1 commit intoapache:trunkfrom
hudeqi:list_to_set

Conversation

@hudeqi
Copy link
Contributor

@hudeqi hudeqi commented Jul 1, 2023

Motivation

This PR is inspired by #13913 (comment)

This is the hint of removeAll method in Set:
This implementation determines which is the smaller of this set and the specified collection, by invoking the size method on each. If this set has fewer elements, then the implementation iterates over this set, checking each element returned by the iterator in turn to see if it is contained in the specified collection. If it is so contained, it is removed from this set with the iterator's remove method. If the specified collection has fewer elements, then the implementation iterates over the specified collection, removing from this set each element returned by the iterator, using this set's remove method.

That's said, assume that M is the number of elements in the set and N is the number of elements in the List, if the type of the specified collection is List, and M<=N, then the time complexity of removeAll is O(MN) (because the time complexity of searching in List is O(N)), on the contrary, if N<M, it will search in Set, the time complexity is O(N).

In MirrorCheckpointConnector, refreshConsumerGroups method is repeatedly called in a daemon thread. There are two removeAll in this method. From a logical point of view, when this method is called in one round, when the number of groups in the source cluster simply increases or decreases, the two removeAll execution strategies will always hit the O(MN) situation mentioned above.

Solution

Therefore, it is better to change all the variables here to Set type to avoid this "low performance".

This PR has passed the unit test.

@hudeqi
Copy link
Contributor Author

hudeqi commented Jul 3, 2023

This PR is inspired by your suggestion, please help to review it when you have time, thank you. @C0urante

Copy link
Contributor

@gharris1727 gharris1727 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hudeqi
Copy link
Contributor Author

hudeqi commented Jul 7, 2023

and this, thanks. @C0urante

@C0urante
Copy link
Contributor

@hudeqi I think @gharris1727 can merge this now :)

@gharris1727
Copy link
Contributor

Flaky test failures in the Mirror suite appear unrelated, and the tests pass locally. Merging.

@gharris1727 gharris1727 merged commit 51bc410 into apache:trunk Jul 10, 2023
@hudeqi hudeqi deleted the list_to_set branch July 10, 2023 23:09
Cerchie pushed a commit to Cerchie/kafka that referenced this pull request Jul 25, 2023
…tor (apache#13946)

Reviewed-by: Greg Harris <greg.harris@aiven.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants