This repository has been archived by the owner on Jan 24, 2024. It is now read-only.
Motivation
The KoP CI tests take much longer than the CI tests of branch-2.7.2.

The main reason is that the `cleanup()` phase takes a long time: each time a test is cleaned up, it takes over 10 seconds to complete. This behavior was introduced by apache/pulsar#10199, which made the broker shut down gracefully by default, at the cost of a longer shutdown.

The other reason is rebalance time. According to my observations, when a Kafka consumer subscribes to a topic in KoP, it takes at least 3 seconds. I found that this is caused by the `GroupInitialRebalanceDelayMs` config, which has the same semantics as Kafka's `group.initial.rebalance.delay.ms`. It makes the Kafka server wait longer on a `JOIN_GROUP` request so that more consumers can join, which reduces the number of rebalances. However, it should be set to zero in tests.

After fixing these problems, the following error may sometimes happen and cause flakiness.
It's because `PulsarService#newCompactor` is not mocked well; see apache/pulsar#7102 for details.

Modifications
Configure `GroupInitialRebalanceDelayMs` and `BrokerShutdownTimeoutMs` for each mocked `BrokerService`.

After the changes, the test time improves significantly.
For example, `GroupCoordinatorTest` now takes only 3 minutes, down from 9 minutes, because the `cleanup()` method is marked as `@AfterMethod` and is called each time a single test finishes.

Another example is `BasicEndToEndKafkaTest`, which now takes only 37 seconds, down from 56 seconds. Its `cleanup()` is marked as `@AfterClass` and runs only once, but many consumers are created over the course of the tests, and each subscribe call could take 3 seconds.
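
As a rough illustration of the two settings discussed above, here is a minimal, self-contained sketch of a test-only configuration that zeroes both delays. `TestBrokerConfig` and `forTests()` are hypothetical stand-ins, not the actual KoP or Pulsar classes, and the initial field values are placeholders chosen to mirror the delays described above rather than the real defaults.

```java
// Hypothetical stand-in for the broker configuration used by the tests.
// The class, its methods, and the initial values are illustrative assumptions.
final class TestBrokerConfig {
    // Mirrors the roughly 3 second subscribe delay described above.
    private int groupInitialRebalanceDelayMs = 3000;
    // Placeholder value only; not the real default of the broker.
    private long brokerShutdownTimeoutMs = 10_000L;

    int getGroupInitialRebalanceDelayMs() {
        return groupInitialRebalanceDelayMs;
    }

    void setGroupInitialRebalanceDelayMs(int delayMs) {
        this.groupInitialRebalanceDelayMs = delayMs;
    }

    long getBrokerShutdownTimeoutMs() {
        return brokerShutdownTimeoutMs;
    }

    void setBrokerShutdownTimeoutMs(long timeoutMs) {
        this.brokerShutdownTimeoutMs = timeoutMs;
    }

    // Builds a configuration for tests: no waiting for extra consumers to
    // join a group, and no graceful-shutdown wait during cleanup().
    static TestBrokerConfig forTests() {
        TestBrokerConfig conf = new TestBrokerConfig();
        conf.setGroupInitialRebalanceDelayMs(0);
        conf.setBrokerShutdownTimeoutMs(0L);
        return conf;
    }

    public static void main(String[] args) {
        TestBrokerConfig conf = forTests();
        System.out.println("groupInitialRebalanceDelayMs=" + conf.getGroupInitialRebalanceDelayMs());
        System.out.println("brokerShutdownTimeoutMs=" + conf.getBrokerShutdownTimeoutMs());
    }
}
```

Keeping both overrides in one factory method means every test base class picks up the same values, so a test cannot accidentally run with the slower production-oriented settings.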