KAFKA-15845: Detect leaked Kafka clients and servers with LeakTestingExtension #14783
|
This PR is being marked as stale since it has not had any activity in 90 days. If you would like to keep this PR alive, please ask a committer for review. If the PR has merge conflicts, please update it with the latest from trunk (or the appropriate release branch). If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed. |
…network resources Signed-off-by: Greg Harris <greg.harris@aiven.io>
Signed-off-by: Greg Harris <greg.harris@aiven.io>
b7bfba7 to be561e8
Signed-off-by: Greg Harris <greg.harris@aiven.io>
…annotations Signed-off-by: Greg Harris <greg.harris@aiven.io>
be561e8 to 59a58df
|
This PR is being marked as stale since it has not had any activity in 90 days. If you are having difficulty finding a reviewer, please reach out on the [mailing list](https://kafka.apache.org/contact). If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed. |
|
@gharris1727 shall we try to push this across the finish line? We can begin by enabling it for the tests that already pass this check. |
|
This PR is being marked as stale since it has not had any activity in 90 days. If you are having difficulty finding a reviewer, please reach out on the [mailing list](https://kafka.apache.org/contact). If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed. |
|
This PR has been closed since it has not had any activity in 120 days. If you feel like this |
There are currently many tests which instantiate sockets and channels, but don't close them. The vast majority of these sockets and channels are created via Selector and SocketServer, and are associated with clients and servers which are not closed properly. Many of these leaks are silent, and have gone unnoticed for years.
To allow us to detect these leaks and prevent future ones, we should have a JUnit 5 extension which can detect when tests leak clients, and fail those tests with a diagnostic message.
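The detection idea can be illustrated with a minimal, standard-library-only sketch. It is not the code from this PR: `LeakTracker`, `register`, `unregister`, and `assertNoLeaks` are hypothetical names, and the JUnit wiring is omitted. The sketch records a `Throwable` at the point where each resource is created, and fails with those creation sites if anything is still registered when the test ends.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;

/** Hypothetical sketch of a leak registry; not the class added by this PR. */
final class LeakTracker {
    // Maps each still-open resource (by identity) to the stack trace captured at creation.
    private final Map<Object, Throwable> open =
            Collections.synchronizedMap(new IdentityHashMap<>());

    /** Records a newly created resource together with its creation site. */
    <T> T register(T resource) {
        open.put(resource, new Throwable("Resource opened here: " + resource.getClass().getName()));
        return resource;
    }

    /** Forgets a resource once it has been closed. */
    void unregister(Object resource) {
        open.remove(resource);
    }

    /** Throws an AssertionError listing the creation site of every resource still open. */
    void assertNoLeaks(String testName) {
        StringWriter report = new StringWriter();
        int leaked;
        synchronized (open) {
            leaked = open.size();
            PrintWriter writer = new PrintWriter(report);
            for (Throwable site : open.values()) {
                site.printStackTrace(writer);
            }
            writer.flush();
        }
        if (leaked > 0) {
            throw new AssertionError(testName + " leaked " + leaked + " resource(s):\n" + report);
        }
    }
}
```

In a real extension, something like `assertNoLeaks` would presumably run from an after-each or after-all callback, which is what lets the failure be attributed to a specific test or test class.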
Implementation note: I tried to keep the `main` changes as small as possible, making use of the javax.net Factory classes to create a layer of indirection in the NetworkContext. This is a much less invasive change than dependency-injecting the factories and possibly having to add more constructors to the Kafka clients.
While trying this out, I applied it automatically to all test suites with Jupiter's automatic registration mechanism. I added opt-out IgnoreAll and IgnoreEach annotations. I found that many test suites create a single Kafka cluster for the whole test, and this would cause the Each extension to generate false positives. This led me to believe that Each should be opt-in instead, and a developer can add it temporarily with `@ExtendWith(LeakTestingExtension.Each.class)`.
However, using automatic registration also means that consumers of the `clients` test-jar may unintentionally turn on this leak testing. We could avoid this by moving the extension to a different test-utils jar that is for internal use only, or by disabling automatic registration and making this extension opt-in only.
This PR will cause many test failures for tests that are not compliant, and should not be merged as-is. For the set of leaks I found with this testing methodology, see:
If you have a leaked resource, it looks something like this (from KafkaProducerTest):
It shows the exact stack trace of where the socket or selector was instantiated, and attributes the failure to the test which caused it. I found that this was sufficient to find and fix leaks in parts of the codebase I had never seen before, so it should be very helpful for people who already know their way around the test and the surrounding code.
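The factory indirection mentioned in the implementation note can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `RecordingSocketFactory` and `NetworkContextSketch` are hypothetical names, the real NetworkContext may look quite different, and only plain `Socket` creation is covered here (no selectors, channels, or server sockets). The only real API used is `javax.net.SocketFactory` from the JDK.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.net.SocketFactory;

/** Hypothetical sketch: wraps a SocketFactory and remembers where each socket was created. */
final class RecordingSocketFactory extends SocketFactory {
    private final SocketFactory delegate;
    // Socket does not override equals/hashCode, so keys are compared by identity.
    private final Map<Socket, Throwable> created = new ConcurrentHashMap<>();

    RecordingSocketFactory(SocketFactory delegate) {
        this.delegate = delegate;
    }

    private Socket record(Socket socket) {
        created.put(socket, new Throwable("Socket created here"));
        return socket;
    }

    @Override
    public Socket createSocket() throws IOException {
        return record(delegate.createSocket());
    }

    @Override
    public Socket createSocket(String host, int port) throws IOException {
        return record(delegate.createSocket(host, port));
    }

    @Override
    public Socket createSocket(String host, int port, InetAddress localHost, int localPort)
            throws IOException {
        return record(delegate.createSocket(host, port, localHost, localPort));
    }

    @Override
    public Socket createSocket(InetAddress host, int port) throws IOException {
        return record(delegate.createSocket(host, port));
    }

    @Override
    public Socket createSocket(InetAddress address, int port, InetAddress localAddress, int localPort)
            throws IOException {
        return record(delegate.createSocket(address, port, localAddress, localPort));
    }

    /** Returns the creation sites of all recorded sockets that have not been closed. */
    List<Throwable> openSockets() {
        List<Throwable> sites = new ArrayList<>();
        created.forEach((socket, site) -> {
            if (!socket.isClosed()) {
                sites.add(site);
            }
        });
        return sites;
    }
}

/** Hypothetical stand-in for the indirection point; production code asks it for a factory. */
final class NetworkContextSketch {
    private static volatile SocketFactory factory = SocketFactory.getDefault();

    static SocketFactory factory() {
        return factory;
    }

    /** Tests swap in a recording factory here instead of injecting it through constructors. */
    static void install(SocketFactory replacement) {
        factory = replacement;
    }
}
```

The appeal of a static indirection point like this is that client constructors stay unchanged: code that creates sockets asks the context for a factory, and a test harness swaps in the recording one before the test runs.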