KAFKA-18873: Fixed incorrect error message when max.in.flight.requests.per.connection exceeds 5 for transactional producers #19041

Open · wants to merge 1 commit into base: trunk
Conversation

@EsMoX EsMoX commented Feb 26, 2025

The unit test testInflightRequestsAndIdempotenceForIdempotentProducers checks for configuration validation errors when instantiating a ProducerConfig with invalid properties. One of the assertions in this test is designed to validate the constraint that max.in.flight.requests.per.connection must be at most 5 when using a transactional producer. However, the error message thrown by the ProducerConfig constructor in this scenario is incorrect.

Observed Behavior:
When max.in.flight.requests.per.connection is set to 6 for a transactional producer, the test expects an exception with the message:
"Must set max.in.flight.requests.per.connection to at most 5 when using the transactional producer."
Instead, the error message states:
"Must set retries to non-zero when using the idempotent producer."

Expected Behavior:
The error message should explicitly indicate the violation of the max.in.flight.requests.per.connection constraint for transactional producers:
"Must set max.in.flight.requests.per.connection to at most 5 when using the transactional producer."

The mismatch in the error message can lead to confusion for developers debugging the configuration error, as it incorrectly hints at a retries configuration issue instead of the actual max.in.flight.requests.per.connection issue.
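To make the two constraints concrete, here is a minimal standalone sketch of this kind of validation. This is not Kafka's actual ProducerConfig code: the class name, method, check ordering, and use of IllegalArgumentException are simplifications for illustration; only the two error messages are taken from the report above.

```java
// Hypothetical, simplified model of the validation described above. The real
// ProducerConfig logic in Kafka differs in structure and ordering; only the
// error message strings here match the ones quoted in this report.
public class ProducerConfigSketch {

    static void validate(boolean transactional, int maxInFlight, int retries) {
        // Idempotence (implied by transactions) requires retries > 0.
        if (retries == 0) {
            throw new IllegalArgumentException(
                "Must set retries to non-zero when using the idempotent producer.");
        }
        // The constraint this PR's test exercises: at most 5 in-flight requests
        // per connection when the producer is transactional.
        if (transactional && maxInFlight > 5) {
            throw new IllegalArgumentException(
                "Must set max.in.flight.requests.per.connection to at most 5 "
                    + "when using the transactional producer.");
        }
    }

    public static void main(String[] args) {
        try {
            // Transactional producer with max.in.flight = 6 and retries = 3:
            // should trip the max.in.flight constraint, not the retries one.
            validate(true, 6, 3);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```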

Note: the contribution is my original work and I license the work to the project under the project's open source license

@github-actions github-actions bot added labels: triage (PRs from the community), producer, tests (Test fixes, including flaky tests), clients, small (Small PRs) — Feb 26, 2025
@kirktrue kirktrue (Collaborator) left a comment

Thanks for the PR @EsMoX!

I'm a little surprised that this test bug exists. Either this test has been failing for a while, or something in the ProducerConfig logic was changed that altered the message. Do you have any idea when this test started failing?

Thanks!

@github-actions github-actions bot removed the triage PRs from the community label Feb 27, 2025
@EsMoX
Author

EsMoX commented Feb 27, 2025

> Thanks for the PR @EsMoX!
>
> I'm a little surprised that this test bug exists. Either this test has been failing for awhile, or something was changed in the ProducerConfig logic to change the message. Do you have any idea when this test started failing?
>
> Thanks!

@EsMoX EsMoX closed this Feb 27, 2025
@EsMoX EsMoX reopened this Feb 27, 2025
@github-actions github-actions bot added the triage PRs from the community label Feb 27, 2025
@EsMoX
Author

EsMoX commented Feb 27, 2025

> Thanks for the PR @EsMoX!
> I'm a little surprised that this test bug exists. Either this test has been failing for awhile, or something was changed in the ProducerConfig logic to change the message. Do you have any idea when this test started failing?
> Thanks!

Thank you for your feedback, @kirktrue!

I looked into the issue, and it seems that this problem may have existed for some time. The test likely started failing due to a change in the ProducerConfig logic that altered the exception prioritization, resulting in a mismatch between the expected and actual error messages. However, I couldn't pinpoint the exact commit or version where this change occurred.

If it’s helpful, I can investigate further to identify when this behavior was introduced. Let me know if you'd like me to proceed with that!

Thanks again for reviewing the PR!

@kirktrue
Collaborator

cc @mjsax

@kirktrue
Collaborator

kirktrue commented Feb 27, 2025

> If it’s helpful, I can investigate further to identify when this behavior was introduced. Let me know if you'd like me to proceed with that!

I'll let Matthias (or someone else) decide if it's worth pursuing where this crept in.

Thanks again for the PR 😄

@github-actions github-actions bot removed the triage PRs from the community label Feb 28, 2025
@mjsax
Member

mjsax commented Feb 28, 2025

Thanks for the PR. It seems there is some confusion about this test...

First, the test was added like this two years ago and never modified, so the issue was missed in the original PR review: 7c280c1

But there also seems to be some confusion about what the test does... The message string this PR changes has nothing to do with what is actually tested: assertThrows only checks that the exception is a ConfigException -- it does not verify the exception message. If assertThrows fails (i.e., no exception or a different exception is thrown), it uses the message as the test result output, to give the developer an informative explanation of why the test failed (of course, this message would be misleading, so yes, we should fix it).

Thus, there is no bug in the actual code, and because the test has always passed since it was added, the incorrect "test failed" message was never detected.

I am happy to merge this PR as-is, as it does improve the test. But I want to point out that it does not verify the actual error message from the ConfigException. If we want to verify the error message, we would need to change the test further. I'm not sure if we want/need to verify the error message. Thoughts?

Just let me know how you want to proceed.
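(For illustration: the distinction mjsax describes can be demonstrated with a self-contained sketch. The class and helper below are a simplified re-implementation of JUnit 5's assertThrows(expectedType, executable, message) written purely for this example, not JUnit code; they show that the message argument surfaces only when the assertion fails, and that verifying the actual exception message requires asserting on the returned exception.)

```java
public class AssertThrowsSketch {
    // Simplified model of JUnit 5's assertThrows(expectedType, executable, message):
    // the message argument is used ONLY as failure output when the assertion
    // fails; it is never compared against the thrown exception's own message.
    static <T extends Throwable> T assertThrows(
            Class<T> expectedType, Runnable executable, String failureMessage) {
        try {
            executable.run();
        } catch (Throwable t) {
            if (expectedType.isInstance(t)) {
                return expectedType.cast(t);  // passes, whatever t.getMessage() is
            }
            throw new AssertionError(failureMessage + " ==> unexpected exception: " + t);
        }
        throw new AssertionError(failureMessage + " ==> nothing was thrown");
    }

    public static void main(String[] args) {
        // This passes even though the failure message talks about retries:
        // the message parameter is never checked against the actual message.
        IllegalArgumentException e = assertThrows(
                IllegalArgumentException.class,
                () -> { throw new IllegalArgumentException("actual config error"); },
                "Must set retries to non-zero when using the idempotent producer.");

        // To verify the message itself, assert on the returned exception:
        if (!"actual config error".equals(e.getMessage())) {
            throw new AssertionError("message mismatch: " + e.getMessage());
        }
        System.out.println("exception message verified: " + e.getMessage());
    }
}
```

This is the pattern a stricter version of the test would use: capture the exception returned by assertThrows and assert on its getMessage() separately.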

@EsMoX
Author

EsMoX commented Feb 28, 2025

> Thanks for the PR. Seems there is some confusion about this test...
>
> First, the test was added like this 2 year ago and never modified. So it was missed on the original PR review 7c280c1
>
> But it seems there is some confusion what the test does... The message string, this PR changes, has nothing to do with what it actually tested... assertThrows only checks if the exception is ConfigException -- it does not verify the exception message. If assertThrows fails (ie, no exception or different exception thrown), it would use message as test result output, to given an informative message to the developer why the test fails (of course, this message would be misleading so yes, we should fix it.)
>
> Thus, there is no bug in the actual code, and because the test just passes always ever since it was added, the incorrect "test failed message" was never detected.
>
> I am happy to merge this PR as-is, as it does improve the test. But I want to point out, that it does not verify the actually error message from ConfigException. If we want to verify the error message, we would need to change the test further. Not sure if we want/need to verify the error message. Thoughts?
>
> Just let me know how you want to proceed.

Thank you @mjsax for the detailed explanation and feedback!

After considering the test behavior, I believe adding validation for the exception message would significantly improve the test, for the following reasons:

  1. Promoting shared understanding: Validating the exception message promotes shared knowledge among developers about the exact message that should be thrown when a ConfigException occurs. Since the error message is already defined at the throw new ConfigException() site, reusing it in the test ensures consistency, reduces duplication, and acts as a form of documentation for the expected behavior. This makes the codebase easier to understand and maintain.

  2. Improved test quality: By validating both the exception type and its message, the test becomes stricter and more reliable. This ensures consistency and avoids issues like misleading failure messages, while providing greater confidence in the correctness of the code under test.

That said, I agree with proceeding with the PR as-is for now, as it already fixes the misleading failure message. Once the community decides how to handle exception message validation, I'll be happy to revisit the test and make any necessary updates to align with the agreed approach. Let me know your thoughts!

Labels
clients · producer · small (Small PRs) · tests (Test fixes, including flaky tests)
3 participants