Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't reset max.poll.interval.ms limit on every poll #4176

Merged
merged 6 commits into from
Mar 13, 2023

Conversation

milindl
Copy link
Contributor

@milindl milindl commented Feb 3, 2023

Earlier on, we changed the code to prevent max.poll.interval.ms from being triggered in case we were inside librdkafka in any sort of a poll call. Top achieve this, blocked the timer using rd_kafka_app_poll_blocking, and reset it at the end of the call.

This doesn't work correctly in the cases where we're simply polling an unrelated queue, like the log queue. In that case, just by polling the log queue, the timer is reset, despite us not actually consuming anything (or doing any consume poll).

At the same time, it's a reasonable expectation that max.poll.interval.ms won't be triggered while we are doing any sort of consumer poll.

This PR takes care of both the cases. The methods (from public API) which block/reset the timer are:

  1. rd_kafka_consume_batch
  2. rd_kafka_consume_batch_queue
  3. rd_kafka_consume_callback
  4. rd_kafka_consume_callback_queue
  5. rd_kafka_consume
  6. rd_kafka_consumer_poll

All the other poll methods like queue_poll etc no longer reset or block the timer.
A failing test was added first, to test the change.

Ref #2365.

@milindl milindl requested review from emasab and a team February 3, 2023 04:45

int main_0089_max_poll_interval(int argc, char **argv) {
do_test_with_optional_queue(rd_false);
do_test_with_optional_queue(rd_true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest using different methods for different tests.

  1. It improves readability
  2. We are not touching the old test case just adding a new one

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, also added a changelog entry

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we're adding some new checks to an old test case, it's ok to change it as long as we don't change existing asserts.
This is also to avoid having tests that are too slow, especially integration tests.
It can also be ok with to parameterize some tests to avoid code duplication, as long as the tests are very similar and the resulting code is readable.

What do you think @pranavrth?
That said @milindl you don't have to revert this change you can leave them separate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am okay with either option as per what we decide.
Both these tests are slow though and can't be run in make quick because of the timeout.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I felt that we are touching multiple places in the old test so asked to separate.
Though anything is fine as far as the old test is not changed.

@milindl milindl force-pushed the dev_max_poll_interval_ms_not_honored branch from 99f2005 to 84b69e2 Compare March 13, 2023 11:27
@milindl milindl requested a review from pranavrth March 13, 2023 11:31
Copy link
Contributor

@emasab emasab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@milindl milindl merged commit 5f71083 into master Mar 13, 2023
@milindl milindl deleted the dev_max_poll_interval_ms_not_honored branch March 13, 2023 13:58
@vbmithr
Copy link

vbmithr commented Apr 14, 2023

Just want to mention that this "fix" broke my binding where I use events and read consumer events from the consumer queue (and never calling rd_kafka_consumer_poll). I solve the problem with polling rd_kafka_consumer_poll but it would be more elegant if polling from the consumer queue would reset the timeout as well.

@milindl
Copy link
Contributor Author

milindl commented Apr 15, 2023

Yep, we're aware of the bug already. We're looking into it with #4256 . Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants