
Question: Is maxUncommittedMessagesToHandle infinitely growable? #490

Closed
ang-zeyu opened this issue Dec 6, 2022 · 8 comments
Labels
question Further information is requested

Comments

@ang-zeyu

ang-zeyu commented Dec 6, 2022

Hi there! Thank you for this great library. We are exploring it as an option for a cross-team use case where we cannot easily increase the number of partitions to scale consumption.

One question regarding a parameter: is maxUncommittedMessagesToHandle infinitely growable in terms of memory usage and overflow (I assume the ceiling is something like u64::MAX)? Or is there anything else to be aware of?

The scenario we are planning for is that:

  • We encounter a poison pill that is being infinitely retried
  • We don't have the resources to set up DLTs or retry topics
  • We would still like to continue processing incoming messages in a timely manner, until manual intervention is done to move the committed offset past the poison pill, and restart the consumer.

If relevant we are thinking of using unordered or "ordered by key" processing modes.
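For context, roughly how we're imagining wiring this up. This is just a sketch based on the README; names like maxConcurrency, createEosStreamProcessor, and getSingleConsumerRecord are what we gathered from the docs and may not match the current API exactly:

```java
import io.confluent.parallelconsumer.ParallelConsumerOptions;
import io.confluent.parallelconsumer.ParallelConsumerOptions.ProcessingOrder;
import io.confluent.parallelconsumer.ParallelStreamProcessor;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.util.List;
import java.util.Properties;

public class KeyOrderedConsumerSketch {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "our-group");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // PC manages offsets itself
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        var kafkaConsumer = new KafkaConsumer<String, String>(props);

        var options = ParallelConsumerOptions.<String, String>builder()
                .consumer(kafkaConsumer)
                .ordering(ProcessingOrder.KEY)   // or ProcessingOrder.UNORDERED
                .maxConcurrency(100)             // scale consumption beyond the partition count
                .build();

        ParallelStreamProcessor<String, String> processor =
                ParallelStreamProcessor.createEosStreamProcessor(options);
        processor.subscribe(List.of("our-topic"));

        // throwing from the handler triggers a retry of just that record
        processor.poll(context -> handle(context.getSingleConsumerRecord()));
    }

    static void handle(ConsumerRecord<String, String> record) {
        // placeholder for our business logic
    }
}
```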

@astubbs
Contributor

astubbs commented Dec 8, 2022

You're most welcome! I really enjoy working on it. And there are some big improvements about to be released too (keep an eye on my fork if curious).

Where did you see maxUncommittedMessagesToHandle, or are you speaking hypothetically?
Ah! I just found it - oh dear. Yes, it's in the README, but the configuration was removed a long time ago, when we added per-record ack persistence; we didn't think it was needed, as its original intent was to prevent a big replay upon rebalance (back when per-record acks weren't persisted). I'll remove it from the docs.

At the moment, there is no limit enforced (although there used to be). If you would like this, please file a feature request. It hasn't been something many have asked about, but I still think it's a good idea.

So people should still monitor their traditional consumer lag as normal.

You can also monitor the record context in your user function, to see if a record has failed too many times, and decide what to do about it. So although I think it should be implemented, there are ways to handle the situation as is. But I'm keen to hear your thoughts!
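Roughly like this (a sketch from memory; the accessor names such as getSingleRecord() / getNumberOfFailedAttempts() depend on your version, so check the javadoc, and log / businessLogic are your own code):

```java
processor.poll(context -> {
    ConsumerRecord<String, String> rec = context.getSingleConsumerRecord();
    // the record context exposes how many times this record has failed so far
    int failures = context.getSingleRecord().getNumberOfFailedAttempts();

    if (failures > 10) {
        // decide what to do: log it, park the payload somewhere, or just return --
        // returning without throwing marks the record as successfully handled
        log.error("giving up on key {} after {} attempts", rec.key(), failures);
        return;
    }

    businessLogic(rec); // throwing here schedules a retry
});
```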

astubbs added a commit that referenced this issue Dec 8, 2022
The configuration was removed a long time ago, when we added per-record ack persistence; we didn't think it was needed, as its original intent was to prevent a big replay upon rebalance (back when per-record acks weren't persisted).

See #490
@astubbs
Contributor

astubbs commented Dec 8, 2022

Removing in #503

@astubbs
Contributor

astubbs commented Dec 8, 2022

We would still like to continue processing incoming messages in a timely manner, until manual intervention is done to move the committed offset past the poison pill, and restart the consumer.

FYI just in case - you can do this programmatically by simply not throwing an error from your function, and PC will assume the record processed successfully.

In newer versions you will be able to throw a Skip exception, but the effect is the same, just more readable and with better logs.
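i.e. something along these lines (a sketch only; MyPoisonPillException, businessLogic, and log are stand-ins for your own code):

```java
processor.poll(context -> {
    ConsumerRecord<String, String> rec = context.getSingleConsumerRecord();
    try {
        businessLogic(rec);
    } catch (MyPoisonPillException e) {
        // swallowing the exception means PC considers the record processed, i.e. it is
        // skipped -- the same effect the future Skip exception will give, just less explicit
        log.warn("skipping poison pill at {}-{} offset {}",
                rec.topic(), rec.partition(), rec.offset(), e);
    }
});
```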

astubbs added a commit that referenced this issue Dec 8, 2022
…#503)

The configuration was removed a long time ago, when we added per-record ack persistence; we didn't think it was needed, as its original intent was to prevent a big replay upon rebalance (back when per-record acks weren't persisted).

See #490
astubbs added the question label Dec 8, 2022
@ang-zeyu
Author

Thank you for clarifying! Having no limit is actually just what we need.

Message loss is not desired in our case, so we're actually thinking of having infinite retries. (Let the consumer lag alert kick in and notify us; we then fix the underlying problem, or at least whitelist that specific poison pill scenario.)

We'd still like to have the other incoming messages that are fine get processed normally, while poison pills continuously get retried with backoff. I think it's in the README here too.

Wondering if there are limits to how far past the committed offset PC can still process messages:

x - poison pill
y - message that would be processed successfully

x y y y 
x y y y times 10000 - will the 10000th message be processed normally?
x y y y times 10000000 - are there any other concerns at this point? (e.g. memory usage)

Big replay upon restarting the consumer is ok too, it is more important for us not to lose any messages and be notified of the issue.

@ang-zeyu
Author

Slightly unrelated, but a little more about our use case too:

Poison pills should be quite rare or non-existent. We have had RabbitMQ up and running with infinite retries and no DLQs for a couple of years now, and are planning a migration to Kafka. So DLT/retry topics are lower priority, but we'd still like to confirm what would happen in the event one occurs.

Our event volume is also fairly high (~< 10 million per day).

@astubbs
Contributor

astubbs commented Dec 21, 2022

Wondering if there are limits to how far past the committed offset PC can still process messages:

It's effectively Integer.MAX_VALUE, and in the next version Long.MAX_VALUE

x y y y times 10000 - will the 10000th message be processed normally?

yes :) - you can try this in a simple test.

x y y y times 10000000 - are there any other concerns at this point? (e.g. memory usage)

nope :) . Memory usage is proportional mainly to the max concurrency target and the number of /failing/ messages.

Big replay upon restarting the consumer is ok too, it is more important for us not to lose any messages and be notified of the issue.

Should be no replay of ack'd records.

So DLT/retry topics are lower priority, but we'd still like to confirm what would happen in the event one occurs.

poison pills - your configured retry settings kick in (so infinite would be Integer.MAX_VALUE retries - however, we can make it /actually/ infinite if you'd like?).

And you can have any retry delay policy you wish.
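e.g. an exponential backoff could look roughly like this (a sketch from memory; the builder method names defaultMessageRetryDelay / retryDelayProvider and the getNumberOfFailedAttempts accessor may differ slightly in your version):

```java
import java.time.Duration;

var options = ParallelConsumerOptions.<String, String>builder()
        .consumer(kafkaConsumer)
        .defaultMessageRetryDelay(Duration.ofSeconds(1))   // flat fallback delay
        .retryDelayProvider(recordContext -> {
            // exponential backoff: 2^attempts seconds, capped at 10 minutes
            int attempts = recordContext.getNumberOfFailedAttempts();
            long seconds = Math.min(600, 1L << Math.min(attempts, 30));
            return Duration.ofSeconds(seconds);
        })
        .build();
```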

Our event volume is also fairly high (~< 10 million per day).

awesome! looking forward to hearing about your journey some more! and your feature requests :)

@ang-zeyu
Author

@astubbs Thank you for the detailed reply. 😄

It's effectively Integer.MAX_VALUE, and in the next version Long.MAX_VALUE

so infinite would be Integer.MAX_VALUE retries - however, we can make it /actually/ infinite if you'd like?

That's great to know, and quite a bit of headroom for our use case; I think we would long have fixed the issue before ever reaching these limits.

My general thoughts on the second item are that it would be very rare in practice to hit this limit, especially with something like exponential retry backoff. I also don't think a quick retry policy would make much sense with PC anyway, since consumption goes on in parallel and resources aren't wasted waiting on the failing record.

I'm also thinking the retry count in PC would simply overflow (correct me if I'm wrong), which is not critical for any application implementing infinite retry (the retry count wouldn't be used anyway).

Should be no replay of ack'd records.

I hadn't noticed this section before. That is really amazing stuff 🤩

@ang-zeyu
Author

I'll close this as that's all my questions for now. Thanks for the help!
