ParallelConsumer would run for a while and then exit due to InternalRuntimeException(Timeout) #833
Comments
Hi @dumontxiong - it is hard to tell whether this is an issue or not. It is possible not to get a commit response if something went wrong with the Kafka cluster, but without more information I cannot tell whether that was the cause here.
I cannot tell what happened in this case (why the response is not there, etc.), but it should be visible in the debug logs when the commit request is picked off the request queue (and if it's not picked up / processed, there may be something showing why it wasn't).
Or if you can provide a minimal test application or integration test that reproduces this issue, that would be even better :)
@rkolesnev
Here are two subscribers: one is subscribed to topic1, the other is subscribed to topic2, and both have concurrency = 14.
The issue and metrics above are for the subscriber on topic2 (1000 keys).
Thank you for the repro code - I will run it and see if I can reproduce the issue / see where it occurs.
Hi @rkolesnev, we hit the same issue and PC was closed unintentionally. As a workaround, we wrapped the controlLoop function in a try/catch so that the exception does not propagate to supervisorLoop and close PC entirely. We are looking forward to an official release if possible. One approach we suggest is to handle the exception with try/catch inside controlLoop during the offset commit. The issue can be reproduced with the steps below:
Regards,
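For readers following along, the try/catch workaround described above can be sketched roughly as follows. This is not ParallelConsumer's actual code: `ControlLoopGuardSketch`, `runGuarded` and `CommitTimeoutException` are hypothetical names used only for illustration, and the real controlLoop and supervisorLoop are more involved.

```java
public class ControlLoopGuardSketch {

    /** Hypothetical stand-in for the commit timeout exception; not the library's class. */
    public static class CommitTimeoutException extends RuntimeException {
        public CommitTimeoutException(String message) {
            super(message);
        }
    }

    /**
     * Runs one loop step per iteration and swallows commit timeouts so they
     * do not propagate to the caller (the "supervisor" in this sketch).
     * Returns how many iterations ended in a commit timeout.
     */
    public static int runGuarded(Runnable controlLoopStep, int iterations) {
        int timeouts = 0;
        for (int i = 0; i < iterations; i++) {
            try {
                controlLoopStep.run();
            } catch (CommitTimeoutException e) {
                // Log and carry on; the next iteration gets another chance to commit.
                timeouts++;
                System.err.println("Commit timed out, continuing: " + e.getMessage());
            }
        }
        return timeouts;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated step: every second call fails with a commit timeout.
        int timeouts = runGuarded(() -> {
            if (++calls[0] % 2 == 0) {
                throw new CommitTimeoutException("Timeout waiting for commit response");
            }
        }, 4);
        System.out.println("iterations=" + calls[0] + ", timeouts=" + timeouts);
    }
}
```

Only the timeout type is caught here, so any other exception still stops the loop. Whether swallowing a commit timeout is safe depends on the commit being retried later, which this sketch simply assumes.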
Hi @ndqvinh2109, @dumontxiong,
Thanks @rkolesnev for letting me know that the retry mechanism has been implemented in the latest version. We adjusted the timeout config, and in our testing the retry happened. PC did not close, as we expected.
@dumontxiong - could you please retest on the latest build of ParallelConsumer?
Hi team,
version 0.5.3.1
InternalRuntimeException:
My test scenario is one where 50% of records fail and there are 1000 keys in total. ParallelConsumer runs for a while and then exits due to InternalRuntimeException at 24/09/13 21:33:37.130:
io.confluent.parallelconsumer.internal.InternalRuntimeException: Timeout waiting for commit response PT30S to request ConsumerOffsetCommitter.CommitRequest(id=79c3ac04-b8c7-4dc2-9b09-c77d6ad6bee4, requestedAtMs=1726234432425)
The relevant code is in ConsumerOffsetCommitter.commitAndWait():
CommitResponse take = commitResponseQueue.poll(commitTimeout.toMillis(), TimeUnit.MILLISECONDS); // blocks, drain until we find our response
Because take is null, InternalRuntimeException is thrown.
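To make the failure mode concrete, here is a minimal standalone sketch of the same pattern: `BlockingQueue.poll` with a timeout returns null when nothing arrives in time, and the caller turns that null into an exception. The class and method names are assumptions for illustration, and `IllegalStateException` stands in for InternalRuntimeException; this is not the library's implementation.

```java
import java.time.Duration;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class CommitWaitSketch {

    /**
     * Waits up to the given timeout for a response on the queue.
     * poll(...) returns null on timeout, which is reported as an exception.
     */
    public static String awaitResponse(BlockingQueue<String> responses, Duration timeout)
            throws InterruptedException {
        String take = responses.poll(timeout.toMillis(), TimeUnit.MILLISECONDS);
        if (take == null) {
            // Stand-in for InternalRuntimeException in this sketch.
            throw new IllegalStateException("Timeout waiting for commit response " + timeout);
        }
        return take;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> responses = new LinkedBlockingQueue<>();

        // Case 1: a response was queued, so the wait returns immediately.
        responses.add("response-1");
        System.out.println(awaitResponse(responses, Duration.ofMillis(100)));

        // Case 2: nothing is ever queued, so poll times out and returns null.
        try {
            awaitResponse(responses, Duration.ofMillis(100));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The second case matches what the report describes: if no response is ever put on the queue, the waiting side can only time out.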
Metrics from pc_processed_records_total:
During this time, there are no successful records:
Adding commit response to queue:
The logs below from ConsumerOffsetCommitter.maybeDoCommit() show that the last time a commit response was added to the queue was 24/09/13 21:16:54.105.
Waiting on a commit response:
And from ConsumerOffsetCommitter.commitAndWait() we can see that the last time it waited for a commit response from the queue was 24/09/13 21:33:52.426.
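When correlating these log times with the exception, it may help to convert the `requestedAtMs` value from the exception message into a timestamp. A small sketch follows; the class and method names are made up, and the UTC+8 offset is only an assumption about the log timezone, which the thread does not state.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;

public class CommitRequestTimeSketch {

    /** Converts the requestedAtMs epoch-millisecond value to an Instant (UTC). */
    public static Instant requestedAt(long requestedAtMs) {
        return Instant.ofEpochMilli(requestedAtMs);
    }

    /** The moment at which a wait started at requestedAtMs would time out. */
    public static Instant deadline(long requestedAtMs, Duration timeout) {
        return requestedAt(requestedAtMs).plus(timeout);
    }

    public static void main(String[] args) {
        long requestedAtMs = 1726234432425L;                  // from the exception message
        Duration commitTimeout = Duration.parse("PT30S");     // from the exception message

        System.out.println(requestedAt(requestedAtMs));       // 2024-09-13T13:33:52.425Z

        // Assumption: logs are written in UTC+8; adjust the offset if they are not.
        ZoneOffset assumedLogOffset = ZoneOffset.ofHours(8);
        System.out.println(requestedAt(requestedAtMs).atOffset(assumedLogOffset));
        System.out.println(deadline(requestedAtMs, commitTimeout).atOffset(assumedLogOffset));
    }
}
```

Under that timezone assumption, the request time lines up with the 21:33:52.426 wait shown in the logs.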
Here are my concerns:
The first time, the commit response was added to the queue at 24/09/13 21:16:54.105 and the wait for a commit response started at 24/09/13 21:16:54.084, so the response arrived within 30s.
The second time, no commit response was added to the queue, but the wait for a commit response started at 24/09/13 21:33:39.194.
So this led to the InternalRuntimeException.
Please help to check.
BRS,
Dumont