Question: Does Parallel-consumer have state that we can read from? #484
Comments
Hi, no, not at the moment. However, you can check failed-processing counts on a per-record basis through the record context passed to your user function. We’re developing the metrics and monitoring systems now.
What were the logs saying when this happened? Lag is a bit more complicated with PC. For example, if there’s a single poison pill that can’t process and your user function doesn’t give up retrying it, lag could show 1,000 but there may only be a single record that is “stuck”. The new monitoring and metrics functions will allow you to see all this info. Cc @nachomdo |
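To illustrate the poison-pill point above, here is a minimal sketch of why externally measured lag can read 1,000 while only one record is stuck. It assumes the committed offset cannot move past the lowest record that has not completed; the class and method names are hypothetical and this is not Parallel Consumer's actual code.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch for illustration only; not Parallel Consumer's implementation.
final class CommitLagSketch {

    // Returns the first offset at or after committedOffset that has NOT completed.
    // Assumption: the committed position can only advance over a contiguous run
    // of completed offsets.
    static long nextCommittableOffset(long committedOffset, Set<Long> completedOffsets) {
        long next = committedOffset;
        while (completedOffsets.contains(next)) {
            next++;
        }
        return next;
    }

    public static void main(String[] args) {
        long committed = 100;   // committed position of the partition
        long endOffset = 1100;  // log end offset of the partition

        // Every record except the poison pill at offset 100 has been processed.
        Set<Long> completed = new TreeSet<>();
        for (long offset = 101; offset < endOffset; offset++) {
            completed.add(offset);
        }

        long committable = nextCommittableOffset(committed, completed);
        long reportedLag = endOffset - committable;
        long stuckRecords = (endOffset - committed) - completed.size();

        // Prints: reported lag = 1000, records actually stuck = 1
        System.out.println("reported lag = " + reportedLag
                + ", records actually stuck = " + stuckRecords);
    }
}
```

So a flat lag number on its own does not say whether the consumer is idle or just blocked behind one retrying record.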
Hi, thank you for your response. For now, the only way for us to resolve the issue is to track the lag (using external metrics) and restart the process. |
Are you logging any processing failures in your function? |
Yes, I am printing everything I can (error info). I am also trying to catch all exceptions and send records that have failed to different DLQs. I am printing the ... I am now trying to tackle the issue by adding some kind of supervisor that detects that this consumer is not really doing anything and kills it. |
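A supervisor like the one described could be sketched roughly as below. This is only an illustration under stated assumptions: `ProgressWatchdog` and its methods are hypothetical names, not part of Parallel Consumer, and the lag value is assumed to come from your own external metrics.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper for illustration; not part of Parallel Consumer.
final class ProgressWatchdog {
    private final long stallTimeoutNanos;
    private final AtomicLong lastProgressNanos;

    ProgressWatchdog(Duration stallTimeout, long nowNanos) {
        this.stallTimeoutNanos = stallTimeout.toNanos();
        this.lastProgressNanos = new AtomicLong(nowNanos);
    }

    // Call this from the user function each time a record has been handled
    // (successfully or routed to a DLQ).
    void recordProgress(long nowNanos) {
        lastProgressNanos.set(nowNanos);
    }

    // True when there is still lag but no record has been handled within the timeout.
    boolean isStalled(long currentLag, long nowNanos) {
        return currentLag > 0 && nowNanos - lastProgressNanos.get() > stallTimeoutNanos;
    }
}
```

In practice you would pass `System.nanoTime()` for the timestamps, check `isStalled` from a scheduled task, and restart the process when it returns true. Requiring both non-zero lag and no recent progress avoids killing a consumer that is simply idle on an empty topic.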
What is the head offset of that partition? |
I don't have the historical information; I can say that now (1.5 hours after the restart) it is ... Update: as for the auto offset reset policy ... |
Ah, does the log message not include the partition it's talking about? @Ehud-Lev-Forter, if you get a chance, take a look at this PR and see if there are any other metrics you'd like. |
cc @nachomdo |
Looks like we can benefit a lot from this PR as is.
As for the warn log: yes, it is missing the partition. |
1 - yup, that status will be in there |
Hi, we have noticed that during high load, one of the consumers can get stuck: it keeps reporting the same lag but does not process anything. We were not able to reproduce it locally, but we are afraid that it will happen again.
In our Kafka Streams process we monitor the state, and if we see an error or problem, we restart the process. Does ParallelConsumer have something similar?