You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I’ve already written about a bug we’ve discovered with the DRAIN shutdown mode here: #552. After we realized the PC keeps polling messages in the DRAIN mode we chose the DONT_DRAIN mode to gracefully shutdown the PC, but it behaves unexpectedly too.
The version in use is 0.5.2.4.
Our use case is pretty standard I would say. We use only the core module and message processing involves network calls to other services. When calling them, errors are expected, so we commit the offsets in both the happy case and not-so-happy case.
The problem comes at the shutdown. In the DONT_DRAIN mode the PC will shutdown the worker threads via interrupts. This means that all in-progress messages will fail on network calls. Of course, we don’t want to commit the failed messages in this case. At the moment, to avoid committing the message’s offset, we catch all the exceptions thrown by the processing code and check if the root cause is an InterruptedException. If it’s the case then throw the exception out of the user function to the PC. This way the PC puts the message in the retry queue and doesn’t commit it. This approach works at the moment, but it’s kinda hacky I would say.
All this said, do you think it’s reasonable to change the way in which the worker thread pool is shutdown so that it doesn’t interrupt the worker threads, but instead lets the already scheduled tasks to finish gracefully? This is basically adhering to DONT_DRAIN's current description:
Stop downloading more messages, and stop procesing more messages in the queue, but finish processing messages already being processed locally.
The text was updated successfully, but these errors were encountered:
Hi,
I’ve already written about a bug we’ve discovered with the
DRAIN
shutdown mode here: #552. After we realized the PC keeps polling messages in theDRAIN
mode we chose theDONT_DRAIN
mode to gracefully shutdown the PC, but it behaves unexpectedly too.The version in use is
0.5.2.4
.Our use case is pretty standard I would say. We use only the core module and message processing involves network calls to other services. When calling them, errors are expected, so we commit the offsets in both the happy case and not-so-happy case.
The problem comes at the shutdown. In the
DONT_DRAIN
mode the PC will shutdown the worker threads via interrupts. This means that all in-progress messages will fail on network calls. Of course, we don’t want to commit the failed messages in this case. At the moment, to avoid committing the message’s offset, we catch all the exceptions thrown by the processing code and check if the root cause is anInterruptedException
. If it’s the case then throw the exception out of the user function to the PC. This way the PC puts the message in the retry queue and doesn’t commit it. This approach works at the moment, but it’s kinda hacky I would say.All this said, do you think it’s reasonable to change the way in which the worker thread pool is shutdown so that it doesn’t interrupt the worker threads, but instead lets the already scheduled tasks to finish gracefully? This is basically adhering to
DONT_DRAIN
's current description:The text was updated successfully, but these errors were encountered: