-
Notifications
You must be signed in to change notification settings - Fork 13.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KAFKA-17553 Fix shutdown race condition in StreamThreadTest #17191
base: trunk
Are you sure you want to change the base?
Conversation
This reverts commit b436499.
Thread dump here https://github.com/apache/kafka/actions/runs/10852637802/artifacts/1932150615
|
I managed to reproduce the issue in this run https://github.com/apache/kafka/actions/runs/10893557629/job/30231419064?pr=17191
Notice the 10 second delay in this test. This is coming from the changes I have in this PR: // in DefaultTaskManager
public void shutdown(final Duration duration) {
for (final TaskExecutor t: taskExecutors) {
t.requestShutdown();
}
signalTaskExecutors();
try {
for (final TaskExecutor t: taskExecutors) {
t.awaitShutdown(Duration.ofSeconds(10));
}
} catch (final Exception e) {
signalTaskExecutors();
for (final TaskExecutor t: taskExecutors) {
t.awaitShutdown(duration);
}
}
} This is kind of a hacky solution, but it does avoid the deadlock. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mumrah thanks for this quick fix. the approach is good but it may not honor the timeout. Maybe we can add a Supplier<Boolean>
to awaitProcessableTasks
to return without await
? the check is holding the lock, so it can fix the race condition. for example:
awaitProcessableTasks(Supplier<Boolean> needToAwait)
taskManager.awaitProcessableTasks(() -> !shutdownRequested.get());
Line 142 in f1e7954
tasksCondition.await(); |
...ams/src/main/java/org/apache/kafka/streams/processor/internals/tasks/DefaultTaskManager.java
Outdated
Show resolved
Hide resolved
...ams/src/main/java/org/apache/kafka/streams/processor/internals/tasks/DefaultTaskManager.java
Outdated
Show resolved
Hide resolved
One of the Gradle workers hit an OOM
Not sure why, but looks like Mockito was generating a large string as part of a test failure. |
@chia7712 I've updated the code with the supplier approach. I was wondering about something like this, but hesitant to change too much code I'm not familiar with 😅. Turns out not to be too bad. |
Reverting #17180 in order to generate a thread dump of the stalled test.