
Investigate potential race conditions in KafkaSupervisor #5919

Closed

surekhasaharan opened this issue Jun 28, 2018 · 3 comments
surekhasaharan commented Jun 28, 2018

In KafkaSupervisor, the taskGroups map is accessed by multiple threads but modified only on the main exec thread. In checkTaskDuration, the keys of taskGroups are retrieved and passed to checkpointTaskGroup, which executes on the workerExec thread and returns a Future. The potential race condition: while the future is executing, a groupId might be removed from taskGroups. Likely places for this race are KafkaSupervisor.checkpointTaskGroup() and KafkaSupervisor.verifyAndMergeCheckpoints().
Related issue #5900
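
To make the hazard concrete, here is a minimal, self-contained sketch of the pattern (illustrative only; this is not Druid code, though the field names mirror the supervisor's): a key observed during iteration can be removed by the main exec thread before the asynchronous work runs, so the callback can observe null.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch only -- not Druid code.
public class TaskGroupRaceSketch
{
  private final Map<Integer, Object> taskGroups = new ConcurrentHashMap<>();
  private final ExecutorService workerExec = Executors.newSingleThreadExecutor();

  void checkTaskDuration()
  {
    for (final Integer groupId : taskGroups.keySet()) {
      // The work below runs later on workerExec; nothing stops the main exec
      // thread from removing groupId in the meantime.
      workerExec.submit(() -> {
        final Object taskGroup = taskGroups.get(groupId); // may now be null
        System.out.println(taskGroup.toString());         // potential NPE
      });
    }
  }

  // Meanwhile, on the main exec thread, finished groups are removed
  // (in the real code, moved to pendingCompletionTaskGroups).
  void onTaskGroupFinished(int groupId)
  {
    taskGroups.remove(groupId);
  }
}

Note that ConcurrentHashMap makes each individual call safe, but it cannot make the get-then-use sequence atomic across threads.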

@surekhasaharan surekhasaharan changed the title Investigate potential NPE due to race conditions in KafkaSupervisor Investigate potential race conditions in KafkaSupervisor Jun 28, 2018
jihoonson (Contributor) commented
I'll add some more details. Here is a snippet of KafkaSupervisor.checkpointTaskGroup():

  private ListenableFuture<Map<Integer, Long>> checkpointTaskGroup(final int groupId, final boolean finalize)
  {
    final TaskGroup taskGroup = taskGroups.get(groupId);
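    // NOTE: taskGroup is captured here on the calling thread; by the time the
    // transform below runs on workerExec, the group may already have been
    // removed from taskGroups (e.g., moved to pendingCompletionTaskGroups).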
...
    return Futures.transform(
        Futures.successfulAsList(pauseFutures), new Function<List<Map<Integer, Long>>, Map<Integer, Long>>()
        {
          @Nullable
          @Override
          public Map<Integer, Long> apply(List<Map<Integer, Long>> input)
          {
            // 3) Build a map of the highest offset read by any task in the group for each partition
            final Map<Integer, Long> endOffsets = new HashMap<>();
            for (int i = 0; i < input.size(); i++) {
              Map<Integer, Long> result = input.get(i);

              if (result == null || result.isEmpty()) { // kill tasks that didn't return a value
                String taskId = pauseTaskIds.get(i);
                log.warn("Task [%s] failed to respond to [pause] in a timely manner, killing task", taskId);
                killTask(taskId);
                taskGroup.tasks.remove(taskId);

              } else { // otherwise build a map of the highest offsets seen
...

So, taskGroup is retrieved on the first line and then used inside the future, which executes on workerExec. The potential race condition is that taskGroup might no longer be in taskGroups by the time the future runs. This matters because taskGroups also carries state: taskGroup instances are moved from taskGroups to pendingCompletionTaskGroups once they finish reading. I'm not sure whether this is intentional.
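
One defensive pattern (a sketch only, assuming Guava executors as in the snippet above; this is not Druid's actual fix) is to re-validate inside the callback that the captured taskGroup is still the one registered under groupId, and skip the work if it has gone stale:

import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.ListeningExecutorService;
import com.google.common.util.concurrent.MoreExecutors;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;

// Illustrative sketch only -- not Druid's actual fix.
public class StaleGroupCheckSketch
{
  static class TaskGroup {}

  private final Map<Integer, TaskGroup> taskGroups = new ConcurrentHashMap<>();
  private final ListeningExecutorService workerExec =
      MoreExecutors.listeningDecorator(Executors.newSingleThreadExecutor());

  ListenableFuture<Boolean> checkpointTaskGroup(final int groupId)
  {
    final TaskGroup taskGroup = taskGroups.get(groupId); // captured on the calling thread
    if (taskGroup == null) {
      return Futures.immediateFuture(false); // group already gone
    }

    return workerExec.submit(() -> {
      // By the time this runs, the main exec thread may have moved the group
      // out of taskGroups; compare identities rather than assume presence.
      if (taskGroups.get(groupId) != taskGroup) {
        return false; // stale reference: skip the work instead of hitting an NPE
      }
      // ...checkpoint against a group that is known to still be active...
      return true;
    });
  }
}

An alternative is to confine all reads and writes of taskGroups to the single-threaded main exec, which eliminates the race at the cost of extra thread hops; either way, the get-then-use sequence has to be made consistent.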

github-actions bot commented Jul 2, 2023

This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

github-actions bot added the stale label Jul 2, 2023
github-actions bot commented

This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.
