
[3.x] SDK subprocess killed with SIGKILL when patching TaskInstance (ServerResponseError) #58562

@wolvery

Apache Airflow version

3.1.3

If "Other Airflow 2/3 version" selected, which one?

No response

What happened?

On some workers, the task subprocess is intermittently killed with exit code -9 (SIGKILL). This happens during the SDK client's PATCH call to the API server (task-instances/{id}/run) and is accompanied by ServerResponseError: Server returned error.

From the API Server, we can see this error:
10.80.85.20:49928 - "PATCH /execution/task-instances/019a5dd0-200b-7ef9-9c6e-4ce858d2a12c/run HTTP/1.1" 409 Conflict

This is the log of the Worker:

 [info     ] [Metric Exporter] Connecting to OpenTelemetry Collector at ...
{"timestamp":"2025-11-07T10:20:36.718192Z","level":"info","event":"Executing workload","workload":"ExecuteTask(token='exxxxx', ti=TaskInstance(id=UUID('019a5dd0-200b-7ef9-9c6e-4ce858d2a12c'), dag_version_id=UUID('019a5dbb-f809-73fe-aaec-c80c3c901047'), task_id='log_tasks_specs', dag_id='nonendemiccampaigncreatedeventv3__r', run_id='scheduled__2025-11-07T09:15:00+00:00', try_number=1, map_index=-1, pool_slots=1, queue='default', priority_weight=4, executor_config=None, parent_context_carrier={}, context_carrier={}), dag_rel_path=PurePosixPath('revision_dags/process.py'), bundle_info=BundleInfo(name='dags-folder', version=None), log_path='dag_id=nonendemiccampaigncreatedeventv3__r/run_id=scheduled__2025-11-07T09:15:00+00:00/task_id=log_tasks_specs/attempt=1.log', type='ExecuteTask')","logger":"__main__","filename":"execute_workload.py","lineno":56}
{"timestamp":"2025-11-07T10:20:36.719045Z","level":"info","event":"Connecting to server:","server":"http://workflow-manager-priority-api-server:8080/execution/","logger":"__main__","filename":"execute_workload.py","lineno":64}
{"timestamp":"2025-11-07T10:20:36.809907Z","level":"info","event":"Secrets backends loaded for worker","count":1,"backend_classes":["EnvironmentVariablesBackend"],"logger":"supervisor","filename":"supervisor.py","lineno":1870}
{"timestamp":"2025-11-07T10:20:36.888747Z","level":"info","event":"Process exited","pid":18,"exit_code":-9,"signal_sent":"SIGKILL","logger":"supervisor","filename":"supervisor.py","lineno":709}
Traceback (most recent call last):
  File "/usr/python/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/python/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/sdk/execution_time/execute_workload.py", line 125, in <module>
    main()
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/sdk/execution_time/execute_workload.py", line 121, in main
    execute_workload(workload)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/sdk/execution_time/execute_workload.py", line 66, in execute_workload
    supervise(
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/sdk/execution_time/supervisor.py", line 1878, in supervise
    process = ActivitySubprocess.start(
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/sdk/execution_time/supervisor.py", line 940, in start
    proc._on_child_started(ti=what, dag_rel_path=dag_rel_path, bundle_info=bundle_info)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/sdk/execution_time/supervisor.py", line 951, in _on_child_started
    ti_context = self.client.task_instances.start(ti.id, self.pid, start_date)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/sdk/api/client.py", line 210, in start
    resp = self.client.patch(f"task-instances/{id}/run", content=body.model_dump_json())
  File "/home/airflow/.local/lib/python3.10/site-packages/httpx/_client.py", line 1218, in patch
    return self.request(
  File "/home/airflow/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 338, in wrapped_f
    return copy(f, *args, **kw)
  File "/home/airflow/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 477, in __call__
    do = self.iter(retry_state=retry_state)
  File "/home/airflow/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 378, in iter
    result = action(retry_state)
  File "/home/airflow/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 400, in <lambda>
    self._add_action_func(lambda rs: rs.outcome.result())
  File "/usr/python/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/python/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/home/airflow/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 480, in __call__
    result = fn(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/sdk/api/client.py", line 861, in request
    return super().request(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.10/site-packages/httpx/_client.py", line 825, in request
    return self.send(request, auth=auth, follow_redirects=follow_redirects)
  File "/home/airflow/.local/lib/python3.10/site-packages/httpx/_client.py", line 914, in send
    response = self._send_handling_auth(
  File "/home/airflow/.local/lib/python3.10/site-packages/httpx/_client.py", line 942, in _send_handling_auth
    response = self._send_handling_redirects(
  File "/home/airflow/.local/lib/python3.10/site-packages/httpx/_client.py", line 999, in _send_handling_redirects
    raise exc
  File "/home/airflow/.local/lib/python3.10/site-packages/httpx/_client.py", line 982, in _send_handling_redirects
    hook(response)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/sdk/api/client.py", line 175, in raise_on_4xx_5xx
    return get_json_error(response) or response.raise_for_status()
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/sdk/api/client.py", line 171, in get_json_error
    raise err
airflow.sdk.api.client.ServerResponseError: Server returned error
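
For readers unfamiliar with the `exit_code: -9` in the supervisor log above: on POSIX, Python reports a negative return code when a child process is terminated by a signal, and 9 is SIGKILL. The `signal_sent` field suggests the supervisor itself sent the signal, although that is an inference from the log rather than something confirmed here. A minimal sketch of the decoding, where `describe_exit_code` is a hypothetical helper and not part of Airflow:

```python
import signal


def describe_exit_code(exit_code: int) -> str:
    """Translate a subprocess return code into a readable description."""
    if exit_code < 0:
        # POSIX convention used by Python's subprocess module:
        # a return code of -N means "terminated by signal N".
        return f"killed by {signal.Signals(-exit_code).name}"
    return f"exited with status {exit_code}"


if __name__ == "__main__":
    # The value reported in the "Process exited" log line.
    print(describe_exit_code(-9))
```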

Error in API Server:

ERROR - Top level error source=task loc=task_runner.py:1465
AirflowRuntimeError: API_SERVER_ERROR: {'status_code': 409, 'message': 'Server returned error', 'detail': {'detail': {'reason': 'invalid_state', 'message': 'TI was not in the running state so it cannot be updated', 'previous_state': 'success'}}}
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/sdk/execution_time/task_runner.py", line 1458 in main

File "/home/airflow/.local/lib/python3.10/site-packages/airflow/sdk/execution_time/task_runner.py", line 1007 in run

File "/home/airflow/.local/lib/python3.10/site-packages/airflow/sdk/execution_time/comms.py", line 207 in send

File "/home/airflow/.local/lib/python3.10/site-packages/airflow/sdk/execution_time/comms.py", line 271 in _get_response

File "/home/airflow/.local/lib/python3.10/site-packages/airflow/sdk/execution_time/comms.py", line 258 in _from_frame
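
To make the nested error payload easier to read, here is a small sketch that pulls out the fields that matter for triage. The payload string is copied from the `AirflowRuntimeError` line above; `summarize_conflict` is a hypothetical helper written for this issue, not an Airflow function.

```python
import ast

# Payload copied verbatim from the AirflowRuntimeError line above.
RAW_PAYLOAD = (
    "{'status_code': 409, 'message': 'Server returned error', "
    "'detail': {'detail': {'reason': 'invalid_state', "
    "'message': 'TI was not in the running state so it cannot be updated', "
    "'previous_state': 'success'}}}"
)


def summarize_conflict(payload_repr: str) -> dict:
    """Extract status code, reason and previous state from the logged dict repr."""
    # The log contains a Python dict repr (single quotes), so JSON parsing
    # would fail; ast.literal_eval safely parses Python literals.
    payload = ast.literal_eval(payload_repr)
    inner = payload["detail"]["detail"]
    return {
        "status_code": payload["status_code"],
        "reason": inner["reason"],
        "previous_state": inner["previous_state"],
    }


if __name__ == "__main__":
    print(summarize_conflict(RAW_PAYLOAD))
```

The `previous_state` of `success` indicates that the server already considered this task instance finished when the update arrived.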

In the UI, the task is shown as skipped in the Airflow Dag view.

What you think should happen instead?

The task should execute without issues.

How to reproduce

To reproduce it, we spun up 300 DAGs and activated them. We had only 3 API servers with 1 CPU core each and 4 schedulers with 8 GB of memory and 2 CPU cores.
After some time, some workers start to be shut down with the error shown above.
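
A rough sketch of how a comparable load could be generated, in case it helps others reproduce this. Everything in it is an assumption rather than our actual setup: the DAG ids, the single no-op task, the 15-minute schedule, and the `write_dags` helper are all made up for illustration. The script only writes the DAG files; they still need to be placed in the DAGs folder and unpaused.

```python
import tempfile
from pathlib import Path

# Hypothetical DAG file template; the real DAGs in our deployment differ.
DAG_TEMPLATE = '''\
from datetime import datetime

from airflow.sdk import DAG
from airflow.providers.standard.operators.empty import EmptyOperator

with DAG(
    dag_id="load_test_dag_{index:03d}",
    start_date=datetime(2025, 1, 1),
    schedule="*/15 * * * *",  # assumed schedule, chosen only to create load
    catchup=False,
):
    EmptyOperator(task_id="noop")
'''


def write_dags(target_dir: Path, count: int = 300) -> list[Path]:
    """Write `count` small DAG files into `target_dir` and return their paths."""
    target_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index in range(count):
        path = target_dir / f"load_test_dag_{index:03d}.py"
        path.write_text(DAG_TEMPLATE.format(index=index))
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Write into a temporary directory; copy the files to the DAGs folder afterwards.
    out_dir = Path(tempfile.mkdtemp()) / "dags"
    print(f"wrote {len(write_dags(out_dir))} DAG files to {out_dir}")
```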

Operating System

K8s

Versions of Apache Airflow Providers

We follow the constraints file for this version, with the Kubernetes executor and the Kubernetes operator.

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

Anything else?

Similar in spirit to issue #57961, but includes SIGKILL / subprocess exit.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!


Labels: area:API, area:core, kind:bug, needs-triage
