Description
Apache Airflow version
3.1.3
If "Other Airflow 2/3 version" selected, which one?
No response
What happened?
Some worker subprocesses are intermittently killed with exit code -9 (SIGKILL). The error occurs during the SDK client's PATCH call to the API server (task-instances/{id}/run) and is accompanied by ServerResponseError: Server returned error.
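For readers unfamiliar with the negative exit code: on POSIX, Python reports a child's return code as -N when the child was terminated by signal N, so -9 corresponds to SIGKILL. The snippet below is only a minimal standalone illustration of that convention. It does not use Airflow and makes no claim about what sent the signal in this report.

```python
import signal
import subprocess
import sys

# Start a child process that would otherwise sleep for a minute.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

# Send SIGKILL to the child and wait for it to be reaped.
child.send_signal(signal.SIGKILL)
returncode = child.wait()

# On POSIX, a negative return code -N means "terminated by signal N".
print(returncode, signal.Signals(-returncode).name)
```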
From the API Server, we can see this error:
10.80.85.20:49928 - "PATCH /execution/task-instances/019a5dd0-200b-7ef9-9c6e-4ce858d2a12c/run HTTP/1.1" 409 Conflict
This is the log of the Worker:
[info] [Metric Exporter] Connecting to OpenTelemetry Collector at ...
{"timestamp":"2025-11-07T10:20:36.718192Z","level":"info","event":"Executing workload","workload":"ExecuteTask(token='exxxxx', ti=TaskInstance(id=UUID('019a5dd0-200b-7ef9-9c6e-4ce858d2a12c'), dag_version_id=UUID('019a5dbb-f809-73fe-aaec-c80c3c901047'), task_id='log_tasks_specs', dag_id='nonendemiccampaigncreatedeventv3__r', run_id='scheduled__2025-11-07T09:15:00+00:00', try_number=1, map_index=-1, pool_slots=1, queue='default', priority_weight=4, executor_config=None, parent_context_carrier={}, context_carrier={}), dag_rel_path=PurePosixPath('revision_dags/process.py'), bundle_info=BundleInfo(name='dags-folder', version=None), log_path='dag_id=nonendemiccampaigncreatedeventv3__r/run_id=scheduled__2025-11-07T09:15:00+00:00/task_id=log_tasks_specs/attempt=1.log', type='ExecuteTask')","logger":"__main__","filename":"execute_workload.py","lineno":56}
{"timestamp":"2025-11-07T10:20:36.719045Z","level":"info","event":"Connecting to server:","server":"http://workflow-manager-priority-api-server:8080/execution/","logger":"__main__","filename":"execute_workload.py","lineno":64}
{"timestamp":"2025-11-07T10:20:36.809907Z","level":"info","event":"Secrets backends loaded for worker","count":1,"backend_classes":["EnvironmentVariablesBackend"],"logger":"supervisor","filename":"supervisor.py","lineno":1870}
{"timestamp":"2025-11-07T10:20:36.888747Z","level":"info","event":"Process exited","pid":18,"exit_code":-9,"signal_sent":"SIGKILL","logger":"supervisor","filename":"supervisor.py","lineno":709}
Traceback (most recent call last):
File "/usr/python/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/python/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/sdk/execution_time/execute_workload.py", line 125, in <module>
main()
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/sdk/execution_time/execute_workload.py", line 121, in main
execute_workload(workload)
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/sdk/execution_time/execute_workload.py", line 66, in execute_workload
supervise(
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/sdk/execution_time/supervisor.py", line 1878, in supervise
process = ActivitySubprocess.start(
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/sdk/execution_time/supervisor.py", line 940, in start
proc._on_child_started(ti=what, dag_rel_path=dag_rel_path, bundle_info=bundle_info)
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/sdk/execution_time/supervisor.py", line 951, in _on_child_started
ti_context = self.client.task_instances.start(ti.id, self.pid, start_date)
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/sdk/api/client.py", line 210, in start
resp = self.client.patch(f"task-instances/{id}/run", content=body.model_dump_json())
File "/home/airflow/.local/lib/python3.10/site-packages/httpx/_client.py", line 1218, in patch
return self.request(
File "/home/airflow/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 338, in wrapped_f
return copy(f, *args, **kw)
File "/home/airflow/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 477, in __call__
do = self.iter(retry_state=retry_state)
File "/home/airflow/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 378, in iter
result = action(retry_state)
File "/home/airflow/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 400, in <lambda>
self._add_action_func(lambda rs: rs.outcome.result())
File "/usr/python/lib/python3.10/concurrent/futures/_base.py", line 451, in result
return self.__get_result()
File "/usr/python/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/home/airflow/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 480, in __call__
result = fn(*args, **kwargs)
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/sdk/api/client.py", line 861, in request
return super().request(*args, **kwargs)
File "/home/airflow/.local/lib/python3.10/site-packages/httpx/_client.py", line 825, in request
return self.send(request, auth=auth, follow_redirects=follow_redirects)
File "/home/airflow/.local/lib/python3.10/site-packages/httpx/_client.py", line 914, in send
response = self._send_handling_auth(
File "/home/airflow/.local/lib/python3.10/site-packages/httpx/_client.py", line 942, in _send_handling_auth
response = self._send_handling_redirects(
File "/home/airflow/.local/lib/python3.10/site-packages/httpx/_client.py", line 999, in _send_handling_redirects
raise exc
File "/home/airflow/.local/lib/python3.10/site-packages/httpx/_client.py", line 982, in _send_handling_redirects
hook(response)
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/sdk/api/client.py", line 175, in raise_on_4xx_5xx
return get_json_error(response) or response.raise_for_status()
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/sdk/api/client.py", line 171, in get_json_error
raise err
airflow.sdk.api.client.ServerResponseError: Server returned error
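To find other affected workers, the JSON log lines above can be filtered for "Process exited" events with exit code -9. This is a rough sketch that assumes the worker log mixes one-JSON-object-per-line records with plain traceback text, as in the excerpt above. The function name `find_sigkill_exits` is hypothetical and not part of Airflow.

```python
import json


def find_sigkill_exits(lines):
    """Return parsed "Process exited" records whose exit_code is -9."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip traceback lines and other non-JSON output
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that only look like JSON
        if record.get("event") == "Process exited" and record.get("exit_code") == -9:
            hits.append(record)
    return hits


# Sample input copied from the worker log in this report.
sample = [
    '{"timestamp":"2025-11-07T10:20:36.809907Z","level":"info","event":"Secrets backends loaded for worker","count":1,"backend_classes":["EnvironmentVariablesBackend"],"logger":"supervisor","filename":"supervisor.py","lineno":1870}',
    '{"timestamp":"2025-11-07T10:20:36.888747Z","level":"info","event":"Process exited","pid":18,"exit_code":-9,"signal_sent":"SIGKILL","logger":"supervisor","filename":"supervisor.py","lineno":709}',
    "Traceback (most recent call last):",
]

for record in find_sigkill_exits(sample):
    print(record["timestamp"], record["pid"], record["signal_sent"])
```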
Error in API Server:
ERROR - Top level error source=task loc=task_runner.py:1465
AirflowRuntimeError: API_SERVER_ERROR: {'status_code': 409, 'message': 'Server returned error', 'detail': {'detail': {'reason': 'invalid_state', 'message': 'TI was not in the running state so it cannot be updated', 'previous_state': 'success'}}}
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/sdk/execution_time/task_runner.py", line 1458 in main
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/sdk/execution_time/task_runner.py", line 1007 in run
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/sdk/execution_time/comms.py", line 207 in send
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/sdk/execution_time/comms.py", line 271 in _get_response
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/sdk/execution_time/comms.py", line 258 in _from_frame
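When triaging many of these failures, it can help to pull the status code, reason, and previous state out of the error text. The sketch below assumes the message always has the shape shown above, that is, the `API_SERVER_ERROR: ` prefix followed by a Python dict repr with a nested `detail`. The helper `parse_api_server_error` is hypothetical and not an Airflow API.

```python
import ast

PREFIX = "API_SERVER_ERROR: "


def parse_api_server_error(message):
    """Extract (status_code, reason, previous_state) from the error text."""
    _, _, payload_text = message.partition(PREFIX)
    # The payload is printed as a Python dict repr (single quotes), not JSON.
    payload = ast.literal_eval(payload_text)
    inner = payload["detail"]["detail"]
    return payload["status_code"], inner["reason"], inner.get("previous_state")


# Error text copied from the log in this report.
message = (
    "AirflowRuntimeError: API_SERVER_ERROR: {'status_code': 409, "
    "'message': 'Server returned error', "
    "'detail': {'detail': {'reason': 'invalid_state', "
    "'message': 'TI was not in the running state so it cannot be updated', "
    "'previous_state': 'success'}}}"
)

print(parse_api_server_error(message))
```

For the message in this report, the extracted previous state is `success`.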
In the UI, the task is shown as skipped in the Airflow Dag view.
What you think should happen instead?
It should execute without issues.
How to reproduce
To reproduce it, we spun up 300 DAGs and activated them. We only had 3 API servers with 1 CPU core each and 4 schedulers with 8 GB of memory and 2 CPU cores.
After some time, some workers start to be shut down with the error shared above.
Operating System
K8s
Versions of Apache Airflow Providers
We are following the constraints for the version with kubernetes executor/kubernetes operator.
Deployment
Official Apache Airflow Helm Chart
Deployment details
No response
Anything else?
Similar in spirit to issue #57961, but includes SIGKILL / subprocess exit.
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct