@sjyangkevin (Contributor) commented Jan 17, 2026

closes: #58381

Summary

The audit logging does not capture events for task lifecycle state transitions such as running and success. As stated in the documentation, it should capture system-generated events, including "task lifecycle state transitions (queued, running, success, failed)". As noted in the issue, task-level success and running events were logged in Airflow 2.10.x.

In Airflow 3, task state is set to running through the execution API endpoint /{task_instance_id}/run (ti_run), and subsequent task state updates are handled by the endpoint /{task_instance_id}/state (ti_update_state). The logic to insert an audit log entry appears to be missing when these endpoints update task instance state.

To log these state transition events, we need to write log entries to the DB while the two functions process the state update. However, if we also want to cover the "queued" event, we need to add similar logic in the scheduler function below:

def _executable_task_instances_to_queued(self, max_tis: int, session: Session) -> list[TI]:
    """
    Find TIs that are ready for execution based on conditions.

    Conditions include:
    - pool limits
    - DAG max_active_tasks
    - executor state
    - priority
    - max active tis per DAG
    - max active tis per DAG run

    :param max_tis: Maximum number of TIs to queue in this loop.
    :return: list[airflow.models.TaskInstance]
    """

Current (Basic) Approach

Since both ti_run and ti_update_state already execute a SELECT query at the start to fetch task metadata, the audit log entry is constructed from a TaskInstanceKey by copying the data from the query result, with the state replaced by the one being written. Because the insert goes through the same session, it is part of the same transaction, which should keep the actual task state and the audit log consistent.
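The copy-with-updated-state step can be sketched as below. The TaskInstanceKey fields (dag_id, task_id, run_id, try_number, map_index) are real Airflow attributes; the dict-based row and the helper name `audit_entry_from_row` are illustrative assumptions, not the actual endpoint code:

```python
from typing import Any

def audit_entry_from_row(row: dict[str, Any], new_state: str) -> dict[str, Any]:
    """Copy the task metadata fetched by the endpoint's initial SELECT
    and attach the state being written, so the audit entry mirrors the
    exact state transition performed in this request."""
    return {
        "event": new_state,  # e.g. "running" or "success"
        "dag_id": row["dag_id"],
        "task_id": row["task_id"],
        "run_id": row["run_id"],
        "try_number": row["try_number"],
        "map_index": row["map_index"],
    }
```

In the endpoint, such a mapping would be used to construct a Log ORM object and session.add() it before the state UPDATE is flushed, so both writes commit (or roll back) together.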

Some Issues

  1. There are some inconsistencies in the audit logs between Airflow 2 and Airflow 3 when implementing it this way. For example, the logical_date field is empty. In ti_run this could probably be collected from the DAG Run query, but in ti_update_state an extra query is required. The owner field is also empty unless it is explicitly passed, and the extra field needs to be explicitly constructed, yet only hostname is available.
  2. In the scheduler job, a TaskInstance object is available which contains logical_date. This is more ideal, since the resulting log is closer to what Airflow 2's audit log contains, but the extra field still needs to be constructed. Making the API-side logging behave like the scheduler's would require extra queries to gather that information.

The logging behavior currently differs depending on where the insert is implemented. I am thinking about a way to make this implementation more unified and to ensure consistency between the actual task state and the log entry.

Current Implementation

breeze start-airflow --python 3.12 --backend postgres --db-reset --load-example-dags
[Screenshot from 2026-01-17 01-28-44]

Airflow 2 Audit Log

breeze start-airflow --use-airflow-version 2.10.5 --python 3.12 --backend postgres --db-reset --load-example-dags
[Screenshot from 2026-01-17 01-36-59]

Notice that the queued event is not logged.


Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

Generated-by: [Antigravity, Claude Opus 4.5] following the guidelines

  1. The test cases were generated by the tool and reviewed/refined by comparing them with existing test implementation patterns
  2. All relevant static checks have been run, and the generated test cases have been run/tested locally

  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst or {issue_number}.significant.rst, in airflow-core/newsfragments.

@boring-cyborg boring-cyborg bot added area:API Airflow's REST/HTTP API area:Scheduler including HA (high availability) scheduler labels Jan 17, 2026


Successfully merging this pull request may close this issue: Task-level audit logs missing SUCCESS/RUNNING events in Airflow 3.1.x (only FAILED and state mismatch recorded)