The webserver is not reaching out to the triggerer log server for the corresponding trigger logs of a deferred task instance. The webserver only reads from the worker / triggerer log server when there are no local logs or remote logs. This behaviour was introduced in #39177.
When the task instance is in a non-terminal state and has remote logs, the behaviour no longer matches the expectation of live logs described in the documentation for Serving logs from workers and triggerer.
A specific use case is deferrable operators, which essentially have two executions:
The first execution submits the trigger and puts the task into a deferred state.
The second execution processes the trigger event.
After the first execution, the task log is pushed to the remote location. From then on, the task log view sees the log in the remote location and fetches it as expected, but this also means the webserver will not reach out to the triggerer for logs.
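The effect on a deferrable task can be illustrated with a small model of the rule described above. This is only a sketch based on the description in this issue, not Airflow's actual log-reading code; `LogSources` and `reads_from_log_server` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class LogSources:
    """Hypothetical summary of what the webserver found for a task try."""

    has_remote_logs: bool
    has_local_logs: bool


def reads_from_log_server(sources: LogSources) -> bool:
    """Model of the reported rule (an assumption, not Airflow code):
    the worker / triggerer log server is only contacted when neither
    remote nor local logs were found."""
    return not (sources.has_remote_logs or sources.has_local_logs)


# Deferred task: the first execution has already uploaded its log remotely.
deferred_after_first_execution = LogSources(has_remote_logs=True, has_local_logs=False)

# Deployment without remote logging, logs kept on the worker only.
worker_volume_only = LogSources(has_remote_logs=False, has_local_logs=False)

print(reads_from_log_server(deferred_after_first_execution))  # False
print(reads_from_log_server(worker_volume_only))  # True
```

In this model the deferred task never reaches the triggerer log server once the first execution's log exists remotely, which is the symptom reported here.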
What you think should happen instead?
If the task instance is deferred and has remote logs, the webserver should still reach out to the triggerer log server.
The behaviour introduced by #39177 exists so that task instances in a terminal state can continue to fetch logs from the log server when there are no remote or local logs. The user stated that their logs are kept in persistent storage on their worker, which is why they want log-server fetching to remain available when there is no remote logging.
I think the log reading code needs a dedicated code path for the case where there are no remote or local logs, that is, for deployments without remote logging whose logs are not stored on the webserver.
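One possible shape for that decision is sketched below. It is an illustration of the proposal under stated assumptions, not a patch against Airflow: `TaskState`, `LIVE_STATES`, and `should_read_from_log_server` are hypothetical names, and treating both running and deferred as "live" states is an assumption.

```python
from enum import Enum


class TaskState(Enum):
    """Hypothetical stand-in for the task instance states relevant here."""

    RUNNING = "running"
    DEFERRED = "deferred"
    SUCCESS = "success"
    FAILED = "failed"


# Assumption: states in which live logs may still be produced.
LIVE_STATES = {TaskState.RUNNING, TaskState.DEFERRED}


def should_read_from_log_server(
    state: TaskState, has_remote_logs: bool, has_local_logs: bool
) -> bool:
    """Decide whether to contact the worker / triggerer log server."""
    if state in LIVE_STATES:
        # Live task: always ask the log server, even if remote logs exist.
        return True
    # Terminal task: fall back to the log server only when nothing else
    # was found (deployments without remote logging or webserver-local logs).
    return not (has_remote_logs or has_local_logs)


print(should_read_from_log_server(TaskState.DEFERRED, True, False))  # True
print(should_read_from_log_server(TaskState.SUCCESS, True, False))  # False
print(should_read_from_log_server(TaskState.SUCCESS, False, False))  # True
```

Separating the live-state branch from the terminal-state fallback keeps the persistent-volume use case from #39177 working while restoring trigger logs for deferred tasks.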
How to reproduce
Set up a deployment with remote logging and run a deferrable task.
Can you speak more about the change introduced in #39177, in case my interpretation is insufficient or incorrect?
The webserver only reads from the worker / triggerer log server when there are no local logs or remote logs. This behaviour was introduced in #39177.
To clarify this point: #39177 introduced this behaviour as an alternative implementation of #32561, which had entirely removed fetching logs from the worker / triggerer log server for past task runs. #39177 retains the behaviour wanted from #32561, namely not triggering the HTTP request when remote logs are found, while still supporting our use case of storing logs on the worker with a persistent volume.
The deferred-state logic was kept as close as possible to the previous implementations. However, it became evident in #39496 (comment) that adding tests for the deferred state caused test flakiness with unexpected results. This may have caused a regression in the deferred state, since that state was untested in our use case, whereas viewing previous task attempts was confirmed to work as expected again after #39177.
If this is deemed problematic or a regression, going back to the behaviour prior to #32561 would be fine, at least for our use case, since #32561 is what initially introduced the behaviour of not serving the worker / triggerer logs in certain circumstances.
Apache Airflow version
Other Airflow 2 version (please specify below)
If "Other Airflow 2 version" selected, which one?
2.9.3
Operating System
n/a
Versions of Apache Airflow Providers
n/a
Deployment
Astronomer
Deployment details
No response
Anything else?
No response
Are you willing to submit PR?
Code of Conduct