Try number inconsistency between webserver and the actual log generated #42549
Comments
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval. |
I am not sure whether this is really a bug. It might rather be a semantic gap, or something that could be improved visually by hiding the log panel when no logs exist. In Airflow 2.10 the internal semantics changed so that a task's try number is incremented only when the task is actually scheduled/started. As long as the task instance merely exists (not scheduled), try=0 signals that it has never been tried. In that case the visual improvement might be to hide the log panel, since no logs exist. Would you like to submit a small improvement? |
@jscheffl, that's not really the case. Here the task has started and completed, and the log is captured under attempt-0.log. However, the webserver is looking for attempt-1.log.
|
Thanks for the response. You say this is intermittent. Do you have any means to reproduce it? |
let me see if I can isolate this |
Might this be a side effect of #39336 @dstandish ? |
How was this task launched / created / scheduled? |
@dstandish, this is essentially a bash operator. The task was triggered on schedule and is part of a task_group. I suspect what's happening here is the |
@dstandish, @jscheffl, it looks like my PR is exposing a bug with #39336. Take a look at https://github.com/apache/airflow/actions/runs/11133289143/job/30939468337?pr=42633 |
It's certainly possible. But I have not seen this. So can you provide some repro steps? Like, give us the simplest dag possible which also produces this behavior. |
@dstandish, I'm trying hard to pin down the conditions under which this happens. It happens intermittently. Take a look at this screenshot. This scenario has I added both these templates in the log_filename_template.
Also, there are many cases where the first task of the DAG starts with try_number 2. |
@dstandish, @jscheffl Here's how you reproduce this. Just run the simple dag below. On completion, clear the first task and let it rerun. You will notice make sure you set this cfg option:
|
I tried to reproduce this on main by clearing the task but could not. clearing-try-number.mov Here's my code
Can you try it and see if you get a diff result? |
@dstandish, you are missing changing once you do that, try clicking between the logs of even better would be setting the following in your airflow config:
|
@dstandish, the issue is that it happens at random. I have a cluster of Celery workers that keep listening for jobs from the scheduler. There may be a case where the try number isn't incremented even when the job is running on the worker. I can't isolate this with airflow standalone, and it seems to be affecting all tasks (there doesn't seem to be a pattern here). Clearing this task increments the try_number to 2. A workaround here would be to find all attempt-0.log files and softlink them to attempt-1.log to fix the GUI. I have a DAG that is doing this every 20 min, and there are many tasks scattered all over that are randomly impacted by this. |
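The symlink workaround described above could be sketched roughly as follows. This is only an illustration, not something posted in the thread: it assumes the log files are literally named `attempt-0.log` and `attempt-1.log` (which depends on your `log_filename_template`), and the function name `link_missing_attempt_logs` and the `log_root` argument are hypothetical.

```python
from pathlib import Path


def link_missing_attempt_logs(log_root: str) -> list[Path]:
    """Create an attempt-1.log symlink beside each attempt-0.log that lacks one.

    Assumption: file names follow the attempt-<N>.log pattern mentioned in
    this thread; adjust them to match your own log_filename_template.
    """
    created = []
    # Materialize the matches first so new links never affect the walk.
    for zero_log in sorted(Path(log_root).rglob("attempt-0.log")):
        one_log = zero_log.with_name("attempt-1.log")
        # Leave real attempt-1 logs (or earlier links) untouched.
        if one_log.exists() or one_log.is_symlink():
            continue
        # A relative target keeps the link valid if the log tree is mounted
        # at a different path on the webserver.
        one_log.symlink_to(zero_log.name)
        created.append(one_log)
    return created
```

Running it twice is harmless, since existing `attempt-1.log` files and links are skipped. It only papers over the symptom in the UI; it does not address why the try number was 0 in the first place.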
@dstandish, I figured out what's happening. A few workers that never got recycled were still on Airflow 2.7.2. Thanks for your help triaging this! |
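Since the root cause turned out to be stale workers, a quick comparison of reported versions across hosts may help catch this earlier. The sketch below assumes you have already collected the output of `airflow version` from each host by some means of your own (not shown); the function name `find_stale_workers` and the host names in the usage example are hypothetical.

```python
def find_stale_workers(versions: dict[str, str], expected: str) -> dict[str, str]:
    """Return the hosts whose reported Airflow version differs from `expected`.

    `versions` maps a host name to the version string that host reported.
    How those strings are gathered is outside the scope of this sketch.
    """
    return {
        host: version.strip()
        for host, version in sorted(versions.items())
        if version.strip() != expected
    }


# Hypothetical data for illustration only.
reported = {"worker-a": "2.10.2", "worker-b": "2.7.2\n", "webserver": "2.10.2"}
stale = find_stale_workers(reported, expected="2.10.2")
```

An empty result means every host reported the expected version; anything else lists the hosts that still need to be recycled.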
Awesome, great result |
Apache Airflow version
Other Airflow 2 version (please specify below)
If "Other Airflow 2 version" selected, which one?
2.10.2
What happened?
Some tasks in Airflow 2.10.2 are being launched with try number 0. However, the webserver looks for logs starting with try number 1. As a result, there are cases where tasks run and pass, yet the server throws a "cannot read served logs" error.
Executor=CeleryExecutor
What you think should happen instead?
Airflow should start attempts with try number 1 instead.
How to reproduce
This is happening intermittently. Not all tasks start with try number 0. I have checked that both the webserver and the Celery worker are on the same version, 2.10.2.
Operating System
SUSE Linux Enterprise Server 12 SP5
Versions of Apache Airflow Providers
apache-airflow-providers-celery==3.8.1
celery==5.4.0
Deployment
Docker-Compose
Deployment details
No response
Anything else?
No response
Are you willing to submit PR?
Code of Conduct