task failed with a null hostname #13692
Comments
Need more information about the setup of your Airflow installation and, if possible, steps to reproduce. |
It still happens after I upgraded Airflow to 2.0.1. Maybe it is caused by a single task being scheduled more than once? I have four schedulers running on four hosts, and two of them scheduled the same DAG at nearly the same time. Maybe that is a bug? |
Is it possible to have a mechanism that prevents a task from being scheduled twice? |
I noticed the same issue on Airflow 2.0.1 when I used cron notation for the schedule_interval (my DAG had the shape sketched below).
I wanted to specify the time of day. I solved the issue by using DateTimeSensor or TimeSensor and changing the schedule_interval. |
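For context, here is a hypothetical minimal DAG of the shape this comment describes; the original snippet was not preserved, so the dag_id, task, and cron expression below are all illustrative.

```python
# Hypothetical DAG using cron notation for schedule_interval (Airflow 2.0.x).
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy import DummyOperator

with DAG(
    dag_id="cron_schedule_example",      # illustrative name
    start_date=datetime(2021, 1, 1),
    schedule_interval="30 6 * * *",      # run daily at 06:30
    catchup=False,
) as dag:
    noop = DummyOperator(task_id="noop")
```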
Getting the same issue on Airflow 2.0.2 as well. Has anyone solved it? |
Getting the same issue. |
Same issue here. Airflow 1.10.15 on Kubernetes. |
Facing a similar issue on a Celery worker as well. Airflow 2.0.0 |
Everyone facing the issue, please provide some more information about the setup of your Airflow installation, and if possible, steps to reproduce. |
To reproduce the bug, just run a DAG and let its task fail. More details in #16729. It's a scheduler bug; I have fixed it in my local environment and will submit a PR later. |
The log can't be shown normally when a task fails; users only get useless logs such as the following. #13692
```
*** Log file does not exist: /home/airflow/airflow/logs/dag_id/task_id/2021-06-28T00:00:00+08:00/28.log
*** Fetching from: http://:8793/log/dag_id/task_id/2021-06-28T00:00:00+08:00/28.log
*** Failed to fetch log file from worker. Unsupported URL protocol
```
The root cause is that the scheduler overwrites the hostname in the task_instance table with a blank string during `_execute_task_callbacks` when a task fails. The webserver then can't determine the right host for the task, because the hostname recorded in the task_instance table has been lost. Co-authored-by: huozhanfeng <huozhanfeng@vipkid.cn> (cherry picked from commit 34478c2)
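A minimal sketch (not Airflow's actual code) of why the blank hostname produces the "Unsupported URL protocol" error above: the webserver builds the worker log URL from the task instance's hostname, so an empty string yields `http://:8793/...`, a URL with no host at all. The function name here is hypothetical, for illustration only.

```python
# Hypothetical illustration: how a blank hostname ends up in the log URL.
def build_worker_log_url(hostname: str, log_relative_path: str,
                         worker_log_server_port: int = 8793) -> str:
    # The webserver fetches task logs from the worker's log server;
    # with hostname == "" the URL has no host to connect to.
    return f"http://{hostname}:{worker_log_server_port}/log/{log_relative_path}"

print(build_worker_log_url("", "dag_id/task_id/2021-06-28T00:00:00+08:00/28.log"))
# -> http://:8793/log/dag_id/task_id/2021-06-28T00:00:00+08:00/28.log
```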
Is anyone experiencing this issue on Airflow 2.2.3? If so, please share reproduction steps. |
This issue has been automatically marked as stale because it has been open for 30 days with no response from the author. It will be closed in next 7 days if no further activity occurs from the issue author. |
Happens on Airflow 2.2.4rc1. |
Almost for sure it is an environmental/deployment issue, but I would love to get to the bottom of it, so I will need your help @changxiaoju. Does it happen all the time or only sporadically? How often? Any details on the conditions under which it happens? Can you check and possibly re-configure your hostname callable @changxiaoju: https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#hostname-callable Those issues can happen if your host cannot retrieve its hostname quickly enough, for example when your DNS experiences slow responses, delays, latency, etc. Can you please double-check that your hostname_callable responds quickly (for example by calling it repeatedly and making sure that you get the right response) and report your findings? Also, stating what kind of deployment and configuration you have would be helpful, specifically how your DNS works and whether it is a cloud deployment. You might want to use another method if your DNS is slow to respond and has lags. |
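For anyone who wants to run that check, here is a minimal sketch. It assumes the default hostname_callable, which resolves to `socket.getfqdn` on Airflow 2.x; if yours is configured differently, call that instead.

```python
# Repeatedly time hostname resolution; slow or inconsistent results
# point at the DNS problems described above.
import socket
import time

for attempt in range(10):
    start = time.monotonic()
    name = socket.getfqdn()  # what the default hostname_callable uses
    elapsed = time.monotonic() - start
    print(f"attempt {attempt}: {name!r} resolved in {elapsed:.3f}s")
```

If resolution is slow or the name varies between calls, switching hostname_callable to another method (see the docs link above) is the usual workaround.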
Thank you @potiuk, and #18239 @dimon222. I really want to offer you more information, but I am not very familiar with what you described.
Well, it worked several times and then it happened again.
It shut down for the same reason again; it is really hard to understand. |
I think you need to look at the environmental side. We cannot help solve these issues if we see no logs pointing to them. If you manage software on k8s, you should really be able to take a look at the k8s logs, not only the application logs. I think the reason is there, but it's a bit beyond the scope of Airflow. You just need to review the logs of your environment and look for anomalies. |
This issue has been automatically marked as stale because it has been open for 30 days with no response from the author. It will be closed in next 7 days if no further activity occurs from the issue author. |
This issue has been closed because it has not received a response from the issue author.
We had the same issue with retrieving task logs, but from what I've been able to find out, it looks like it is not related to any connection issues with an actual log provider. Our specific case was the following: we got a task failure and the task instance was marked as failed. The actual cause of the UI rendering such an error (at least in our case) is that it renders the log-fetch URL from the task instance's hostname field. So my next guess was that for some reason the task instance's hostname was never persisted to the database (a verification sketch follows below). Some notes about our case and environment: |
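A sketch of one way to verify that guess, querying the task_instance table through Airflow's own ORM session helper; the `dag_id` and `task_id` filter values are placeholders.

```python
# Check whether hostname is empty for the affected task instances.
from airflow.models import TaskInstance
from airflow.utils.session import create_session

with create_session() as session:
    tis = (
        session.query(TaskInstance)
        .filter(
            TaskInstance.dag_id == "dag_id",    # placeholder
            TaskInstance.task_id == "task_id",  # placeholder
        )
        .all()
    )
    for ti in tis:
        # An empty hostname here reproduces the broken log URL above.
        print(ti, ti.state, repr(ti.hostname))
```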
If you're struggling with this issue locally in Docker Compose, check the health of your worker. My deployment was missing the Redis dependency, which caused the worker to throw an exception. But Redis was not required to run the webserver, so I could see the UI, and there was exactly this error. |
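A quick way to check worker health in a setup like that is to ping the Celery workers directly. This sketch assumes the CeleryExecutor, and the broker URL is a placeholder that must match `broker_url` in your own deployment's configuration.

```python
# Ping Celery workers; an empty reply list means no worker is healthy.
from celery import Celery

app = Celery(broker="redis://redis:6379/0")  # placeholder broker URL
replies = app.control.ping(timeout=2.0)
print(replies or "no workers replied: check worker logs and the Redis service")
```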
Apache Airflow version: 2.0.0
Kubernetes version (if you are using kubernetes) (use `kubectl version`):
Environment:
- Kernel (e.g. `uname -a`): 3.10

What happened:
The task is in the failed state. I found the log file on one of the worker nodes, and the task actually succeeded. In the task instance details tab, the hostname field is null.
And the logs are as follows:
What you expected to happen:
The task should be marked as success.
How to reproduce it:
I don't know how to reproduce it reliably. It happens sometimes.
Anything else we need to know:
No