task failed with a null hostname #13692

doowhtron · 2021-01-15T09:36:50Z

Apache Airflow version: 2.0.0

Kubernetes version (if you are using kubernetes) (use kubectl version):

Environment:

Cloud provider or hardware configuration: tencent cloud
OS (e.g. from /etc/os-release): centos7
Kernel (e.g. uname -a): 3.10
Install tools:
Others: Server version: 8.0.22 MySQL Community Server - GPL

What happened:

Task is in the failed state. I found the log file on one of the worker node and the task is actually success. And in the task instance details tab, the hostname field is null.

And the logs are as follows:

*** Log file does not exist: /data/app/epic-airflow/logs/tiny_demo80677608236/task_0/2021-01-15T09:04:00+00:00/1.log
*** Fetching from: http://:8793/log/tiny_demo80677608236/task_0/2021-01-15T09:04:00+00:00/1.log
*** Failed to fetch log file from worker. Invalid URL 'http://:8793/log/tiny_demo80677608236/task_0/2021-01-15T09:04:00+00:00/1.log': No host supplied

What you expected to happen:

Task should be success.

How to reproduce it:

I don't know how to reproduce it steadyly. It happened sometimes.

Anything else we need to know:

No

The text was updated successfully, but these errors were encountered:

kaxil · 2021-01-19T23:09:05Z

Need more information about the setup of your Airflow Installation and if possible steps to reproduce

doowhtron · 2021-03-16T09:08:29Z

Need more information about the setup of your Airflow Installation and if possible steps to reproduce

It still happens after I upgrade Airflow to 2.0.1

Maybe it is caused by the situation that a single task is scheduled more than once? I have four schedulers running on four hosts and two of them scheduled the same dag nearly the same time? Maybe a bug happened?

doowhtron · 2021-03-16T09:14:26Z

Is it possible to have a mechanism prevent a task from scheduling twice?

emukans · 2021-03-23T13:29:35Z

I noticed the same issue on airflow 2.0.1 when I used cron notation for specifying schedule_interval. Here is my DAG:

DAG(
        "id",
        default_args=default_args,
        description="description",
        schedule_interval="0 14 * * *",
        start_date=datetime(2021, 3, 23)
    )

I wanted to specify the time. I solved the issue by using DateTimeSensor or TimeSensor and changing schedule_interval="@daily"

hafid-d · 2021-05-20T12:27:04Z

Getting the same issue as well on airflow 2.0.2. Anyone solved it ?

pparthesh · 2021-05-26T13:38:11Z

Getting the same issue.

victorouttes · 2021-06-08T20:06:38Z

Same issue here. Airflow 1.10.15 on kubernetes.

vazmeee · 2021-06-13T15:27:44Z

Facing a similar issue on celery worker as well. Airflow 2.0.0

uranusjr · 2021-06-30T13:49:08Z

Everyone facing the issue, please provide some more information about the setup of your Airflow installation, and if possible, steps to reproduce.

huozhanfeng · 2021-06-30T22:43:54Z

Everyone facing the issue, please provide some more information about the setup of your Airflow installation, and if possible, steps to reproduce.

Just run a dag and let its task runs failed to reproduce the bug. More details in #16729. It's a bug of the scheduler, I have resolved it in my local env and will give a PR to solve it later.

The log can't be shown normally when the task runs failed. Users can only get useless logs as follows. #13692 <pre> *** Log file does not exist: /home/airflow/airflow/logs/dag_id/task_id/2021-06-28T00:00:00+08:00/28.log *** Fetching from: http://:8793/log/dag_id/task_id/2021-06-28T00:00:00+08:00/28.log *** Failed to fetch log file from worker. Unsupported URL protocol </pre> The root cause is that scheduler will overwrite the hostname info into the task_instance table in DB by using blank str in the progress of `_execute_task_callbacks` when tasks into failed. Webserver can't get the right host of the task from task_instance because the hostname info of task_instance table is lost in the progress. Co-authored-by: huozhanfeng <huozhanfeng@vipkid.cn>

The log can't be shown normally when the task runs failed. Users can only get useless logs as follows. #13692 <pre> *** Log file does not exist: /home/airflow/airflow/logs/dag_id/task_id/2021-06-28T00:00:00+08:00/28.log *** Fetching from: http://:8793/log/dag_id/task_id/2021-06-28T00:00:00+08:00/28.log *** Failed to fetch log file from worker. Unsupported URL protocol </pre> The root cause is that scheduler will overwrite the hostname info into the task_instance table in DB by using blank str in the progress of `_execute_task_callbacks` when tasks into failed. Webserver can't get the right host of the task from task_instance because the hostname info of task_instance table is lost in the progress. Co-authored-by: huozhanfeng <huozhanfeng@vipkid.cn> (cherry picked from commit 34478c2)

eladkal · 2022-01-16T12:42:33Z

Is anyone experiencing this issue on Airflow 2.2.3? If so please share reproduce steps

github-actions · 2022-02-16T00:07:28Z

This issue has been automatically marked as stale because it has been open for 30 days with no response from the author. It will be closed in next 7 days if no further activity occurs from the issue author.

changxiaoju · 2022-02-20T05:37:41Z

happens on Airflow2.2.4rc1

potiuk · 2022-02-20T09:17:00Z

happens on Airflow2.2.4rc1

Almost for sure it is an environmental/deployment issue. But I would love to the bottom of it so I will need your help @changxiaoju.

Does it happen all the time or only sporadically ? How often? Any details on the conditions that it happens ?

Can you check and possibly re-configure your hostname callable @changxiaoju https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#hostname-callable

Those issues can happen if your host cannot rertrieve hostname quickly enough - for example when your DNS experiences slow responses, delays, latency etc. Can you double check please if the hostname_callable of yours responds quickly (for example by repetitive calling it and making sure that you get the right response and report to use what are your findings?

Also stating what kind of deployment and configuration you have woudl be actually helpful. Specifically also how your DNS works, whether it is a cloud deployment etc?

You might want to use another method if your DNS is slow to respond and has "lagss"

changxiaoju · 2022-02-20T15:36:22Z

thankyou potiuk and #18239 @dimon222.
it is now ok by setting hostname_callable = airflow.utils.net.get_host_ip_address.

I really want to offer you more information, but i am not very familiar with what you said.
It happened all the time, i use it on remote server, the calculation clusters. And i only changed the postgresql related settings in airflow.cfg.

changxiaoju · 2022-02-21T11:18:24Z

well, it worked several times and happened again,

happens on Airflow2.2.4rc1

Almost for sure it is an environmental/deployment issue. But I would love to the bottom of it so I will need your help @changxiaoju.

Does it happen all the time or only sporadically ? How often? Any details on the conditions that it happens ?

Can you check and possibly re-configure your hostname callable @changxiaoju https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#hostname-callable

Those issues can happen if your host cannot rertrieve hostname quickly enough - for example when your DNS experiences slow responses, delays, latency etc. Can you double check please if the hostname_callable of yours responds quickly (for example by repetitive calling it and making sure that you get the right response and report to use what are your findings?

Also stating what kind of deployment and configuration you have woudl be actually helpful. Specifically also how your DNS works, whether it is a cloud deployment etc?

You might want to use another method if your DNS is slow to respond and has "lagss"

Shut down for the same reson again, it is really hard to understand.

potiuk · 2022-02-26T22:34:36Z

I really want to offer you more information, but i am not very familiar with what you said. It happened all the time, i use it on remote server, the calculation clusters. And i only changed the postgresql related settings in airflow.cfg.

I think you need to run at the environmental stuff - we cannot help to solve those if we see no logs pointing to the issues. If you manage a software on k8s you should really be able to take a look at the logs of k8s - not only the logs of application - I think the reason is there - but it's a bit beyond the scope of Airlfow. You just need to review the logs of your environment to look for some anomalies.

github-actions · 2022-03-29T00:11:51Z

This issue has been automatically marked as stale because it has been open for 30 days with no response from the author. It will be closed in next 7 days if no further activity occurs from the issue author.

github-actions · 2022-04-07T00:11:43Z

This issue has been closed because it has not received response from the issue author.

mfridrikhson-tp · 2022-05-26T12:34:04Z

We had the same issue with retrieving task logs but from what I've been able to find out it looks like it is not related to any connection issues with an actual log provider.

Our specific case of this issue was the following: we got a task failure and the task instance was marked failed in the DAG view. When you go to the task view you could see 2 log tabs: first with the Failed to fetch log file message and second with a successful run and all the logs.
I went to check the log folder and what I found out is that there was only logs for the second tab (task try):

(Ignore the 3rd log file - it's from a later rerun after the incident I'm describing had happened)

The actual cause why the UI renders such an error (at least in our case) is that it renders max_try_number number of tabs and queries the contents for each one by its index. It happened to be that there is no file for tab #1 and thus it shown this error message.

So my next guess was that for some reason task instance's try_number got increased one extra time. And that's why I find @doowhtron's comment important here - maybe some kind of race condition happens which causes the task to execute twice or something.

Some notes about our case and environment:

The issue happens intermittently (usually everything works fine)
It is not happening to some specific task or operator (both sensors and default tasks had this issue)
It is not happening specifically when the cluster is experiencing high load or the opposite, so I don't think it has something to do with the performance
We use Airflow v2.2.4
We store logs on S3
We run a single scheduler

grihabor · 2023-06-15T15:23:38Z

If you're struggling with this issue locally in docker compose, check the health of your worker. My deployment was missing redis dependency, which caused worker to throw an exception. But redis was not required to run the webserver, so I could see the UI and there was exactly this error.

doowhtron added the kind:bug This is a clearly a bug label Jan 15, 2021

vikramkoka added the affected_version:2.0 Issues Reported for 2.0 label Jan 15, 2021

potiuk added this to the Airflow 2.0.1 milestone Jan 17, 2021

kaxil removed this from the Airflow 2.0.1 milestone Jan 19, 2021

vikramkoka added the area:Scheduler including HA (high availability) scheduler label Mar 25, 2021

eladkal mentioned this issue May 14, 2021

worker ip missing in webserver logs #15796

Closed

eladkal modified the milestone: Airflow 2.1.1 May 14, 2021

kaxil mentioned this issue Jun 22, 2021

No log folder created by workers in AIRFLOW 2.1.0 #16234

Closed

uranusjr mentioned this issue Jun 30, 2021

[Bug] Task log can't be shown on Web-Server when the task runs failed #16729

Closed

huozhanfeng mentioned this issue Jul 2, 2021

Fix bug that log can't be shown when task runs failed #16768

Merged

gh4n mentioned this issue Jul 14, 2021

on_failure_callback on DAG level is not executed #16983

Closed

eladkal added the pending-response label Jan 16, 2022

github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Feb 16, 2022

github-actions bot removed the stale Stale PRs per the .github/workflows/stale.yml policy file label Feb 21, 2022

github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Mar 29, 2022

github-actions bot closed this as completed Apr 7, 2022

uranusjr mentioned this issue May 19, 2022

Failed to fetch log file from worker. Request URL missing either an 'http://' or 'https://' protocol. #23798

Closed

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

task failed with a null hostname #13692

task failed with a null hostname #13692

doowhtron commented Jan 15, 2021

kaxil commented Jan 19, 2021

doowhtron commented Mar 16, 2021 •

edited

Loading

doowhtron commented Mar 16, 2021

emukans commented Mar 23, 2021

hafid-d commented May 20, 2021 •

edited

Loading

pparthesh commented May 26, 2021

victorouttes commented Jun 8, 2021

vazmeee commented Jun 13, 2021

uranusjr commented Jun 30, 2021

huozhanfeng commented Jun 30, 2021 •

edited

Loading

eladkal commented Jan 16, 2022 •

edited

Loading

github-actions bot commented Feb 16, 2022

changxiaoju commented Feb 20, 2022

potiuk commented Feb 20, 2022

changxiaoju commented Feb 20, 2022

changxiaoju commented Feb 21, 2022

potiuk commented Feb 26, 2022

github-actions bot commented Mar 29, 2022

github-actions bot commented Apr 7, 2022

mfridrikhson-tp commented May 26, 2022 •

edited

Loading

grihabor commented Jun 15, 2023

task failed with a null hostname #13692

task failed with a null hostname #13692

Comments

doowhtron commented Jan 15, 2021

kaxil commented Jan 19, 2021

doowhtron commented Mar 16, 2021 • edited Loading

doowhtron commented Mar 16, 2021

emukans commented Mar 23, 2021

hafid-d commented May 20, 2021 • edited Loading

pparthesh commented May 26, 2021

victorouttes commented Jun 8, 2021

vazmeee commented Jun 13, 2021

uranusjr commented Jun 30, 2021

huozhanfeng commented Jun 30, 2021 • edited Loading

eladkal commented Jan 16, 2022 • edited Loading

github-actions bot commented Feb 16, 2022

changxiaoju commented Feb 20, 2022

potiuk commented Feb 20, 2022

changxiaoju commented Feb 20, 2022

changxiaoju commented Feb 21, 2022

potiuk commented Feb 26, 2022

github-actions bot commented Mar 29, 2022

github-actions bot commented Apr 7, 2022

mfridrikhson-tp commented May 26, 2022 • edited Loading

grihabor commented Jun 15, 2023

doowhtron commented Mar 16, 2021 •

edited

Loading

hafid-d commented May 20, 2021 •

edited

Loading

huozhanfeng commented Jun 30, 2021 •

edited

Loading

eladkal commented Jan 16, 2022 •

edited

Loading

mfridrikhson-tp commented May 26, 2022 •

edited

Loading