Apache Airflow Provider(s)
apache-hdfs
Versions of Apache Airflow Providers
apache-airflow-providers-apache-hdfs==4.9.0
Apache Airflow version
3.0.1
Operating System
Ubuntu 22.04
Deployment
Docker-Compose
Deployment details
I deployed airflow-apiserver, airflow-worker, and the other services on the same machine. I built my Airflow images with a Dockerfile that runs `pip install --no-cache-dir "apache-airflow==${AIRFLOW_VERSION}" -r /tmp/requirements.txt` to install the providers. The base image is apache/airflow:3.0.1.
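For completeness, a minimal `/tmp/requirements.txt` consistent with the versions above might contain just the pinned provider (the file's exact contents are my assumption; only the provider pin is stated in this report):

```text
# Hypothetical requirements.txt; only this pin is confirmed above
apache-airflow-providers-apache-hdfs==4.9.0
```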
What happened
When I run a DAG that uses `WebHDFSHook`, it raises `AttributeError: 'Connection' object has no attribute 'get_password'`. However, I can execute the same code line by line inside the worker container without any issue, so I am puzzled why the error only occurs during DAG execution.
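For context, this is roughly what "line by line" means here; running the following in a Python shell inside the worker container succeeds (a minimal sketch, assuming the same `webhdfs_default` connection used by the DAG below):

```python
# Same call path that fails inside the DAG, run interactively in the worker container
from airflow.providers.apache.hdfs.hooks.webhdfs import WebHDFSHook

hook = WebHDFSHook(webhdfs_conn_id='webhdfs_default')
# check_for_path() calls get_conn(), which is where the DAG run raises
print(hook.check_for_path('/airflow'))
```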
The full error log:
```
AttributeError: 'Connection' object has no attribute 'get_password'
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/sdk/execution_time/task_runner.py", line 838 in run
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/sdk/execution_time/task_runner.py", line 1130 in _execute_task
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/sdk/bases/operator.py", line 408 in wrapper
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/sdk/bases/decorator.py", line 251 in execute
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/sdk/bases/operator.py", line 408 in wrapper
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/standard/operators/python.py", line 212 in execute
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/standard/operators/python.py", line 235 in execute_callable
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/sdk/execution_time/callback_runner.py", line 81 in run
  File "/opt/airflow/dags/test/test_hdfs.py", line 22 in list_hdfs_directory
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/apache/hdfs/hooks/webhdfs.py", line 154 in check_for_path
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/apache/hdfs/hooks/webhdfs.py", line 72 in get_conn
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/apache/hdfs/hooks/webhdfs.py", line 91 in _find_valid_server
```
What you think should happen instead
No error.
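My guess at the root cause (an assumption, not verified against the provider source): the traceback ends in `_find_valid_server`, which seems to read the connection password via the legacy `connection.get_password()` accessor, while the `Connection` object the Airflow 3 Task SDK supplies at task runtime only exposes a plain `password` attribute. That would explain why the same code works interactively but fails under the task runner. A version-tolerant accessor might look like:

```python
# Sketch of a defensive accessor (hypothetical helper, not provider code):
# the ORM Connection has get_password(), while the Task SDK Connection
# only exposes .password as an attribute.
def get_connection_password(connection):
    if hasattr(connection, "get_password"):
        return connection.get_password()
    return connection.password
```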
How to reproduce
- Use the official docker-compose.yaml file and modify the `x-airflow-common` part:

```yaml
# image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:3.0.1}
build: .
```

- Create a `Dockerfile`:
```dockerfile
FROM apache/airflow:3.0.1

USER root
# Install Kerberos client libraries and build tools
RUN apt-get update && \
    apt-get install -y gcc g++ libkrb5-dev krb5-user && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

USER airflow
# Install the HDFS provider
RUN pip install apache-airflow-providers-apache-hdfs==4.9.0
```

- Run `docker compose build` to build the images.
- Set up and initialize the database:
```bash
mkdir -p ./dags ./logs ./plugins ./config
echo -e "AIRFLOW_UID=$(id -u)" > .env
docker compose run airflow-cli airflow config list
```

- Add a `test_hdfs.py` file under `dags/`:
```python
from datetime import datetime

from airflow.providers.apache.hdfs.hooks.webhdfs import WebHDFSHook
from airflow.sdk import dag, task

# Define the default arguments for the DAG
default_args = {
    'owner': 'airflow',
    'start_date': datetime(2024, 9, 1),
    'retries': 1,
}

# Instantiate the DAG
@dag(default_args=default_args, start_date=datetime(2025, 1, 1))
def test_hdfs():
    @task()
    def list_hdfs_directory():
        # Initialize the WebHDFS hook
        hook = WebHDFSHook(webhdfs_conn_id='webhdfs_default')
        # Check whether the path exists
        res = hook.check_for_path('/airflow')
        # Print the result
        print(res)

    # Set the task order
    list_hdfs_task = list_hdfs_directory()

dag1 = test_hdfs()
```

- Run `docker compose up -d`.
- Add a `webhdfs` connection named `webhdfs_default` with host, login, password, and port under Admin > Connections (or via the CLI, as sketched after this list).
- Run the `test_hdfs` task and observe the error messages.
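If scripting the setup is easier, the same connection can also be created with the Airflow CLI instead of the UI (a sketch; the host, port, and credentials below are placeholders, not values from this report):

```bash
# Hypothetical connection values; substitute your WebHDFS namenode details
docker compose run airflow-cli airflow connections add webhdfs_default \
    --conn-type webhdfs \
    --conn-host namenode.example.com \
    --conn-port 9870 \
    --conn-login hdfs_user \
    --conn-password hdfs_password
```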
Anything else
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct