
AttributeError: 'Connection' object has no attribute 'get_password' when using WebHDFSHook in Airflow 3.0.1 #50756

@roach231428

Description


Apache Airflow Provider(s)

apache-hdfs

Versions of Apache Airflow Providers

apache-airflow-providers-apache-hdfs==4.9.0

Apache Airflow version

3.0.1

Operating System

Ubuntu 22.04

Deployment

Docker-Compose

Deployment details

I deployed airflow-apiserver, airflow-worker, and the other services on the same machine. I build my Airflow image from a Dockerfile that uses RUN pip install --no-cache-dir "apache-airflow==${AIRFLOW_VERSION}" -r /tmp/requirements.txt to install the providers. The base image is apache/airflow:3.0.1.

What happened

When I run a DAG that uses WebHDFSHook, it raises AttributeError: 'Connection' object has no attribute 'get_password'. However, I can execute the same code line by line inside the worker container without any issue, so I am puzzled why the error only occurs during DAG execution.
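
For reference, the interactive check that works is roughly the following (my reconstruction of "the same code line by line", run in a Python shell inside the worker container, e.g. via docker compose exec airflow-worker python):

# Run inside the worker container; this mirrors the task body.
from airflow.providers.apache.hdfs.hooks.webhdfs import WebHDFSHook

hook = WebHDFSHook(webhdfs_conn_id="webhdfs_default")
print(hook.check_for_path("/airflow"))  # prints True/False when the connection resolves correctly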

The full error log:

AttributeError: 'Connection' object has no attribute 'get_password'
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/sdk/execution_time/task_runner.py", line 838 in run
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/sdk/execution_time/task_runner.py", line 1130 in _execute_task
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/sdk/bases/operator.py", line 408 in wrapper
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/sdk/bases/decorator.py", line 251 in execute
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/sdk/bases/operator.py", line 408 in wrapper
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/standard/operators/python.py", line 212 in execute
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/standard/operators/python.py", line 235 in execute_callable
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/sdk/execution_time/callback_runner.py", line 81 in run
  File "/opt/airflow/dags/test/test_hdfs.py", line 22 in list_hdfs_directory
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/apache/hdfs/hooks/webhdfs.py", line 154 in check_for_path
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/apache/hdfs/hooks/webhdfs.py", line 72 in get_conn
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/apache/hdfs/hooks/webhdfs.py", line 91 in _find_valid_server

What you think should happen instead

No error; check_for_path should resolve the connection credentials and return a boolean.

How to reproduce

  1. Use the official docker-compose.yaml file and modify the x-airflow-common section to build from a local Dockerfile:
# image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:3.0.1}
  build: .
  2. Create a Dockerfile:
FROM apache/airflow:3.0.1

USER root

# Install Kerberos and build tools
RUN apt-get update && \
    apt-get install -y gcc g++ libkrb5-dev krb5-user && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

USER airflow

# Install HDFS provider
RUN pip install apache-airflow-providers-apache-hdfs==4.9.0
  3. Run docker compose build to build the images
  4. Set up and initialize the database:
mkdir -p ./dags ./logs ./plugins ./config
echo -e "AIRFLOW_UID=$(id -u)" > .env
docker compose run airflow-cli airflow config list
  5. Add a test_hdfs.py file under dags/:
from datetime import datetime
from airflow.providers.apache.hdfs.hooks.webhdfs import WebHDFSHook
from airflow.sdk import dag, task

# Define the default arguments for the DAG
default_args = {
    'owner': 'airflow',
    'start_date': datetime(2024, 9, 1),
    'retries': 1,
}

# Instantiate the DAG
@dag(default_args=default_args, start_date=datetime(2025, 1, 1))
def test_hdfs():

    @task()
    def list_hdfs_directory():
        # Initialize the WebHDFS Hook
        hook = WebHDFSHook(webhdfs_conn_id='webhdfs_default')

        # Get directory info
        res = hook.check_for_path('/airflow')

        # Print the result
        print(res)

    # Set the task order
    list_hdfs_task = list_hdfs_directory()

dag1 = test_hdfs()
  6. Run docker compose up -d
  7. Add a WebHDFS connection named webhdfs_default with host, login, password, and port under Admin > Connections.
  8. Trigger the test_hdfs DAG and observe the error messages (an optional diagnostic DAG is sketched after this list).
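
Optional diagnostic (not part of the original steps): a minimal DAG like the one below can confirm whether the Connection object resolved at task runtime exposes get_password(); the DAG id and print statements are illustrative only:

from airflow.providers.apache.hdfs.hooks.webhdfs import WebHDFSHook
from airflow.sdk import dag, task

@dag()
def inspect_webhdfs_conn():

    @task()
    def show_connection_api():
        # Resolve the same connection the hook would use and report its API surface.
        conn = WebHDFSHook.get_connection("webhdfs_default")
        print(type(conn))
        print("has get_password():", hasattr(conn, "get_password"))
        print("password attribute present:", conn.password is not None)

    show_connection_api()

dag2 = inspect_webhdfs_conn()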

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct