Azure Data Lake connection will not work for blob.core.windows.net domain #44228

Open
1 of 2 tasks
vince-vanh opened this issue Nov 20, 2024 · 1 comment
Labels
area:providers kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet provider:microsoft-azure Azure-related issues

Comments


Apache Airflow Provider(s)

microsoft-azure

Versions of Apache Airflow Providers

apache-airflow-providers-microsoft-azure 11.1.0

Apache Airflow version

2.9.2

Operating System

Ubuntu 22.04.4

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

What happened

Scenario: need to leverage Azure storage for Airflow remote logging.

Step 1 is verifying the connection works, so I'm using the operator ADLSListOperator as a test case.
On the connection I have set the following properties:
Azure Client ID:
Azure Client Secret:
Azure Tenant ID:
Azure DataLake Store Name: <e.g. mystorageaccount>

The store name's fully qualified url is https://mystorageaccount.blob.core.windows.net/

I know the client ID, secret, and tenant ID are all valid: they match the credentials that work against the storage account from a PythonOperator using the azure.storage.blob library. When I use the ADLS connection with ADLSListOperator from apache-airflow-providers-microsoft-azure (11.1.0), it fails. The error log seems to indicate it is trying to connect to the wrong domain, e.g. ConnectionError(MaxRetryError("HTTPSConnectionPool(host='none.azuredatalakestore.net'
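
For comparison, the working path described above (a PythonOperator callable using azure.storage.blob with the same service principal) might look roughly like the sketch below. This is not the code actually used in this report: the function names, the container argument, and the choice to import the SDK inside the callable are assumptions for illustration. It relies on the azure-identity and azure-storage-blob packages being installed.

```python
def blob_account_url(account_name: str) -> str:
    """Build the blob endpoint URL for a storage account."""
    return f"https://{account_name}.blob.core.windows.net"


def list_blob_names(account_name, container, tenant_id, client_id, client_secret):
    """List blob names in one container using a service principal.

    Hypothetical helper, intended as the python_callable of a PythonOperator.
    """
    # Imported inside the function so the file still loads where the SDK is absent.
    from azure.identity import ClientSecretCredential
    from azure.storage.blob import BlobServiceClient

    credential = ClientSecretCredential(
        tenant_id=tenant_id, client_id=client_id, client_secret=client_secret
    )
    service = BlobServiceClient(
        account_url=blob_account_url(account_name), credential=credential
    )
    container_client = service.get_container_client(container)
    return [blob.name for blob in container_client.list_blobs()]
```

Note that this path builds the blob.core.windows.net URL explicitly, which is the part the ADLS connection does not appear to do.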

The domain azuredatalakestore.net is for legacy Azure storage accounts. New storage accounts cannot use this domain; all future storage accounts use blob.core.windows.net.
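
A quick way to check which of these domains an account actually answers on is a DNS lookup from the worker. The sketch below is only a diagnostic aid, using the standard library. The helper names are hypothetical, and the dfs.core.windows.net entry (the Data Lake Storage Gen2 endpoint) is an addition that does not come from the error log above.

```python
import socket


def candidate_hosts(account_name: str) -> dict[str, str]:
    """Hostnames the different Azure storage endpoints would use for an account."""
    return {
        "adls_gen1": f"{account_name}.azuredatalakestore.net",
        "blob": f"{account_name}.blob.core.windows.net",
        "adls_gen2_dfs": f"{account_name}.dfs.core.windows.net",
    }


def resolves(host: str, resolver=socket.getaddrinfo) -> bool:
    """Return True if the hostname resolves in DNS, False otherwise."""
    try:
        resolver(host, 443)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


if __name__ == "__main__":
    # "mystorageaccount" is the placeholder account name used in this report.
    for kind, host in candidate_hosts("mystorageaccount").items():
        print(f"{kind}: {host} resolves={resolves(host)}")
```

If only the blob.core.windows.net name resolves, the timeout in the operator is consistent with it targeting a host that does not exist for this account.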

If anyone has successfully used the operator ADLSListOperator against a storage account hosted at blob.core.windows.net, I'd be curious to know the configuration used. The documentation and examples I've found are very sparse or inconsistent.

I've tried using connection type azure_data_lake (as described above) as well as types adls and wasb.

What you think should happen instead

I would expect ADLSListOperator to list files, but it times out, I assume because it is trying to connect to the wrong domain.

How to reproduce

  1. Create a valid Azure storage account that uses the blob.core.windows.net domain, which should be all new storage accounts on Azure.
  2. Set up an azure_data_lake connection using a valid client ID, client secret, tenant ID, and account name.
  3. Write a DAG that leverages the ADLSListOperator.
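
A minimal DAG for step 3 could look like the sketch below. The DAG id, task id, and path are hypothetical, and the connection id assumes the connection from step 2 was saved under the provider's default name, azure_data_lake_default. The Airflow imports sit inside the builder function only so the snippet can be loaded outside an Airflow environment.

```python
from datetime import datetime

CONN_ID = "azure_data_lake_default"  # assumption: id of the connection from step 2
LIST_PATH = "some/folder/*"          # hypothetical glob pattern to list


def build_repro_dag():
    """Build a one-task DAG that lists files through the ADLS connection."""
    from airflow import DAG
    from airflow.providers.microsoft.azure.operators.adls import ADLSListOperator

    with DAG(
        dag_id="adls_list_repro",
        start_date=datetime(2024, 1, 1),
        schedule=None,  # trigger manually
        catchup=False,
    ) as dag:
        ADLSListOperator(
            task_id="list_files",
            path=LIST_PATH,
            azure_data_lake_conn_id=CONN_ID,
        )
    return dag
```

In a real DAGs folder the result would need to be assigned at module level, e.g. dag = build_repro_dag(), so that the scheduler can discover it.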

Anything else

This happens every time. It has not worked successfully yet.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@vince-vanh vince-vanh added area:providers kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels Nov 20, 2024

boring-cyborg bot commented Nov 20, 2024

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

@dosubot dosubot bot added the provider:microsoft-azure Azure-related issues label Nov 20, 2024