Description
Apache Airflow version
Other Airflow 2 version (please specify below)
If "Other Airflow 2 version" selected, which one?
2.10.5
What happened?
In airflow/providers/microsoft/azure/fs/adls.py, the get_fs() function constructs a dictionary of options by pulling connection information from Airflow's connection system and then passes these options to AzureBlobFileSystem(**options).
By default, the function constructs account_url using parse_blob_account_url(conn.host, conn.login), which assumes the Azure Blob endpoint will use the standard core.windows.net domain. While this works for default endpoints, it does not support scenarios where users want to override the domain — for example, when using a private endpoint like .core.mydomain.io.
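For illustration, the default construction described above behaves roughly like the following simplified sketch (`default_blob_account_url` is a hypothetical stand-in mirroring the described behavior, not the provider's actual `parse_blob_account_url` implementation):

```python
def default_blob_account_url(host, login):
    """Simplified sketch: derive the blob endpoint from connection fields.

    The public-cloud domain core.windows.net is hardcoded, which is why a
    custom domain such as core.mydomain.io cannot be expressed this way.
    """
    account_name = host or login or ""
    return f"https://{account_name}.blob.core.windows.net"

print(default_blob_account_url(None, "testaccountname"))
# https://testaccountname.blob.core.windows.net
```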
The root issue is:
- `account_host` (the correct field expected by `adlfs.AzureBlobFileSystem`) is not included in the list of parsed fields.
- Even if the user provides `account_host` in the connection extras, `get_fs()` ignores it and always constructs `account_url` using the hardcoded domain logic. `AzureBlobFileSystem` does not accept `account_url` as a constructor parameter, so the custom domain is never applied and the code silently falls back to the default.
As a result, there is no way for users to override the account URL via Airflow connection configuration, even though adlfs.AzureBlobFileSystem supports this through its account_host parameter.
This blocks use cases such as:
- Custom domains
- Private endpoints
- Sovereign or air-gapped cloud regions
This limitation exists even though the underlying library (adlfs) already supports the necessary parameter (account_host).
What you think should happen instead?
The get_fs() function should support passing a user-defined account_host value from the Airflow connection extras directly to the AzureBlobFileSystem constructor.
Specifically:
- Add `"account_host"` to the list of fields extracted from `extras`.
- If `account_host` is provided, pass it directly to `AzureBlobFileSystem` as a supported parameter.
- Currently, `get_fs()` sets `account_url` using `parse_blob_account_url(...)`, but `account_url` is not a valid parameter for `AzureBlobFileSystem`. It can be removed or renamed to `account_host`.
However, to maintain backward compatibility:
- Retain the existing `account_url` logic as a fallback.
- Prefer `account_host` when it is explicitly defined in the extras.
This would allow users to configure non-standard Azure Blob endpoints — such as custom domains or private links — via the standard Airflow connection mechanism, while maintaining compatibility with existing deployments.
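The precedence described above could be sketched as follows (a minimal illustration; `resolve_account_host` is a hypothetical helper, not the proposed patch itself):

```python
def resolve_account_host(extras, host, login):
    """Prefer an explicit account_host from the connection extras; otherwise
    fall back to the URL built from the default core.windows.net domain."""
    explicit = extras.get("account_host")
    if explicit:
        return explicit
    account_name = host or login or ""
    return f"https://{account_name}.blob.core.windows.net"

# Custom domain wins when provided in extras:
print(resolve_account_host(
    {"account_host": "https://acct.blob.core.customdomain.io"}, "acct", "acct"
))
# https://acct.blob.core.customdomain.io

# Falls back to the default domain otherwise:
print(resolve_account_host({}, None, "acct"))
# https://acct.blob.core.windows.net
```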
How to reproduce
1. Create an Airflow `Connection` object with a custom domain in the `extra` field:

```python
from airflow.models import Connection
from airflow.providers.microsoft.azure.fs.adls import get_fs

conn = Connection(
    conn_id="testconn",
    conn_type="wasb",
    login="testaccountname",
    password="p",
    host="testaccountID",
    extra={
        "account_name": "n",
        "tenant_id": "t",
        "account_host": "https://testaccountname.blob.core.customdomain.io",
    },
)
# Insert or mock this connection in Airflow metadata
```
2. Call `get_fs()` with this connection ID:

```python
fs = get_fs("testconn")
```
3. Observe that:
   - Despite `account_host` being set to a custom domain in extras, `get_fs()` ignores it.
   - The `options` passed to `adlfs.AzureBlobFileSystem` do not include `account_host`.
   - Instead, `get_fs()` builds and passes an `account_url` derived from the default domain based on `host` and `login`.
   - As a result, the custom domain override does not take effect.
This confirms the current limitation that account_host cannot be used to override the default Azure Blob endpoint via Airflow’s connection system.
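The observations above can be checked without a live Azure account by substituting a stub for the filesystem class and inspecting the keyword arguments it receives (a self-contained sketch; `FakeBlobFileSystem` and the hardcoded `options` dict are illustrative stand-ins for what the current provider builds, not Airflow code):

```python
# Capture the kwargs a constructor receives, to see what get_fs()-style
# code would hand to adlfs.AzureBlobFileSystem.
captured_kwargs = {}

class FakeBlobFileSystem:
    def __init__(self, **kwargs):
        captured_kwargs.update(kwargs)

# Options shaped the way the current provider builds them: an account_url
# derived from the default domain, and no account_host even though the
# connection extras contained one.
options = {
    "account_name": "n",
    "account_url": "https://testaccountname.blob.core.windows.net",
}
FakeBlobFileSystem(**options)

assert "account_host" not in captured_kwargs
assert captured_kwargs["account_url"].endswith("core.windows.net")
print("custom domain never reached the constructor")
```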
Operating System
macOS 15.5
Versions of Apache Airflow Providers
apache-airflow-providers-microsoft-azure==12.5.0
Deployment
Astronomer
Deployment details
k8s
Anything else?
- This is a backward-compatible enhancement since adding support for `account_host` does not remove or change existing parameters.
- Supporting `account_host` enables Airflow to better integrate with Azure environments using private endpoints, custom domains, or sovereign clouds.
- The underlying `adlfs.AzureBlobFileSystem` already supports `account_host`, so this change leverages existing functionality.
- Implementing this will improve the user experience and reduce the need for workarounds or custom patches.
- I want to submit a PR but would appreciate suggestions on the best approach.
- My current thinking is to simply add `"account_host"` to the existing `fields` list in `get_fs()` so that this block picks it up automatically:

```python
fields = [
    "account_name",
    "account_key",
    "sas_token",
    "tenant_id",
    "managed_identity_client_id",
    "workload_identity_client_id",
    "workload_identity_tenant_id",
    "anon",
    "account_host",  # <- add here
]
for field in fields:
    value = get_field(conn_id=conn_id, conn_type=conn_type, extras=extras, field_name=field)
    if value is not None:
        if value == "":
            options.pop(field, "")
        else:
            options[field] = value
```
- Would this be the preferred way, or are there alternative approaches to consider?
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct