account_host missing in get_fs() in adls.py for AzureBlobFileSystem Body #53333

@samuelkhtu

Description

Apache Airflow version

Other Airflow 2 version (please specify below)

If "Other Airflow 2 version" selected, which one?

2.10.5

What happened?

In airflow/providers/microsoft/azure/fs/adls.py, the get_fs() function constructs a dictionary of options by pulling connection information from Airflow's connection system and then passes these options to AzureBlobFileSystem(**options).

By default, the function constructs account_url using parse_blob_account_url(conn.host, conn.login), which assumes the Azure Blob endpoint will use the standard core.windows.net domain. While this works for default endpoints, it does not support scenarios where users want to override the domain — for example, when using a private endpoint like .core.mydomain.io.

The root issue is:

  • account_host (the correct field expected by adlfs.AzureBlobFileSystem) is not included in the list of parsed fields.
  • Even if the user provides account_host in the connection extras, get_fs() ignores it and always constructs account_url using the hardcoded domain logic.
  • AzureBlobFileSystem does not support account_url as a constructor parameter, so the custom domain is never applied — silently falling back to the default.

As a result, there is no way for users to override the account URL via Airflow connection configuration, even though adlfs.AzureBlobFileSystem supports this through its account_host parameter.

This blocks use cases such as:

  • Custom domains
  • Private endpoints
  • Sovereign or air-gapped cloud regions

This limitation exists even though the underlying library (adlfs) already supports the necessary parameter (account_host).

What you think should happen instead?

The get_fs() function should support passing a user-defined account_host value from the Airflow connection extras directly to the AzureBlobFileSystem constructor.

Specifically:

  • Add "account_host" to the list of fields extracted from extras.
  • If account_host is provided, it should be passed directly to AzureBlobFileSystem as a supported parameter.
  • Currently, get_fs() sets account_url using parse_blob_account_url(...), but account_url is not a valid parameter for AzureBlobFileSystem. It can be removed or renamed to account_host.

However, to maintain backward compatibility:

  • Retain the existing account_url logic as a fallback for connections that do not set account_host.
  • Prefer account_host whenever it is explicitly defined in the extras.

This would allow users to configure non-standard Azure Blob endpoints — such as custom domains or private links — via the standard Airflow connection mechanism, while maintaining compatibility with existing deployments.
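The precedence described above can be sketched as a small standalone function. This is only an illustration under stated assumptions, not the provider's code: `build_endpoint_options` and its arguments are hypothetical, and the fallback branch is a simplified stand-in for `parse_blob_account_url`, which handles more input shapes than shown here.

```python
from typing import Any, Optional


def build_endpoint_options(
    extras: dict, host: Optional[str], login: Optional[str]
) -> dict:
    """Hypothetical helper: prefer an explicit account_host, else fall back."""
    options: dict[str, Any] = {}
    account_host = extras.get("account_host")
    if account_host:
        # An explicit override from the connection extras takes precedence.
        options["account_host"] = account_host
    else:
        # Simplified stand-in for parse_blob_account_url(host, login):
        # build the default public-cloud endpoint from the account name.
        account = login or host
        options["account_url"] = f"https://{account}.blob.core.windows.net/"
    return options


# With an override, only account_host is set; without one, the default URL is used.
print(build_endpoint_options({"account_host": "acct.blob.core.customdomain.io"}, "acct", "acct"))
print(build_endpoint_options({}, None, "acct"))
```

Keeping the fallback in a separate branch means existing connections that never set account_host would see no change in the options they produce.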

How to reproduce

  1. Create an Airflow Connection object with a custom domain in the extra field:

    from airflow.models import Connection
    from airflow.providers.microsoft.azure.fs.adls import get_fs
    
    conn = Connection(
        conn_id="testconn",
        conn_type="wasb",
        login="testaccountname",
        password="p",
        host="testaccountID",
        extra={
            "account_name": "n",
            "tenant_id": "t",
            "account_host": "testaccountname.blob.core.customdomain.io",  # host only; adlfs adds the https:// scheme itself
        },
    )
    # Insert or mock this connection in Airflow metadata
  2. Call get_fs() with this connection ID:

    fs = get_fs("testconn")
  3. Observe that:

    • Despite account_host being set to a custom domain in extras, get_fs() ignores it.
    • The options passed to adlfs.AzureBlobFileSystem do not include account_host.
    • Instead, get_fs() builds and passes an account_url derived from the default domain based on host and login.
    • As a result, the custom domain override does not take effect.

This confirms the current limitation that account_host cannot be used to override the default Azure Blob endpoint via Airflow’s connection system.
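A related detail worth verifying before any fix: adlfs appears to treat account_host as a bare host name and to prepend https:// itself. Under that assumption, a value copied from a full URL would need its scheme stripped before being passed through. The helper below is a hypothetical sketch using only the standard library; `normalize_account_host` is not part of Airflow or adlfs, and the bare-host expectation should be checked against the installed adlfs version.

```python
from urllib.parse import urlsplit


def normalize_account_host(value: str) -> str:
    """Hypothetical helper: reduce a URL or bare host string to its host part."""
    value = value.strip()
    # urlsplit only populates netloc when a scheme or a leading "//" is present,
    # so bare hosts are given a "//" prefix before parsing.
    parts = urlsplit(value if "://" in value else f"//{value}")
    return parts.netloc


print(normalize_account_host("https://testaccountname.blob.core.customdomain.io"))
print(normalize_account_host("testaccountname.blob.core.customdomain.io/"))
```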

Operating System

macOS 15.5

Versions of Apache Airflow Providers

apache-airflow-providers-microsoft-azure==12.5.0

Deployment

Astronomer

Deployment details

k8s

Anything else?

  • This is a backward-compatible enhancement since adding support for account_host does not remove or change existing parameters.

  • Supporting account_host enables Airflow to better integrate with Azure environments using private endpoints, custom domains, or sovereign clouds.

  • The underlying adlfs.AzureBlobFileSystem already supports account_host, so this change leverages existing functionality.

  • Implementing this will improve user experience and reduce the need for workarounds or custom patches.

  • I want to submit a PR but would appreciate suggestions on the best approach.

  • My current thinking is to simply add "account_host" to the existing fields list in get_fs() so that this block picks it up automatically:

    fields = [
        "account_name",
        "account_key",
        "sas_token",
        "tenant_id",
        "managed_identity_client_id",
        "workload_identity_client_id",
        "workload_identity_tenant_id",
        "anon",
        "account_host",  # <- add here
    ]
    for field in fields:
        value = get_field(conn_id=conn_id, conn_type=conn_type, extras=extras, field_name=field)
        if value is not None:
            if value == "":
                options.pop(field, "")
            else:
                options[field] = value
  • Would this be the preferred way, or are there alternative approaches to consider?
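One way to sanity-check the proposed change without an Airflow installation is to replay the loop against a stub. In the sketch below, `extract_options` and `stub_get_field` are hypothetical stand-ins (the real `get_field` takes `conn_id`, `conn_type`, `extras`, and `field_name`, and may do more than a plain dictionary lookup), and the field list is abbreviated.

```python
from typing import Any, Optional


def stub_get_field(extras: dict, field_name: str) -> Optional[Any]:
    """Hypothetical stand-in for the provider's get_field: plain dict lookup."""
    return extras.get(field_name)


def extract_options(extras: dict, fields: list) -> dict:
    """Replay the field-extraction loop from get_fs() against the stub."""
    options: dict = {}
    for field in fields:
        value = stub_get_field(extras, field)
        if value is not None:
            if value == "":
                # An empty string removes the option instead of setting it.
                options.pop(field, "")
            else:
                options[field] = value
    return options


extras = {
    "account_name": "n",
    "tenant_id": "t",
    "account_host": "testaccountname.blob.core.customdomain.io",
}
current_fields = ["account_name", "tenant_id"]  # abbreviated current list
proposed_fields = current_fields + ["account_host"]

# account_host is dropped with the current list and kept with the proposed one.
print(extract_options(extras, current_fields))
print(extract_options(extras, proposed_fields))
```

A test in the provider's own suite would more likely patch the connection lookup and assert on the keyword arguments passed to AzureBlobFileSystem; the stub above only demonstrates the effect of the field list itself.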

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct
