Migrate ADLSListOperator from ADLS Gen1 to Gen2 #61188
Conversation
|
Hmmm… it looks like |
|
Okay, fixed! I added |
Discussion
It seems AzureDataLakeHook uses the Gen 1 SDK. Perhaps we need to add @deprecated(...) to it?
If the Gen 1 SDK is already retired, there's no point in retrospectively deprecating it, as that doesn't serve a practical purpose (in general, it's better to handle a proper deprecation well before retirement, but someone needs to track these changes on time 🙂).
However, since file_system_name is now mandatory, maybe instead of an unexplained TypeError when it is missing, we should explain its "sudden" introduction to the user (use case: people who used this operator in the past and don't understand why it is now required):
def __init__(
    self,
    *,
    src_adls: str,
    dest_gcs: str,
    azure_data_lake_conn_id: str,
    gcp_conn_id: str = "google_cloud_default",
    replace: bool = False,
    gzip: bool = False,
    google_impersonation_chain: str | Sequence[str] | None = None,
    **kwargs,
) -> None:
    file_system_name = kwargs.get('file_system_name')
    if not file_system_name:
        raise TypeError(
            "The 'file_system_name' parameter is required. "
            "ADLSListOperator has been migrated from Azure Data Lake Storage Gen1 (retired) "
            "to Gen2, which requires specifying a file system name. "
            "Please add file_system_name='your-container-name' to your operator instantiation."
        )

WDYT?
CC: @VladaZakharova @MaksYermak (for the GCP transfer operator)
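For illustration only, here is a minimal, self-contained sketch of the pattern suggested above: a newly required keyword argument that fails with an explanatory TypeError instead of a bare one. The classes BaseOperatorStub and ListOperatorSketch and the helper try_build are hypothetical stand-ins, not Airflow classes, and the stub assumes a base class that rejects unknown keyword arguments. Under that assumption the sketch uses kwargs.pop rather than kwargs.get, so the key is not forwarded to the base constructor.

```python
from typing import Any


class BaseOperatorStub:
    """Hypothetical stand-in for a base class that rejects unknown kwargs."""

    def __init__(self, *, task_id: str) -> None:
        self.task_id = task_id


class ListOperatorSketch(BaseOperatorStub):
    """Illustrative operator only; this is not the real ADLSListOperator."""

    def __init__(self, *, path: str, **kwargs: Any) -> None:
        # pop() removes the key, so it is not passed on to the base class.
        file_system_name = kwargs.pop("file_system_name", None)
        if not file_system_name:
            # Explain why the argument is suddenly required.
            raise TypeError(
                "The 'file_system_name' parameter is required. "
                "This operator now targets ADLS Gen2, "
                "which requires specifying a file system name."
            )
        super().__init__(**kwargs)
        self.path = path
        self.file_system_name = file_system_name


def try_build(**kwargs: Any) -> str:
    """Return 'ok' on success, or the TypeError message on failure."""
    try:
        ListOperatorSketch(**kwargs)
    except TypeError as err:
        return str(err)
    return "ok"
```

Calling try_build(task_id="t", path="folder") returns the explanatory message, while adding file_system_name="my-container" returns "ok". Whether the real operator should pop the key or declare it as an explicit parameter is a design choice for the PR author.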
|
Totally agree, that's much clearer. Let me add it! |
|
Wow, I didn't know that, learned something new today! Thanks 😄 |
|
@VladaZakharova @MaksYermak |
closes: #44228
Why
The older ADLSListOperator uses AzureDataLakeHook, which relies on the Gen 1 SDK. Gen 1 is already retired.
airflow/providers/microsoft/azure/src/airflow/providers/microsoft/azure/hooks/data_lake.py
Line 133 in 44d3678
How
Replace it with AzureDataLakeStorageV2Hook, which uses the Gen 2 SDK.
Given Gen1 is retired, the impact should be limited, but this is a breaking change.
What
I created an object (blob) in an Azure Storage account.
And I used this DAG to test whether I could fetch it.
It works pretty well.

Discussion
It seems AzureDataLakeHook uses the Gen 1 SDK. Perhaps we need to add @deprecated(...) to it?
airflow/providers/microsoft/azure/src/airflow/providers/microsoft/azure/hooks/data_lake.py
Line 46 in 44d3678
Was generative AI tooling used to co-author this PR?
Claude Opus 4.5