Skip to content

Comments

Return list of GCS URIs from Azure*ToGCS operators#61048

Merged
shahar1 merged 8 commits intoapache:mainfrom
nailo2c:fix-11323_azure_operator_return_uri
Feb 10, 2026
Merged

Return list of GCS URIs from Azure*ToGCS operators#61048
shahar1 merged 8 commits intoapache:mainfrom
nailo2c:fix-11323_azure_operator_return_uri

Conversation

@nailo2c
Copy link
Contributor

@nailo2c nailo2c commented Jan 25, 2026

Related: #11323

How

Refactor AzureBlobStorageToGCSOperator and AzureFileShareToGCSOperator to return lists of GCS URIs instead of strings.


Was generative AI tooling used to co-author this PR?
  • No (please specify the tool below)

  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst or {issue_number}.significant.rst, in airflow-core/newsfragments.

…r to return lists of GCS URIs instead of strings
@nailo2c nailo2c requested a review from shahar1 as a code owner January 25, 2026 21:58
Copy link
Contributor

@shahar1 shahar1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there, please be careful :)

@nailo2c nailo2c requested a review from shahar1 January 26, 2026 16:10
@shahar1
Copy link
Contributor

shahar1 commented Jan 26, 2026

Before approving and merging (as technically it looks good!) - I need some time to consider how we're going to handle it across the operators in terms of naming the "as_list" flag (should be more generic than return_gcs_uris IMO), as it should be consistent. Maybe even the default behavior for single entities is worthy of a discussion.
I'm certain that we had use cases like that in the past, probably in operators of different providers, so it needs some exploration (you're welcome to do it on your own as well).
It does make sense to me that it will return the list by default, and upon deprecation we'll use a flag to return the single entity.

If you have some insights about it, feel free to share.

@nailo2c
Copy link
Contributor Author

nailo2c commented Jan 26, 2026

I was torn between return_list and as_list, and I went with as_list because I think it aligns better with the issue's intent and is more generic.

@shahar1
Copy link
Contributor

shahar1 commented Jan 27, 2026

I was torn between return_list and as_list, and I went with as_list because I think it aligns better with the issue's intent and is more generic.

I think that I'm good with this -
I want to check some past similar work and ensure that the rest of the transfer operators could be aligned with this naming.
Give me a couple of days to check, you could work in the meanwhile on other issues (or even other operators as part of the original issue).

@nailo2c
Copy link
Contributor Author

nailo2c commented Jan 27, 2026

Sounds good! Let me know once you've finished the survey :)

@shahar1
Copy link
Contributor

shahar1 commented Jan 31, 2026

Sounds good! Let me know once you've finished the survey :)

OK, found it - we have unwrap_single for KubernetesJobOperator, which is even better than as_list as the latter is meaningless for multiple results.
Let's do it as follows:

  1. Rename the parameter to unwrap_single - default for now should be True to retain backward compatibility.
  2. If the parameter is not provided at all, raise the deprecation warning that this parameter will be set to default False in future release which will be a breaking change. If it's explicitly provided (either True or False), it means that the user is acknowledged of this parameter and fixated its value, so no need to warn them.

I'll update the instructions in the original issue.

@nailo2c nailo2c force-pushed the fix-11323_azure_operator_return_uri branch from 988d333 to cf297b2 Compare January 31, 2026 22:42
@nailo2c
Copy link
Contributor Author

nailo2c commented Jan 31, 2026

If the parameter is not provided at all, raise the deprecation warning that this parameter will be set to default False in future release which will be a breaking change.

Hi @shahar1, great proposal, thanks for digging up the precedent. One implementation detail: if we define it as unwrap_single=True, we can't distinguish "omitted" from "explicitly passed True", so we can't emit the deprecation warning only when the parameter is not provided.

How about putting the deprecation warning under the unwrap_single=True block?

@shahar1
Copy link
Contributor

shahar1 commented Feb 1, 2026

If the parameter is not provided at all, raise the deprecation warning that this parameter will be set to default False in future release which will be a breaking change.

Hi @shahar1, great proposal, thanks for digging up the precedent. One implementation detail: if we define it as unwrap_single=True, we can't distinguish "omitted" from "explicitly passed True", so we can't emit the deprecation warning only when the parameter is not provided.

How about putting the deprecation warning under the unwrap_single=True block?

You could let it to be optional (None), and if it's None (i.e., user has no set it explicitly) - then set it to True + raise a FutureWarning (there's another PR that implemented it, feel free to take inspiration)

…r to use unwrap_single parameter for GCS URI returns; update tests accordingly
@nailo2c
Copy link
Contributor Author

nailo2c commented Feb 1, 2026

Thanks for the hint, the code looks much better now! I kept return_gcs_uris in azure_fileshare_to_gcs.py because the original return is not a single unwrapped string - it returns a list of filenames, so unwrap_single doesn't quite fit here.

I tweaked the code to avoid a breaking change while staying more aligned with the other operator. I think it'll also be easier to remove in the future if we want to :)

Copy link
Contributor

@shahar1 shahar1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work Aaron!
And good call regarding usage of return_gcs_uris where applicable.
I'll let Google team a couple of days to comment before merging.

CC: @VladaZakharova @MaksYermak

@shahar1
Copy link
Contributor

shahar1 commented Feb 6, 2026

@nailo2c are you able to run the system tests for the modified operators and attach screenshots that they pass? (maybe also a screenshot that shows the outputs)
Following recent convesrations with Google provider team, it is necessary to ensure that we don't have issues there.
If you're not to do it on your own, we'll have to wait until Google team or someone else will do it (I might do it on my own eventually if no one else does, but it will probably be faster if someone who's familiar with Azure and GCS altogether does it).

@nailo2c
Copy link
Contributor Author

nailo2c commented Feb 6, 2026

Hi @shahar1, sure :) but let me confirm the system tests are example_azure_blob_to_gcs.py and example_azure_fileshare_to_gcs.py, right?

If so, here are the results:

example_azure_blob_to_gcs.py

  • It works well on Airflow (the first two failures were due to my environment misconfigurations, not code issues).
example_azure_blob_to_gcs_1_airflow
  • It reads the file from Azure Blob Storage and transfers it to GCS successfully.
example_azure_blob_to_gcs_2_azure example_azure_blob_to_gcs_3_gcp
  • It's not a breaking change-- when unwrap_single is not set, it returns a string as before.
example_azure_blob_to_gcs_4_old_return_str
  • After setting unwrap_single=False, it returns a list of URIs.
example_azure_blob_to_gcs_5_new_return_list

example_azure_fileshare_to_gcs.py

  • Same for example_azure_fileshare_to_gcs.py
sync_azure_files_with_gcs
  • Also, while running the system test, I noticed dest_gcs in example_azure_fileshare_to_gcs.py was missing the gs:// prefix, which causes _parse_gcs_url to fail. I fixed it incidentally.

Please let me know if there's anything else I can help with :)

@shahar1
Copy link
Contributor

shahar1 commented Feb 7, 2026

Hi @shahar1, sure :) but let me confirm the system tests are example_azure_blob_to_gcs.py and example_azure_fileshare_to_gcs.py, right?

If so, here are the results:

example_azure_blob_to_gcs.py

  • It works well on Airflow (the first two failures were due to my environment misconfigurations, not code issues).
example_azure_blob_to_gcs_1_airflow * It reads the file from Azure Blob Storage and transfers it to GCS successfully.

example_azure_blob_to_gcs_2_azure example_azure_blob_to_gcs_3_gcp

  • It's not a breaking change-- when unwrap_single is not set, it returns a string as before.
example_azure_blob_to_gcs_4_old_return_str * After setting `unwrap_single=False`, it returns a list of URIs. example_azure_blob_to_gcs_5_new_return_list # example_azure_fileshare_to_gcs.py * Same for `example_azure_fileshare_to_gcs.py` sync_azure_files_with_gcs * Also, while running the system test, I noticed `dest_gcs` in `example_azure_fileshare_to_gcs.py` was missing the `gs://` prefix, which causes `_parse_gcs_url` to fail. I fixed it incidentally.

Please let me know if there's anything else I can help with :)

Awesome! It seems that the system tests are indeed useful for detecting such "last minute" bugs.
Now that everything passes I think that we're all set for this PR to be merged.
If no objections are made, it will be merged by the upcoming release on Tuesday.
Feel free to work on other issues in the meanwhile. Also, if you're able to run system tests for other approved PRs related to the original issue, where we still don't have any - it would be very helpful.

CC: @VladaZakharova @MaksYermak

@shahar1 shahar1 merged commit 0a4436c into apache:main Feb 10, 2026
90 checks passed
Ratasa143 pushed a commit to Ratasa143/airflow that referenced this pull request Feb 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants