Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Pass proxies config when using ClientSecretCredential in AzureDataLakeStorageV2Hook #37103

Merged
merged 15 commits into from
Mar 1, 2024

Conversation

dabla
Copy link
Contributor

@dabla dabla commented Jan 30, 2024

This pull request contains a fix when AzureDataLakeStorageV2Hook is used behing a corporate proxy, as the proxies config wasn't passed to the ClientSecretCredential in the AzureDataLakeStorageV2Hook the issue was that the bearer token could not be fetched. Also added a DataToADLSOperator which uses the AzureDataLakeStorageV2Hook which allows uploading data (like from an XCOM of a previous task for example as we had the case) to a remote file without the need to create a local file first. Both the fix and the new operator has unit test and docs where added for the DataToADLSOperator.


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@dabla
Copy link
Contributor Author

dabla commented Feb 8, 2024

@Lee-W Thanks for pointing that out, I indeed did forget to update the docstrings, this is ok now.

Copy link
Contributor

@eladkal eladkal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also added a DataToADLSOperator which uses the AzureDataLakeStorageV2Hook which allows uploading data (like from an XCOM of a previous task for example as we had the case) to a remote file without the need to create a local file first.

I am not sure if this is a valid use case (I think we had this discussion before on another operator).
What exactly are we saving by having DataToADLSOperator? xcom data is small so it can't be performance(?)

Lets please remove this part from the PR as it's not related to the bug fix
You can start a separated PR with adding the operator.

@dabla
Copy link
Contributor Author

dabla commented Feb 29, 2024

Also added a DataToADLSOperator which uses the AzureDataLakeStorageV2Hook which allows uploading data (like from an XCOM of a previous task for example as we had the case) to a remote file without the need to create a local file first.

I am not sure if this is a valid use case (I think we had this discussion before on another operator). What exactly are we saving by having DataToADLSOperator? xcom data is small so it can't be performance(?)

Lets please remove this part from the PR as it's not related to the bug fix You can start a separated PR with adding the operator.

Hello @eladkal ok I'll remove it but know that XCom's don't always have to be small. Like in our case we have a custom XCom backend which stores large sets of data (like paged results of a REST API) on a PersistentVolumeClaim, which means only the reference to the file is being stored in the Airflow database and the data itself resides on the filesystem, which is distributed accross the different workers. That way we have operators which retrieves the data and store those results as XCom's, and the consecutive operator, here the DataToADLSOperator takes the result of that XCom and stores it on Fabric. We could of course first write a python task which takes the XCom and writes it down as file before passing it to the Azure operator, but that would mean we need to invoke an additonal worker just to do that, while with this operator you can skip that additional step which also means less code to maintain and a more readable and concise DAG. Also the advantage of this operator is that it used the new AzureDataLakeStorageV2Hook , while the other only available operator used the old hook. There isn't of my knowledge any operator already based on the new AzureDataLakeStorageV2Hook , which if you want to use it, forces you to write custom python code. I'll create a new PR for this DataToADLSOperator operator, the code is very minimal and only delegates to the AzureDataLakeStorageV2Hook.

@eladkal
Copy link
Contributor

eladkal commented Mar 1, 2024

I'll create a new PR for this DataToADLSOperator operator

Please do. We can discuss there

@eladkal eladkal changed the title Pass proxies config when using ClientSecretCredential in AzureDataLakeStorageV2Hook fix: Pass proxies config when using ClientSecretCredential in AzureDataLakeStorageV2Hook Mar 1, 2024
@eladkal eladkal merged commit 87a249a into apache:main Mar 1, 2024
58 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants