fix: Pass proxies config when using ClientSecretCredential in AzureDataLakeStorageV2Hook #37103
Conversation
@Lee-W Thanks for pointing that out. I did indeed forget to update the docstrings; this is fixed now.
…ataLakeStorageV2Hook and added DataToADLSOperator which allows uploading data (e.g. from an XCOM) to a remote file without the need to have a local file created first
Also added a DataToADLSOperator, which uses the AzureDataLakeStorageV2Hook and allows uploading data (for example from an XCom of a previous task, as in our case) to a remote file without first creating a local file.
I am not sure if this is a valid use case (I think we had this discussion before on another operator).
What exactly are we saving by having DataToADLSOperator? XCom data is small, so it can't be performance(?)
Let's please remove this part from the PR, as it's not related to the bug fix.
You can start a separate PR to add the operator.
Hello @eladkal, ok, I'll remove it, but note that XComs don't always have to be small. In our case we have a custom XCom backend which stores large sets of data (like paged results of a REST API) on a PersistentVolumeClaim, which means only the reference to the file is stored in the Airflow database while the data itself resides on the filesystem, which is distributed across the different workers. That way we have operators which retrieve the data and store those results as XComs, and the consecutive operator, here the DataToADLSOperator, takes the result of that XCom and stores it on Fabric.

We could of course first write a Python task which takes the XCom and writes it to a file before passing it to the Azure operator, but that would mean invoking an additional worker just for that step, while with this operator you can skip it, which also means less code to maintain and a more readable and concise DAG. Another advantage is that this operator uses the new AzureDataLakeStorageV2Hook, while the only other available operator uses the old hook. To my knowledge there isn't any operator based on the new AzureDataLakeStorageV2Hook yet, which forces you to write custom Python code if you want to use it. I'll create a new PR for this DataToADLSOperator; the code is very minimal and only delegates to the AzureDataLakeStorageV2Hook.
Please do. We can discuss there.
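The operator described above is essentially a thin delegation layer over the hook. A rough sketch of that pattern, with the hook stubbed out so the delegation is visible on its own (all names besides AzureDataLakeStorageV2Hook are illustrative, and the real operator would subclass Airflow's BaseOperator):

```python
# Illustrative sketch only, not the actual Airflow operator. The stub below
# stands in for AzureDataLakeStorageV2Hook so the example is self-contained.

class StubADLSHook:
    """Stand-in for AzureDataLakeStorageV2Hook in this sketch."""

    def __init__(self):
        self.uploaded = {}

    def upload_data(self, file_system_name, file_name, data):
        # The real hook would create and write the remote file on ADLS Gen2.
        self.uploaded[(file_system_name, file_name)] = data


class DataToADLSOperator:
    """Uploads in-memory data (e.g. an XCom result) to a remote file,
    without creating a local file first."""

    def __init__(self, file_system_name, file_name, data, hook=None):
        self.file_system_name = file_system_name
        self.file_name = file_name
        self.data = data
        self.hook = hook or StubADLSHook()

    def execute(self, context=None):
        # No temporary local file: the data goes straight to the hook.
        self.hook.upload_data(self.file_system_name, self.file_name, self.data)
        return self.file_name


hook = StubADLSHook()
op = DataToADLSOperator("landing", "results.json", '{"rows": []}', hook=hook)
print(op.execute())  # "results.json"
```

In a DAG, `data` would typically be templated from the upstream task's XCom, which is what makes the extra file-writing task unnecessary.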
This pull request contains a fix for when AzureDataLakeStorageV2Hook is used behind a corporate proxy: because the proxies config wasn't passed to the ClientSecretCredential in the AzureDataLakeStorageV2Hook, the bearer token could not be fetched. Also added a DataToADLSOperator, which uses the AzureDataLakeStorageV2Hook and allows uploading data (for example from an XCom of a previous task, as in our case) to a remote file without first creating a local file. Both the fix and the new operator have unit tests, and docs were added for the DataToADLSOperator.
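The shape of the fix can be sketched as forwarding an optional proxies setting from the hook's configuration into the credential's keyword arguments. This is a minimal sketch, not the actual hook code; the helper name and extras layout are assumptions:

```python
# Illustrative sketch of forwarding a "proxies" setting to the credential.
# azure-identity credentials pass transport keyword arguments such as
# `proxies` down to the underlying azure-core pipeline, so a hook can
# simply include them when constructing ClientSecretCredential, e.g.:
#   ClientSecretCredential(tenant_id, client_id, client_secret, **kwargs)

def build_credential_kwargs(extras: dict) -> dict:
    """Collect extra keyword arguments for ClientSecretCredential
    (hypothetical helper, not part of the real hook)."""
    kwargs = {}
    proxies = extras.get("proxies")
    if proxies:
        # Without this, token requests bypass the corporate proxy and the
        # bearer token cannot be fetched.
        kwargs["proxies"] = proxies
    return kwargs


extras = {"proxies": {"https": "http://corp-proxy:8080"}}
print(build_credential_kwargs(extras))
```

When no proxies are configured, the helper returns an empty dict, so behaviour outside a proxied environment is unchanged.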
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.