Add HttpToGCSOperator for transferring data from HTTP to GCS #49625
|
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
|
potiuk
left a comment
Looks good. Let's make the CI green and merge.
|
Yeah - a breeze test to update as I thought. |
|
Thanks @potiuk! I'll wait till the rest of the tests complete and then I'll push the fix to the breeze test |
|
@potiuk Added fix for the Breeze test |
|
Hi @josuegen, |
|
@molcay I did! See attached screenshot |
|
@molcay In case you're wondering what the warning is about. |
|
Hi @josuegen, thank you for the answer and the screenshots. From the provider perspective, we are OK to merge this PR.
A small note: they might ask for squashing the commits :) |
|
Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions. |
…#49625) * Created HTTP to GCS Operator * Created Unit Test for HTTP to GCS Operator * Precommit fixes * Added documentation and test as per pre-commit checks * Removed commits on files made by pre-commit * Fixerd unit test for HTTP to GCS operator * Fixed Unit testing for HTTP to GCS * Fixed unit testing * Fixed unit testing * Updated cross-dependency specification for breeze checks * Fixed provider documentation for HTTP to GCS * Fixed breeze unit testing for HTTP to GCS * Fixed unit testing for HTTP to GCS * Fixed order in selected-providers-list-as-string for breeze test * Fixed unit test for HTTP Hook in HTTPToGCS * Fixed unit test for HTTP Hook in HTTPToGCS * Fixed order in selected-providers-list-as-string for breeze test * Fixed order in selected-providers-list-as-string for breeze test * Removed ORM calls when managing connections in system tests * Added fix for breeze unit test individual-providers-test-types-list-as-strings-in-json * Typo fix for breeze unit test individual-providers-test-types-list-as-strings-in-json --------- Co-authored-by: Josue Velazquez Gen <josuegen@Josues-MacBook-Air.local>
|
@nathadfield you're right! I added these two parameters initially: I'd need to remove them from the |
|
Ok. I would love to actually see these as features for this operator as it would then serve as a drop in replacement for a custom operator we developed a long time ago that does the same thing. |
Why don't you contribute it :)? |
|
@potiuk Yes, I might. |
cool :) |
|
@nathadfield I got some free cycles today and I want to start working on this, did you start already? If not, I'll go ahead and start the work |
|
@josuegen No, I haven't managed to get around to it, so feel free.


This PR introduces a new operator, HttpToGCSOperator, which facilitates the transfer of data from an HTTP endpoint to a Google Cloud Storage (GCS) bucket.
Key Features:
- Uses HttpHook and GCSHook for efficient HTTP requests and GCS interactions, with @cached_property for optimized hook instantiation.
- Templated fields: endpoint, data, headers, bucket_name, and object_name, allowing for dynamic value injection.
- HTTP parameters: http_conn_id, endpoint, method, data, headers, extra_options, log_response, auth_type, and tcp_keep_alive related parameters.
- GCS parameters: gcp_conn_id, impersonation_chain, bucket_name, object_name, mime_type, gzip, encoding, chunk_size, timeout, num_max_attempts, metadata, cache_control, user_project.
Purpose:
This operator simplifies data ingestion from HTTP sources into GCS, which is a common requirement for data pipelines. It eliminates the need for writing custom code to handle HTTP requests and GCS uploads, promoting code reusability and reducing development time.
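The hook handling mentioned under Key Features can be illustrated with a small, self-contained sketch. This is not the operator's code from the PR: FakeHttpHook, FakeGCSHook, and HttpToStoreSketch are hypothetical stand-ins that only show how @cached_property defers hook creation until first use and then reuses the same instance.

```python
from functools import cached_property


class FakeHttpHook:
    """Hypothetical stand-in for an HTTP hook; counts constructions."""

    instances = 0

    def __init__(self, conn_id):
        FakeHttpHook.instances += 1
        self.conn_id = conn_id

    def run(self, endpoint):
        # A real hook would issue an HTTP request here.
        return f"payload from {endpoint}".encode()


class FakeGCSHook:
    """Hypothetical stand-in for a GCS hook; keeps uploads in memory."""

    def __init__(self, conn_id):
        self.conn_id = conn_id
        self.objects = {}

    def upload(self, bucket_name, object_name, data):
        self.objects[(bucket_name, object_name)] = data


class HttpToStoreSketch:
    """Illustrative only: fetch over HTTP, then store under bucket/object."""

    def __init__(self, *, http_conn_id, endpoint, gcp_conn_id, bucket_name, object_name):
        self.http_conn_id = http_conn_id
        self.endpoint = endpoint
        self.gcp_conn_id = gcp_conn_id
        self.bucket_name = bucket_name
        self.object_name = object_name

    @cached_property
    def http_hook(self):
        # Built on first access, then cached on the instance.
        return FakeHttpHook(self.http_conn_id)

    @cached_property
    def gcs_hook(self):
        return FakeGCSHook(self.gcp_conn_id)

    def execute(self):
        data = self.http_hook.run(self.endpoint)
        self.gcs_hook.upload(self.bucket_name, self.object_name, data)
```

Constructing HttpToStoreSketch creates no hooks; they appear only when execute() first touches them, which keeps object construction cheap.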
Example Use Case:
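A minimal sketch of how a task might be declared, using only parameter names listed in this description. The import path inside build_task, the connection IDs, the endpoint, and the bucket and object names are assumptions for illustration, not values taken from the PR.

```python
# Keyword arguments for the task. The parameter names come from the PR
# description; every value below is a hypothetical placeholder.
TASK_KWARGS = {
    "task_id": "http_to_gcs_example",
    "http_conn_id": "http_default",
    "endpoint": "/api/v1/report",
    "method": "GET",
    "gcp_conn_id": "google_cloud_default",
    "bucket_name": "example-bucket",
    # object_name is described as templated, so a Jinja expression
    # should be rendered at run time.
    "object_name": "reports/{{ ds }}/report.json",
}


def build_task():
    """Create the operator; requires the Google provider to be installed."""
    # Assumed import path for the operator added by this PR.
    from airflow.providers.google.cloud.transfers.http_to_gcs import (
        HttpToGCSOperator,
    )

    return HttpToGCSOperator(**TASK_KWARGS)
```

In a DAG file, build_task() would be called inside the DAG context so the task is attached to that DAG.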
Testing:
The operator is covered by unit tests that exercise its HTTP-to-GCS transfer behavior.
Documentation:
The operator is fully documented with parameter descriptions and usage examples within the code itself.