Return files destination uris in GoogleDriverToGCSOperator and SheetsToGCSOperator#61347
Ajay9704 wants to merge 6 commits into apache:main
@shahar1 please review when you are free, and if there is anything left on my end, please let me know.
shahar1 left a comment
Good job!
Please handle my comments.
Also, please note that in order to merge it we'll need to run the system tests (same goes for other related PRs you've made). If you're able to run them on your own and attach screenshots, that would be best; otherwise we'll need to wait for the Google team or someone else to do it.
```python
gcs_uri = f"gs://{self.destination_bucket}/{gcs_path_to_file}"
destination_array.append(gcs_uri)

context["ti"].xcom_push(key="destination_objects", value=destination_array)

if self.unwrap_single:
    return destination_array[0] if len(destination_array) == 1 else destination_array
return destination_array
```
We should handle this operator differently - it already returns a list, but the entities are not in the gs URI format. Therefore:
- There's no need for the unwrap_single flag - the idea is that after aligning the operators to return list[str] by default, we could then deprecate unwrap_single where it is used, and users would access single results using lst[0] only.
- Instead, we should introduce another flag called return_gcs_uris (see comment here: Return list of GCS URIs from Azure*ToGCS operators #61048 (comment)) - so we won't break backward compatibility with current behavior. The default should stay as it is today (False), and the warning should say that it will later be changed to True - you could use the wording from Return list of GCS URIs from Azure*ToGCS operators #61048.
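The opt-in flag proposed in this comment could look roughly like the sketch below. It is not the provider's code: SheetsExportSketch is a hypothetical stand-in class rather than the real operator, the warning text is placeholder wording (the comment suggests reusing the wording from #61048), and FutureWarning stands in for whichever deprecation warning class the provider actually uses.

```python
import warnings


class SheetsExportSketch:
    """Hypothetical stand-in for an operator that already returns object names."""

    def __init__(self, destination_bucket: str, return_gcs_uris: bool = False):
        self.destination_bucket = destination_bucket
        self.return_gcs_uris = return_gcs_uris
        if not return_gcs_uris:
            # Placeholder wording; the real message should follow #61048.
            warnings.warn(
                "The default of return_gcs_uris will change to True in a future release.",
                FutureWarning,
                stacklevel=2,
            )

    def format_result(self, object_names: list[str]) -> list[str]:
        """Return object names unchanged (today's behavior) or as gs:// URIs."""
        if self.return_gcs_uris:
            return [f"gs://{self.destination_bucket}/{name}" for name in object_names]
        return list(object_names)
```

With the default left at False, existing callers keep receiving bare object names and only see the warning; passing return_gcs_uris=True opts in to the URI form early.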
```python
gcs_uri = f"gs://{self.bucket_name}/{self.object_name}"
result = [gcs_uri]

if self.unwrap_single:
    return result[0]
return result
```
See my comment for the other operator - we could avoid adding unwrap_single parameter at all, and just return [gcs_uri].
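For the single-object case, the suggestion reduces to wrapping the one URI in a list. A minimal sketch, where single_upload_result is a hypothetical helper name used only for illustration (the real operator would build this value inside execute):

```python
def single_upload_result(bucket_name: str, object_name: str) -> list[str]:
    """Return the destination of a single upload as a one-element list of gs:// URIs."""
    return [f"gs://{bucket_name}/{object_name}"]
```

Callers that need the single URI would then index the list, for example single_upload_result("my-bucket", "file.txt")[0].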
@shahar1 Thanks for the review! I'm already working on the changes you mentioned.
@shahar1 Screencast.From.2026-02-06.17-05-02.mp4
Yes - try to do it using breeze and Airflow's UI, it should be the easiest - please follow these instructions***:
export AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT='google-cloud-platform://'
export GOOGLE_CLOUD_PROJECT='your-gcp-project' # Fill in with GCP project ID
export SYSTEM_TESTS_GCP_PROJECT='your-gcp-project' # Fill in with GCP project ID
export SYSTEM_TESTS_ENV_ID='default'
*** - There are probably better ways to achieve it, but it works for me (and hopefully we could simplify it even further).
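Before launching the tests, it may help to confirm that the four variables exported above are actually visible to the process. The following is a small pre-flight sketch, not part of Breeze or Airflow; missing_system_test_vars and REQUIRED_VARS are hypothetical names introduced here.

```python
import os

# The variable names come from the export lines in this comment.
REQUIRED_VARS = (
    "AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT",
    "GOOGLE_CLOUD_PROJECT",
    "SYSTEM_TESTS_GCP_PROJECT",
    "SYSTEM_TESTS_ENV_ID",
)


def missing_system_test_vars(environ=None) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_system_test_vars()
    print("Missing variables:", ", ".join(missing) if missing else "none")
```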
@shahar1 Thanks for reviewing the changes and for the clear guidance. I attempted to run the system tests using Breeze as instructed, but I am consistently running into an issue where the Airflow services do not start. PostgreSQL initializes successfully, but the webserver, scheduler, triggerer, and API server never come up. I reinstalled Breeze, restarted multiple times, and re-checked my environment, but the problem persists and appears to be related to my local Docker/Breeze setup. Because of this, I haven't been able to run the system tests locally. Could you advise on how to resolve this Breeze startup issue, or let me know if running system tests in CI would be acceptable for this PR? If unit tests demonstrating the behavior are sufficient, I can proceed accordingly as well. Any guidance you can provide would be very helpful.
What error message/logs do you get? Do you run it using |
Description
This PR implements consistent GCS destination URI return behavior for GoogleDriveToGCSOperator and GoogleSheetsToGCSOperator.
Changes
GoogleDriveToGCSOperator (transfers/gdrive_to_gcs.py)
- Added unwrap_single parameter to control return format (default: True)
- Returns full GCS URIs in gs://bucket/object format instead of None
- Added deprecation warning for future default behavior change
- Maintains backward compatibility with existing XCom functionality
GoogleSheetsToGCSOperator (transfers/sheets_to_gcs.py)
- Added unwrap_single parameter to control return format (default: True)
- Returns full GCS URIs in gs://bucket/object format instead of object names only
- Added deprecation warning for future default behavior change
- Preserves existing XCom push behavior for destination objects
Implementation Details
Both operators now follow the consistent return convention:
- When unwrap_single=True (default): returns a single string for one file, a list for multiple files
- When unwrap_single=False: always returns a list regardless of file count
- Full GCS URI format: all return values include the gs:// prefix
- Backward compatibility: existing code continues to work unchanged
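The return convention listed above can be expressed as one small function. This is a sketch of the described behavior only; format_return is a hypothetical name, and in the PR the logic sits inline in each operator's execute method.

```python
from typing import Union


def format_return(uris: list[str], unwrap_single: bool = True) -> Union[str, list[str]]:
    """Apply the unwrap_single convention to a list of gs:// URIs."""
    if unwrap_single and len(uris) == 1:
        # One file and unwrapping enabled: return the bare string.
        return uris[0]
    # Multiple files, or unwrapping disabled: always return the list.
    return uris
```

The return type depends on both the flag and the number of files, so callers have to handle a str and a list unless they pass unwrap_single=False.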
Tests
Added comprehensive test coverage for both operators:
- test_execute_with_unwrap_single_true - Single file return behavior
- test_execute_with_unwrap_single_false - List return behavior
- test_execute_with_unwrap_single_default - Default behavior verification
- GoogleSheetsToGCSOperator: Additional tests for single vs multiple file scenarios
Related Issues
Part of issue #11323 - Return files destination URIs in all ToGCS operators
Migration Notes
Current behavior (backward compatible):
```python
# Returns a single URI string when there is one file, a list when there are multiple
op = GoogleDriveToGCSOperator(...)
result = op.execute(context)  # e.g., "gs://bucket/file.txt" or ["gs://bucket/file1.txt", "gs://bucket/file2.txt"]
```
Future behavior (prepare now):
```python
# Explicitly set unwrap_single to avoid future breaking changes
# (keyword arguments must follow the elided positional ones)
op = GoogleDriveToGCSOperator(..., unwrap_single=False)
result = op.execute(context)  # Always returns a list: ["gs://bucket/file.txt"]
```
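Downstream code that must work with either return shape during the transition could normalize the value first. The helper below is a hedged sketch; as_uri_list is a hypothetical name and is not part of the provider.

```python
from typing import Union


def as_uri_list(result: Union[str, list[str], None]) -> list[str]:
    """Normalize an operator return value to a list of gs:// URIs."""
    if result is None:
        # Older operator versions may return nothing at all.
        return []
    if isinstance(result, str):
        # unwrap_single=True with exactly one file yields a bare string.
        return [result]
    return list(result)
```

A consumer can then iterate over as_uri_list(result) without checking whether the operator unwrapped a single URI.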
Checklist
- Added unwrap_single parameter with proper type hints
- Implemented deprecation warning for future default change
- Updated return values to full GCS URI format (gs://bucket/object)
- Maintained backward compatibility with existing XCom behavior
- Added comprehensive unit tests for all scenarios
- Updated docstrings with new parameter documentation
- Verified system test examples remain compatible
Related: #11323
Used AI for resolving conflicts