@tiranux
Contributor

tiranux commented Jul 4, 2025

resolves: #49801
related: #48107

Streaming support was added to the SFTP_to_GCS Operator. As mentioned in #49801, it does not actually work: the object the SFTP hook receives is a BlobWriter, and the hook passes it to paramiko's get(), which expects a local path.

ERROR - Task failed with exception: source="task"
• TypeError: expected str, bytes or os.PathLike object, not BlobWriter
File "/opt/airflow/task-sdk/src/airflow/sdk/execution_time/task_runner.py", line 875 in run
File "/opt/airflow/task-sdk/src/airflow/sdk/execution_time/task_runner.py", line 1162 in _execute_task
File "/opt/airflow/task-sdk/src/airflow/sdk/bases/operator.py", line 397 in wrapper
File "/opt/airflow/providers/google/src/airflow/providers/google/cloud/transfers/sftp_to_gcs.py", line 159 in execute
File "/opt/airflow/providers/google/src/airflow/providers/google/cloud/transfers/sftp_to_gcs.py", line 180 in _copy_single_object
File "/opt/airflow/providers/sftp/src/airflow/providers/sftp/hooks/sftp.py", line 292 in retrieve_file
File "/usr/local/lib/python3.10/site-packages/paramiko/sftp_client.py", line 839 in get

The SFTP_to_GCS Operator could be changed to call conn.getfo() directly, but that would break the hook abstraction. I think it is better to make the SFTP hook more robust, so that it handles the GCS BlobWriter and raises an exception for unknown types.
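To illustrate the idea, here is a minimal sketch of that kind of dispatch. It is not the code merged in this PR: `FakeSFTPConnection` and this standalone `retrieve_file` function are hypothetical stand-ins that only mimic the `get()` and `getfo()` calls of a paramiko SFTP client, and the duck-typed check on `write` is an assumption about how an unknown stream type could be recognised without importing it.

```python
import io
import os


class FakeSFTPConnection:
    """Hypothetical stand-in for a paramiko SFTP client (get/getfo only)."""

    def __init__(self, files):
        self._files = files  # maps remote path -> file content as bytes

    def get(self, remotepath, localpath):
        # Mimics get(): the destination must be a filesystem path.
        with open(localpath, "wb") as fh:
            fh.write(self._files[remotepath])

    def getfo(self, remotepath, fl):
        # Mimics getfo(): the destination is an already-open writable object.
        fl.write(self._files[remotepath])


def retrieve_file(conn, remote_path, local_target):
    """Choose get() or getfo() based on what kind of target was passed in."""
    if isinstance(local_target, (str, bytes, os.PathLike)):
        conn.get(remote_path, local_target)
    elif callable(getattr(local_target, "write", None)):
        # Duck typing: BytesIO, a GCS BlobWriter, or any other writable stream.
        conn.getfo(remote_path, local_target)
    else:
        raise TypeError(
            f"Unsupported target type: {type(local_target).__name__}"
        )


conn = FakeSFTPConnection({"/remote/data.csv": b"a,b\n1,2\n"})
buffer = io.BytesIO()
retrieve_file(conn, "/remote/data.csv", buffer)
```

With this shape, the operator keeps calling the hook and never touches the connection itself, while anything that is neither a path nor a writable stream fails early with a clear `TypeError`.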


@tiranux
Contributor Author

tiranux commented Jul 4, 2025

@VladaZakharova as discussed offline
cc @potiuk

@VladaZakharova
Contributor

Hi @potiuk!
This PR has already been reviewed by our team, just in case :)

@tiranux
Contributor Author

tiranux commented Jul 4, 2025

Alternatively, the BlobWriter class could be imported into the SFTP hook and added to the isinstance check:

if isinstance(local_full_path, BytesIO):

which would simplify it a bit. However, that means importing Google libraries into the hook. I can change it if that causes no problem for others, but my preference is to avoid that.
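To make the trade-off concrete, here is a small sketch comparing the two checks. `StandInBlobWriter` is a hypothetical placeholder for the real GCS BlobWriter, used so the example runs without any Google dependency; the helper names are also made up for illustration.

```python
from io import BytesIO


class StandInBlobWriter:
    """Hypothetical placeholder for the GCS BlobWriter (no Google import)."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)
        return len(data)


# Type-based option: the tuple has to name every supported stream class.
# With the real BlobWriter in it, the hook would need the Google import.
STREAM_TYPES = (BytesIO, StandInBlobWriter)


def is_stream_by_type(target):
    return isinstance(target, STREAM_TYPES)


def is_stream_by_duck_typing(target):
    # Import-free option: accept anything that exposes a callable write().
    return callable(getattr(target, "write", None))


writer = StandInBlobWriter()
only_bytesio = isinstance(writer, BytesIO)       # the original check
by_type = is_stream_by_type(writer)              # check with the extended tuple
by_duck_typing = is_stream_by_duck_typing(writer)
```

Here the original `BytesIO`-only check rejects the writer, while both alternatives accept it. The type-based check is the more explicit of the two, at the cost of coupling the hook to each provider whose stream class it lists.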

potiuk merged commit 1e67746 into apache:main Jul 4, 2025
77 checks passed

Development

Successfully merging this pull request may close these issues.

SFTPToGCSOperator Streaming unexpected BlobWritter

3 participants