@moiseenkov
Contributor

This PR fixes the following case.

The goal is to copy all files whose names start with the prefix source/foo.txt into the folder dest/ within a single GCS bucket.

  1. Create a GCS bucket and upload three files to the source directory like this:
gs://my-bucket/source/foo.txt
gs://my-bucket/source/foo.txt.abc
gs://my-bucket/source/foo.txt/subfolder/file.txt
  2. Upload the following DAG to a Cloud Composer environment:
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_gcs import GCSToGCSOperator

with DAG(
    dag_id="gcs_to_gcs_fail_example",
    schedule_interval=None,
    catchup=False,
    start_date=datetime(2021, 1, 1),
) as dag:
    # destination_bucket is omitted, so the copy stays within my-bucket.
    copy_file = GCSToGCSOperator(
        task_id="copy_file",
        source_bucket="my-bucket",
        source_object="source/foo.txt",
        destination_object="dest/",
    )
    copy_file
  3. Run the DAG

Expected bucket state:

gs://my-bucket/source/foo.txt
gs://my-bucket/source/foo.txt.abc
gs://my-bucket/source/foo.txt/subfolder/file.txt
gs://my-bucket/dest/foo.txt
gs://my-bucket/dest/foo.txt.abc
gs://my-bucket/dest/foo.txt/subfolder/file.txt

Actual (incorrect) bucket state:

gs://my-bucket/source/foo.txt
gs://my-bucket/source/foo.txt.abc
gs://my-bucket/source/foo.txt/subfolder/file.txt
gs://my-bucket/dest/source/foo.txt
gs://my-bucket/dest/source/foo.txt.abc
gs://my-bucket/dest/source/foo.txt/subfolder/file.txt
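
To make the difference between the two bucket states concrete, here is a minimal pure-Python sketch of the two name mappings. It is not the operator's actual implementation; the function names `expected_destination` and `actual_destination` are hypothetical and only reproduce the listings above. The assumption is that the expected behaviour strips the parent "directory" of `source_object` before prepending `destination_object`, while the reported behaviour prepends `destination_object` to the full object name.

```python
import posixpath

SOURCE_OBJECT = "source/foo.txt"
DESTINATION_OBJECT = "dest/"

# Object names in my-bucket before the DAG runs (taken from the description).
BUCKET_OBJECTS = [
    "source/foo.txt",
    "source/foo.txt.abc",
    "source/foo.txt/subfolder/file.txt",
]


def matching_objects(names, prefix):
    """Return the object names that start with the given prefix."""
    return [name for name in names if name.startswith(prefix)]


def expected_destination(name, source_object, destination_object):
    """Hypothetical mapping for the expected state: drop the parent
    "directory" of source_object, then prepend destination_object."""
    parent = posixpath.dirname(source_object)  # "source"
    parent_prefix = parent + "/" if parent else ""
    return destination_object + name[len(parent_prefix):]


def actual_destination(name, destination_object):
    """Hypothetical mapping for the reported state: the full source
    object name is kept under destination_object."""
    return destination_object + name


matched = matching_objects(BUCKET_OBJECTS, SOURCE_OBJECT)
expected = [expected_destination(n, SOURCE_OBJECT, DESTINATION_OBJECT) for n in matched]
actual = [actual_destination(n, DESTINATION_OBJECT) for n in matched]

for name, exp, act in zip(matched, expected, actual):
    print(f"{name}: expected {exp}, actual {act}")
```

All three objects match the prefix, so each one appears once under dest/ in both mappings; the only difference is whether the leading source/ segment survives in the destination name.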

@moiseenkov moiseenkov force-pushed the gcs_to_gcs_bugfix2 branch 5 times, most recently from b54f9f9 to ace0da5 Compare July 11, 2023 07:16
@moiseenkov moiseenkov force-pushed the gcs_to_gcs_bugfix2 branch from ace0da5 to 2bd6514 Compare July 11, 2023 08:35
@VladaZakharova
Contributor

Hi @potiuk!
Could you please review these changes?
