@AlexisBRENON
Contributor

Split the execute method so that retrieving the file listing lives in a dedicated method.
This makes it easier to apply arbitrary modifications to the file listing.

My use case is sampling the file listing: to avoid copying too many files in a single run, I copy only 1 in 1000.

With the previous implementation, I had to override the whole execute method just to manipulate the s3_objects list before passing it to the transfer methods.

class MyOperator(S3ToGCSOperator):

    def execute(...):
        # [...]
        s3_objects = S3ListOperator.execute(self, context)  # Update here to call the right method
        s3_objects = sample(s3_objects)  # Add my sampling method

        # [...] Copy the rest of the method, keeping it in sync with upstream

With the new implementation, I only need to override _get_files: call the super method, then modify the list before returning it.

class MyOperator(S3ToGCSOperator):

    def _get_files(...):
        s3_objects = super()._get_files(...)
        return sample(s3_objects)  # Add my sampling method
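
To make the pattern concrete, here is a minimal, self-contained sketch. It uses stand-in classes instead of the real Airflow operator, so `ListingTransfer`, `SampledTransfer`, the no-argument `_get_files()` signature, and the stride-based `sample` helper are all assumptions made for illustration, not the provider's actual API.

```python
def sample(keys, one_in=1000):
    """Hypothetical helper: keep every `one_in`-th key, starting with the first."""
    return keys[::one_in]


class ListingTransfer:
    """Stand-in for an operator whose execute() delegates listing to _get_files()."""

    def __init__(self, keys):
        self.keys = list(keys)

    def _get_files(self):
        # In the real operator this step would list objects in the source bucket.
        return list(self.keys)

    def execute(self):
        files = self._get_files()
        # A real operator would transfer `files` here; the sketch only returns them.
        return files


class SampledTransfer(ListingTransfer):
    """Overrides only the listing hook; execute() is inherited unchanged."""

    def _get_files(self):
        return sample(super()._get_files(), one_in=1000)


if __name__ == "__main__":
    keys = [f"data/part-{i:05d}.csv" for i in range(2500)]
    print(SampledTransfer(keys).execute())
```

Because the subclass touches only the listing hook, any later change to the transfer logic in `execute` is picked up without having to re-copy it.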


@boring-cyborg boring-cyborg bot added area:providers provider:google Google (including GCP) related issues labels Apr 25, 2025
Member

@jason810496 jason810496 left a comment


The static check needs to be fixed.
You can quickly fix it by running:
pre-commit run ruff --all && pre-commit run ruff-format --all

@jason810496 jason810496 requested a review from Lee-W April 29, 2025 13:04
@shahar1 shahar1 merged commit c3a4e8e into apache:main May 12, 2025
68 checks passed
sanederchik pushed a commit to sanederchik/airflow that referenced this pull request Jun 7, 2025
* refactor: allow to edit the file listings before copy

* fix static check
