Allow check_fn of S3KeySensor to receive bucket key #44896

@tuzonghua

Apache Airflow Provider(s)

amazon

Versions of Apache Airflow Providers

apache-airflow-providers-amazon==8.28.0

Apache Airflow version

2.9.3

Operating System

macOS 15.1.1

Deployment

Google Cloud Composer

Deployment details

No response

What happened

When using check_fn with S3KeySensor, the function has no way to check against a specific object name. The only attributes available to it are those returned by the S3 head_object API call, which do not include the prefix or the object name itself.

What you think should happen instead

If check_fn receives a list of file sizes, each entry should also carry the S3 key that the size belongs to, so the function has flexibility in how it filters the list.
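As a minimal sketch of what this could enable (not the provider's current behavior): assume each entry passed to check_fn carried a hypothetical "Key" field next to "Size". The entry shape, the names `entry_passes` and `check_fn_with_keys`, and the sample list are assumptions for illustration only. The check is written per entry so that it does not depend on whether the sensor passes the matches for one bucket_key pattern or for several at once.

```python
from typing import Any

MIN_DATA_FILE_BYTES = 524288  # 0.5 MiB threshold for data files


def entry_passes(entry: dict) -> bool:
    """Decide whether a single S3 object entry meets the criteria."""
    key = entry.get("Key", "")  # hypothetical field; not provided today
    if key.endswith("_SUCCESS"):
        # The marker file is allowed to be empty.
        return True
    if "hadoop" in key:
        return entry.get("Size", 0) > MIN_DATA_FILE_BYTES
    return False


def check_fn_with_keys(files: list, **kwargs: Any) -> bool:
    """Return True only if there is at least one entry and every entry passes."""
    return bool(files) and all(entry_passes(f) for f in files)


# Hypothetical input shape, using object names like those in this report.
sample_files = [
    {"Key": "path/to/some/files/_SUCCESS", "Size": 0},
    {"Key": "path/to/some/files/000000_0-hadoop_20241212010840_abcdef.gz", "Size": 18348549},
]
print(check_fn_with_keys(sample_files))
```

With entries that hold only "Size", the same function returns False, because there is no key to match against.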

How to reproduce

If there is a bucket with the following objects:

$ aws s3 ls s3://test-bucket/path/to/some/files
2024-12-11 20:09:12   18348549 000000_0-hadoop_20241212010840_abcdef.gz
2024-12-11 20:09:14   16543931 000001_0-hadoop_20241212010840_sadfjwij.gz
2024-12-11 20:09:49          0 _SUCCESS

and S3KeySensor:

from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

check_for_file_in_s3 = S3KeySensor(
    task_id="check_for_file_in_s3",
    soft_fail=True,
    mode="reschedule",
    poke_interval=0,
    timeout=0,
    bucket_name="test-bucket",
    bucket_key=[
        "path/to/some/files/_SUCCESS",
        "path/to/some/files/000000_0-hadoop_*",
    ],
    aws_conn_id="spend327_aws_connection",
    retries=0,
    wildcard_match=True,
    check_fn=check_fn,
)

then the following check_fn will never succeed:

from typing import Any

def check_fn(files: list, **kwargs: Any) -> bool:
    """
    Check that the data file is larger than 0.5 megabytes.

    :param files: List of S3 object attributes.
    :return: True if the criteria are met
    """
    for file in files:
        if "hadoop" in file:
            return file.get("Size", 0) > 524288
        elif "SUCCESS" in file:
            return True
        else:
            return False
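
The failure can be reproduced outside Airflow by calling the function on a stand-in for the sensor's input. This is a sketch under one assumption: each entry is a dict holding only "Size", consistent with the description above that the object name is not included. The `simulated_*` lists are hypothetical. Because `in` on a dict tests its keys, neither "hadoop" nor "SUCCESS" can match, so the function always falls through to the final branch.

```python
from typing import Any


def check_fn(files: list, **kwargs: Any) -> bool:
    """Same logic as the check_fn in this report."""
    for file in files:
        if "hadoop" in file:  # tests dict keys, not the object name
            return file.get("Size", 0) > 524288
        elif "SUCCESS" in file:  # also tests dict keys
            return True
        else:
            return False


# Assumed shape of what the sensor passes: size only, no object name.
simulated_marker = [{"Size": 0}]
simulated_data_file = [{"Size": 18348549}]

print(check_fn(simulated_marker))     # prints False
print(check_fn(simulated_data_file))  # prints False
```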

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
