Closed
Labels
area:providers, good first issue, kind:feature, provider:amazon
Description
Apache Airflow Provider(s)
amazon
Versions of Apache Airflow Providers
apache-airflow-providers-amazon==8.28.0
Apache Airflow version
2.9.3
Operating System
macOS 15.1.1
Deployment
Google Cloud Composer
Deployment details
No response
What happened
When using check_fn in S3KeySensor, there's no way for the function to check against a specific object name. The only keys available to it are the ones returned by the S3 head_object API call, and those don't include the prefix or the object name itself.
What you think should happen instead
If check_fn takes in a list of file sizes, it should also map the S3 key to the file size so there's flexibility in how to filter the list.
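To illustrate the request, here is a minimal sketch of what a check_fn could look like if each entry it receives carried the object key next to its size. The "Key" field is an assumption made for this illustration; the sensor does not provide it in the version reported here. The sample names and sizes mirror the listing in this report.

```python
from typing import Any

MIN_SIZE_BYTES = 524288  # 0.5 MiB


def check_fn(files: list, **kwargs: Any) -> bool:
    """Reject the match if any hadoop data file is at or below the size threshold.

    Assumes each entry looks like {"Key": "<object key>", "Size": <bytes>}.
    The "Key" field is hypothetical and only shows the requested behaviour.
    """
    for file in files:
        key = file.get("Key", "")
        # Only data files are size-checked; marker files such as _SUCCESS pass.
        if "hadoop" in key and file.get("Size", 0) <= MIN_SIZE_BYTES:
            return False
    return True


# Hypothetical input shape, built from the objects listed in this report.
sample_files = [
    {"Key": "path/to/some/files/000000_0-hadoop_20241212010840_abcdef.gz", "Size": 18348549},
    {"Key": "path/to/some/files/_SUCCESS", "Size": 0},
]
result = check_fn(sample_files)
```

With the key available, the function can tell data files from marker files and apply the size threshold only where it matters.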
How to reproduce
If there is a bucket with the following objects:
$ aws s3 ls s3://test-bucket/path/to/some/files
2024-12-11 20:09:12 18348549 000000_0-hadoop_20241212010840_abcdef.gz
2024-12-11 20:09:14 16543931 000001_0-hadoop_20241212010840_sadfjwij.gz
2024-12-11 20:09:49 0 _SUCCESS
and S3KeySensor:
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

check_for_file_in_s3 = S3KeySensor(
    task_id="check_for_file_in_s3",
    soft_fail=True,
    mode="reschedule",
    poke_interval=0,
    timeout=0,
    bucket_name="test-bucket",
    bucket_key=[
        "path/to/some/files/_SUCCESS",
        "path/to/some/files/000000_0-hadoop_*",
    ],
    aws_conn_id="spend327_aws_connection",
    retries=0,
    wildcard_match=True,
    check_fn=check_fn,
)
then the following check_fn will never succeed:
from typing import Any

def check_fn(files: list, **kwargs: Any) -> bool:
    """
    Check that the data file is greater than 0.5 MiB (524288 bytes)
    :param files: List of S3 object attributes.
    :return: True if the criteria are met
    """
    for file in files:
        # Each `file` is a dict of object attributes, so `in` tests its keys,
        # not the object name; neither branch below can ever match.
        if "hadoop" in file:
            return file.get("Size", 0) > 524288
        elif "SUCCESS" in file:
            return True
        else:
            return False
Anything else
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct