-
Notifications
You must be signed in to change notification settings - Fork 16.3k
Fix: S3KeySensor deferrable mode ignores metadata_keys, returns only key names, and doesn't pass context to check_fn #56910
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
|
|
I agree though, some test to cover that would be nice |
|
@vincbeck , anything else needed from my side, or should I just wait? Tnx |
|
Can you please rebase? CI is failing due to bugs in main I think |
Fix bug loosing context when defereble is True
Refactor S3 trigger to handle metadata extraction for files.
Added metadata_keys parameter to S3KeyTrigger to specify attributes to gather from head_object.
Added metadata handling to S3KeyTrigger tests.
Refactor S3 trigger tests for clarity and functionality.
Refactor test_run_with_all_metadata to include check_key_async mock and improve async handling.
|
Static checks are failing. Can you please fix them? You can find more information in the documentation. |
|
Static checks are still failing. Have you run |
prek --hook-stage manual mypy-providers --all-files
Run mypy for providers (manual)..........................................Passed |
|
Static checks are now succeeding but one test is failing: |
Fixed. |
|
@vincbeck , from what I saw in the logs, the errors now don’t seem to be related to my changes. |
|
Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions. |
|
Merged! Nice job :) |
…key names, and doesn't pass context to check_fn (apache#56910)
This PR fixes two inconsistencies between deferrable and non-deferrable execution modes of the
S3KeySensor:context) is lost when resuming from the trigger.metadata_keysis ignored — the deferrable trigger (S3KeyTrigger) only returns object names (list[str]) instead of metadata dictionaries.Both behaviors cause user-defined
check_fnfunctions to fail or behave inconsistently between execution modes.Reproduction
Use the following DAG:
Observed logs:
filesis a list of strings, andcontextis empty.If
deferrable=False, both contain full metadata and context as expected.Root Cause
S3KeySensor.execute_complete()currently calls:found_keys = self.check_fn(event["files"])Instead of passing
contextlike_check_key()does:S3KeyTriggeremits only:TriggerEvent({"status": "running", "files": keys})wherekeysis alist[str]fromget_files_async(), ignoringmetadata_keys.Fixes Introduced
Fix 1: Pass task
contextproperly tocheck_fninexecute_complete()by replicating the signature inspection logic from_check_key().Fix 2: Add
metadata_keyssupport toS3KeyTrigger, allowing metadata retrieval (list_objects_v2 or head_object) for deferrable mode.Expected Behavior
Both modes (
deferrable=Trueanddeferrable=False) now behave consistently:check_fnalways receives files as a list of metadata dictionaries.contextalways includes task instance and DAG context values (ti, dag_run, etc.).metadata_keys=["*"]returns fullhead_object()data.