-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add example DAG for demonstrating usage of GCS sensors #22808
Merged
Merged
Changes from all commits
Commits
Show all changes
10 commits
Select commit
Hold shift + click to select a range
2a12baf
Add example DAG for demonstrating usage of GCS sensors
957220a
Update airflow/example_dags/example_gcs.py
pankajkoti 11a4fee
Merge branch 'main' into examples-gcs-dag
pankajkoti 65f9fa2
Delete new DAG file and add sensors to existing example_gcs.py
pankajkoti 2779fbc
Update documentation for new sensor examples
pankajkoti bd87323
Update docstring to resolve spell check error in docs generation
pankajkoti f4265c3
Merge branch 'main' into examples-gcs-dag
pankajkoti 3956212
Merge branch 'main' into examples-gcs-dag
pankajkoti 1b94d57
Reorder imports to fix failing Isort static check
pankajkoti c980306
Merge branch 'examples-gcs-dag' of https://github.com/astronomer/airf…
pankajkoti File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests shouldn't depend on manual steps - if the file is required we should:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @bhirsz I wanted to do upload the file programatically like the other chain. However, the sensor task GCSUploadSessionCompleteSensor waits for change in number of files in the bucket. If we have the upload task before GCSUploadSessionCompleteSensor task, it won't detect any changes. On the other hand, if we add upload task after GCSUploadSessionCompleteSensor task, it would be blocked until GCSUploadSessionCompleteSensor task completes and does not solve the need. I am unsure how to add dependency between such tasks. Any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's tricky in this case, indeed. How about starting sensors in parallel with upload_file?
Example:
We're starting the sensors and in meantime we're uploading the file - and sensors detect it:
I'm not sure though if it will not be flaky in some cases - running this in CI will show. I will update my PR with this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. Since, we are starting in parallel, it may happen that, the upload task is picked up before the sensor task begins and it may not detect the change as expected. Hence, added the comment to manually upload the file. :)
Also, the gcs_object_update_sensor_task needs to be activated after gcs_upload_session_complete_task (and not in parallel with it) as the object is expected to be detected by sensor and the object_update task is to confirm the manually upload has happened prior to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can alleviate it a bit with sleep (yes, the old the ugly sleep ;)). Run sensors separably and in the meantime trigger sleep (5s will suffice) and then upload task:
And of course put some explanation in comments why we're doing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Manual steps are unfortunately a big no for us since the plan is to run system tests in CI in community. We can't expect people performing manual task while running them so I'm looking for an automatic solution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I was not aware of the fact that we're migrating these example DAGs to tests. I was of the impression that the example DAGs are for references to our community on how to implement DAGs for certain operators. Understand and totally agree to your fair point that tests should not need manual interventions.
Okay, so we can try introducing sleep if we feel right about it. Also, I am new to contributing to Airflow :)
cc: @kaxil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we had two goals with the migration - to make writing system tests easier and actually ensure that are example dags are runnable (what would be point in example that doesn't work, and it's actually often the case ;)). In the old design system tests actually "wrapped" and run examples - but not all example dags were used by system tests. In the new design we're also using examples as system tests, but without separating it to different files and now example dag is the system test itself.
Any change is highly welcome and it's good to see PRs such like yours - congrats! We only missed the notification on PR (sadly Github doesn't allow for an advanced notification system) and we were not able to have this discussion before merging the PR.