added MultipleFilesWebHdfsSensor #43045
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
Small comment, otherwise LGTM
providers/src/airflow/providers/apache/hdfs/sensors/web_hdfs.py
Great first PR!
Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.
This PR caused failures in both static checks and tests. I am fixing them in #43122.
This was an interesting one: it seems that for some reason the CI workflow DID NOT run at all; only the build-image workflow did. That's why it was "green".
This is a long-known issue with GitHub that I raised with them 3 years ago. Unfortunately, there is a race condition that makes a PR "green" if the workflow has not started at all, or while it is just starting. Very poor design IMHO for GitHub Actions. @kaxil @romsharon98 The only way I have found to prevent such accidental merges of "green-but-incomplete" PRs is to look at the number of checks that "passed". When there are fewer than 10, something is WRONG. But it's not really obvious, and I have accidentally merged such a PR more than once.
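The "count the passed checks" heuristic above can be sketched as a small function. This is a hedged illustration, not part of Airflow's tooling: it assumes the input is the already-parsed JSON from GitHub's "list check runs for a Git reference" REST endpoint (a dict with a `check_runs` list whose items carry a `conclusion` field), and the function name `looks_incomplete` is hypothetical. Pagination and fetching are left out.

```python
from __future__ import annotations

# Threshold taken from the comment above: fewer than 10 passed checks is suspicious.
MIN_EXPECTED_CHECKS = 10


def looks_incomplete(check_runs_payload: dict, min_checks: int = MIN_EXPECTED_CHECKS) -> bool:
    """Return True if a 'green' PR rests on suspiciously few passed checks.

    `check_runs_payload` is assumed to be parsed JSON shaped like GitHub's
    check-runs response: {"check_runs": [{"conclusion": "success", ...}, ...]}.
    """
    runs = check_runs_payload.get("check_runs", [])
    # Count only runs that actually finished successfully.
    passed = [run for run in runs if run.get("conclusion") == "success"]
    return len(passed) < min_checks
```

A PR showing only the build-image checks as passed would be flagged, while one with the full CI suite would not.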
I added the MultipleFilesWebHdfsSensor class in providers.apache.hdfs.sensors.web_hdfs.
The existing WebHdfsSensor can check whether a single file exists, so checking many files requires many tasks (in my org we had 350+ sensors for a single DAG). The new MultipleFilesWebHdfsSensor lists a whole directory and succeeds only when all the expected files have landed in HDFS.
This is my first contribution, so I would greatly appreciate any guidance :)
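The core idea (one directory listing, then a set difference against the expected file names) can be sketched without Airflow itself. This is a minimal sketch under stated assumptions, not the merged implementation: `MultipleFilesPokeSketch`, `find_missing_files`, and the injected `list_directory` callable are hypothetical names. In the real sensor the listing would presumably come from the HDFS client returned by the WebHDFS hook; that wiring is not shown here.

```python
from __future__ import annotations

from typing import Callable, Iterable


def find_missing_files(listed: Iterable[str], expected: Iterable[str]) -> set[str]:
    """Return the expected file names that do not appear in the directory listing."""
    return set(expected) - set(listed)


class MultipleFilesPokeSketch:
    """Hypothetical stand-in for the poke() logic of a multi-file HDFS sensor.

    `list_directory` is an assumption: any callable that takes a directory
    path and returns the names of the files in it.
    """

    def __init__(
        self,
        directory_path: str,
        expected_filenames: Iterable[str],
        list_directory: Callable[[str], list[str]],
    ) -> None:
        self.directory_path = directory_path
        self.expected_filenames = list(expected_filenames)
        self.list_directory = list_directory

    def poke(self) -> bool:
        # A single listing call covers every expected file,
        # instead of running one sensor task per file.
        listed = self.list_directory(self.directory_path)
        missing = find_missing_files(listed, self.expected_filenames)
        if missing:
            print(f"Still waiting for: {sorted(missing)}")
            return False
        return True
```

Usage, with a dict standing in for HDFS:

```python
landed = {"/data/in": ["a.csv", "b.csv"]}
sensor = MultipleFilesPokeSketch("/data/in", ["a.csv", "b.csv", "c.csv"], landed.__getitem__)
sensor.poke()                     # False: c.csv has not landed yet
landed["/data/in"].append("c.csv")
sensor.poke()                     # True: all expected files are present
```

Extra files in the directory do not block success; only missing expected names do.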