Sync v2-1-stable and v2-1-test to release 2.1.4 #18163
Conversation
* Improve cross-links to operators and hooks references * fixup! Improve cross-links to operators and hooks references * Update docs/apache-airflow/concepts/operators.rst (cherry picked from commit cff5f18)
currently it shows up as: ``` apache/airflow:|version| - the versioned Airflow image with default Python version (3.6 currently) ``` Example: http://apache-airflow-docs.s3-website.eu-central-1.amazonaws.com/docs/docker-stack/index.html or even https://airflow.apache.org/docs/docker-stack/index.html This commit fixes it (cherry picked from commit 8cdda20)
The providers operators/hooks reference contained only a top-level list of provider groups, which made it less usable than it could be: users did not see links to particular operators/hooks on that page, it was not really visible what was "available" (discoverability), and the more detailed "Service" and "Transfer" pages were not readable enough to give an "at a glance" overview of what is available. This change improves that, removes the annoying repetition of "operators and hooks", and increases the TOC level to 3, giving a nice overview of all available and exposed operators and hooks. (cherry picked from commit c7f37a9)
The ``hook-class-names`` provider's meta-data property has been deprecated and is now replaced by ``connection-types`` property. This documents the change. (cherry picked from commit be75dcd)
The documentation of provider packages was rather disconnected from the apache-airflow documentation. It was hard to find out how apache airflow's core extensions are implemented by the community-managed providers - you needed to know what you were looking for, and you could not find links to the summary of the core functionality extended by providers when you were looking at that functionality (like logging/secret backends/connections/auth). This PR introduces much more comprehensive cross-linking between the airflow core functionality and the community-managed providers that provide extensions to it. (cherry picked from commit bcc7665)
The #17939 did not fix the problem after all. It turned out that one more change was needed - since we now always upgrade to the latest dependencies in `push` and `schedule` types of build, we do not need to check the UPGRADE_TO_NEWER_DEPENDENCIES variable (which was not set in the "Build Image" step). This fixes it, but also changes constraint generation to add comments in the generated constraint files describing how and why the files are generated. (cherry picked from commit bec006e)
The Top-Level best practices were a little misleading. They suggested that no code should be written at the top level of a DAG other than creating operators, but the story is a little more nuanced. A better explanation is given, along with examples of how to deal with the situation when you need to generate your DAGs based on some meta-data. From a Slack discussion it seems it is not obvious at all what the best ways to handle that are, so two alternatives were presented: generating a meta-data file, and generating an importable python module containing the meta-data. During that change I also noticed that config sections and config variables were not sorted, which made it very difficult to search for them in the index. All the config variables are now sorted, so the references to the right sections/variables make much more sense now. (cherry picked from commit 1be3ef6)
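The "meta-data file" alternative mentioned above can be sketched roughly like this (a minimal, hypothetical illustration, not the actual docs example): a separate out-of-band process materialises the meta-data into a local file, and the DAG file only performs a cheap file read at the top level instead of an expensive query on every parse.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sketch: avoid expensive calls (DB/API queries) at DAG
# parse time by generating a meta-data file out-of-band and reading it
# cheaply at the top level of the DAG file.

def generate_metadata(path: Path) -> None:
    # Imagine this running in a separate scheduled process,
    # NOT at DAG parse time.
    tables = ["customers", "orders", "invoices"]  # e.g. fetched from a DB
    path.write_text(json.dumps({"tables": tables}))

def load_metadata(path: Path) -> dict:
    # Cheap top-level call: just reads a small local file.
    return json.loads(path.read_text())

if __name__ == "__main__":
    meta_path = Path(tempfile.gettempdir()) / "dag_metadata.json"
    generate_metadata(meta_path)
    for table in load_metadata(meta_path)["tables"]:
        # In a real DAG file you would create one task per table here.
        print(f"would create task for table: {table}")
```

The other alternative works the same way, except the generated artifact is an importable python module instead of a JSON file.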
The automated upgrade of dependencies in main broke building of Airflow documentation in the main build. After a lot of experimentation, it has been narrowed down to the upgrade of dnspython from 1.16.0 to 2.+, which was brought in by upgrading eventlet to 0.32.0. This PR limits the dnspython library to < 2.0.0. An issue has been opened: rthalley/dnspython#681 (cherry picked from commit 022b4e0)
As of August 2021, the buster-slim python images no longer contain python2 packages. We still support running Python 2 via PythonVirtualenvOperator, and our tests started to fail when we ran them in `main` - those tests always pull and build the images using the latest available buster-slim images. Our system to prevent PR failures in this case has proven useful - the main tests failed, so the base images we have still use previous buster-slim images which still contain Python 2. This PR adds python2 to the installed packages - on both CI images and PROD images. For CI images it is needed to pass tests; for PROD images it is needed for backwards compatibility. (cherry picked from commit 6898a2f)
We are now generating constraints with a better description, and we include information about DEFAULT_BRANCH (main/v2-1-test etc.). The scripts to generate the constraints need to get that variable passed to docker. Also, the names of the generated files were wrong - the constraints did not update the right constraint files. (cherry picked from commit afd4ba6)
Since we released the Celery provider with celery 5, we should limit celery to < 5 in the Airflow 2.1 EAGER_UPGRADE limits. EAGER_UPGRADE limits are only used during constraint generation.
When we have a prolonged issue with flaky tests or GitHub runner instabilities, our automated constraint and image refresh might not work, so we might need to refresh the constraints and images manually. Documentation about that was in CONTRIBUTING.rst, but it is more appropriate to keep it in ``dev`` as it only applies to committers. Also, while testing the parallel refresh without delays, an error was discovered which prevented the parallel check of a random image hash during the build. This has been fixed, and parallel image cache building should work flawlessly now. (cherry picked from commit 36c5fd3)
When the DAG appears again in the UI and we rerun it (say we have catchup set to True), those running task instances that were not deleted would be rerun, and an external state change of the task instances would be detected by the LocalTaskJob, thereby sending SIGTERM to the task runner. This change resolves this by making sure that DAGs are not deleted while their task instances are still running. (cherry picked from commit 5a64c1c)
We were not passing the root to the `/tree_data` api call. Therefore, filtering upstream of a task would be reset during auto-refresh even though root was still defined. (cherry picked from commit c645d7a)
(cherry picked from commit 96f7e3f)
* Only draw once during initial graph setup. The previous behavior could cause significant slowness when loading the graph view for large dags with many task groups. * Improve name and fix camelCase * Fix indent * PR suggestions: remove args (cherry picked from commit bfdda08)
The "color" method seems to have been removed. (cherry picked from commit a1d9172)
(cherry picked from commit 683fbd4)
Currently, tasks can be run even if the dagrun is queued. Task instances of queued dagruns should only be run when the dagrun is in the running state. This PR makes sure task instances of queued dagruns are not run, thereby properly checking task concurrency. We also used to check max_active_runs when parsing the dag, which is no longer needed since dagruns are created in the queued state and the scheduler controls when to move queued dagruns to running, taking max_active_runs into account. This PR removes the checking of max_active_runs in the dag too. (cherry picked from commit ffb81ea)
This hides the variable import form if the user does not have the "can create on variable" permission. (cherry picked from commit 7b3a5f9)
The graph view should show the "Download Log" and "View Logs in {remote logging system}", like is done on the tree view. (cherry picked from commit 83f1f07)
(cherry picked from commit 6868ca4)
The way dumb-init propagated signals by default meant the celery worker did not handle termination well. The default behaviour of dumb-init is to propagate signals to the process group rather than to the single child it uses. This is protective behaviour, in case a user runs a 'bash -c' command without 'exec' - in this case signals should be sent not only to bash but also to the process(es) it creates, otherwise bash exits without propagating the signal and you need a second signal to kill all processes. However, some airflow processes (in particular airflow celery worker) behave in a responsible way and handle the signals appropriately - when the first signal is received, the worker switches to offline mode and lets all its workers terminate (until the grace period expires), resulting in a Warm Shutdown. Therefore we can disable the protection of dumb-init and let it propagate the signal only to the single child it spawns in the Helm Chart. The documentation of the image was also updated to include an explanation of signal propagation. For explicitness, the DUMB_INIT_SETSID variable has been set to 1 in the image as well. Fixes #18066 (cherry picked from commit 9e13e45)
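The single-child behaviour described above can be demonstrated with a tiny standalone sketch (not Airflow or dumb-init code - the child process here merely stands in for a well-behaved worker): the parent forwards SIGTERM to its single direct child, and the child runs its own graceful-shutdown handler instead of being killed as part of a process group.

```python
import signal
import subprocess
import sys
import textwrap

# Illustrative only: a "responsible" child process that traps SIGTERM and
# shuts down cleanly on its own, like `airflow celery worker` does.
CHILD = textwrap.dedent("""
    import signal, sys, time

    def warm_shutdown(signum, frame):
        # Finish up gracefully, then exit - the worker's own shutdown logic.
        print("warm shutdown")
        sys.exit(0)

    signal.signal(signal.SIGTERM, warm_shutdown)
    print("ready", flush=True)
    time.sleep(30)  # pretend to work until signalled
""")

def run_demo() -> str:
    proc = subprocess.Popen([sys.executable, "-c", CHILD],
                            stdout=subprocess.PIPE, text=True)
    assert proc.stdout.readline().strip() == "ready"
    # Forward the signal to the single direct child only (the analogue of
    # dumb-init's behaviour when group propagation is disabled).
    proc.send_signal(signal.SIGTERM)
    out, _ = proc.communicate(timeout=10)
    return out.strip()

if __name__ == "__main__":
    print(run_demo())
```

With group propagation, by contrast, the signal would also reach every descendant at once, pre-empting this kind of orderly shutdown (the demo assumes a POSIX system).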
Regression on PID reset to allow task start after heartbeat Co-authored-by: Nicolas MEHRAEIN <nicolas.mehraein@adevinta.com> (cherry picked from commit ed99eaa)
The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.
The recently updated docker-compose had somewhat broken behaviour for non-Linux users. It expected the .env file to always be created, but the instructions to create it were not working on Windows. This fixes the problem by turning the error into a warning, and directing users to the right instructions for their operating system. Also, the recent ``DUMB_INIT_SETSID`` variable was added for the worker, to allow handling signals properly in our quick-start docker-compose as well. (cherry picked from commit bd77689)
I think #17269 was missed from releases
Why do you think it should be there @eladkal :)? We do not cherry-pick all bugfixes to "patchlevel" releases - it's always risky. I think 2.1.4 is mostly about stabilising 2.1.3 (which had a number of stability issues) plus some doc-only changes that are mostly clarifications, guiding our users better, and improvements in communication (generally zero or low-risk changes). The #17269 is also some new behaviour rather than a bugfix IMHO, so it should go to 2.2
Mostly because this seems to be a bug fix for #18102, but I'm OK with it waiting for 2.2 as it should follow shortly
This PR separates the installing-Airflow-from-sources section, and also fixes links for binary sources - they had a `-bin` suffix which we don't use anymore. I have also added a section on verifying integrity, and added more details with examples. (cherry picked from commit f9969c1)
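As a rough illustration of the integrity-checking idea mentioned above (the real installation guide relies on the published `.sha512` files and GPG signatures; the helper below is a hypothetical sketch, not Airflow tooling):

```python
import hashlib
from pathlib import Path

# Hedged sketch of the checksum half of release verification; file names
# in the usage comment are illustrative, not exact release artifacts.

def sha512_of(path: Path) -> str:
    """Compute the SHA512 hex digest of a file, streaming it in chunks."""
    digest = hashlib.sha512()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(archive: Path, published_hex: str) -> bool:
    """Compare a downloaded archive against the published checksum value."""
    return sha512_of(archive) == published_hex.strip().lower()

# Usage idea (paths are hypothetical):
#   archive = Path("apache-airflow-X.Y.Z-source.tar.gz")
#   verify(archive, Path(str(archive) + ".sha512").read_text())
```

A checksum match only proves the download was not corrupted; verifying the GPG signature against the Apache KEYS file is what establishes the release is authentic.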
Woohoo!
Just need an approval, so that I can push to v2-1-stable, which is a protected branch