[DO NOT MERGE - JUST FOR APPROVAL] Prepare 2.0.3 release #15705
Conversation
After PROD images were added, some of the flags had two meanings. They behaved differently in the PROD image and the CI image and were a source of confusion, especially when the `start-airflow` command was used.

For the PROD image, the image can be customized during image building, and packages can be installed from .whl or .sdist packages available in `docker-context-files`. This is used at CI and DockerHub build time to produce images with packages that were prepared from local sources.

The CI image is always built from local sources, but Airflow can be removed and re-installed at runtime from PyPI. Both Airflow and provider packages can be installed from .whl or .sdist packages available in the `dist` folder. This is used in CI to test current provider packages with an older Airflow release (2.0.0) and to test provider packages locally.

After the change we have two sets of flags/variables:

PROD image (building the image):
* install-airflow-version, install-airflow-reference, install-from-docker-context-files

CI image (runtime):
* use-airflow-version, use-packages-from-dist

That should avoid confusion and failures of commands such as `start-airflow`, which is used to test provider packages and Airflow itself. (cherry picked from commit 36ba5b6)
This build is not really needed any more; gathering stats about quarantined builds was not a very successful experiment. (cherry picked from commit 63bec6f)
Since 2.0.2 was released yesterday, our guides and Breeze should point to that. (cherry picked from commit b314c71)
There are a number of places where we want the current Airflow version to appear in the docs, and Sphinx has this built in as `|version|`. But sadly that only works for "inline text"; it doesn't work in code blocks or inline code. This PR also adds two custom plugins that make this work, inspired by https://github.com/adamtheturtle/sphinx-substitution-extensions (but entirely re-written as that module Just Didn't Work) (cherry picked from commit 4c8a32c)
`ou` -> `you` (cherry picked from commit 150f225)
The Dockerfile is more "packed" and certain ARGs/ENVs are in separate parts of it, but we save minutes in certain scenarios when the images are built (especially when they are built in parallel, the difference might be significant). This change also removes some of the old, already unused CASS_DRIVER ARGs and ENVs. They are not needed any more, as the Cassandra drivers do not require CPython compilation any more. (cherry picked from commit 043a88d)
In most cases these are the same -- the one exception is when (re)opening an issue, in which case the actor is going to be someone with commit rights to a repo, and we don't want the mere act of re-opening to cause a PR to run on self-hosted infrastructure as that would be surprising (and potentially unsafe) (cherry picked from commit be8d2b1)
* Use pip 21.* to install Airflow officially

Pip 20.2.4 was so far the only officially supported installation mechanism for Airflow, as there were some problems with conflicting dependencies (which were ignored by previous versions of pip). This change attempts to solve this by removing the [gcp] extra from `apache-beam`, which turns out to be the major source of the problem - it contains requirements on old versions of the Google client libraries (but is apparently only used for tests). The "apache-beam" provider might however need the [gcp] extra for other components, so in order not to break backwards compatibility, another approach is used. Instead of adding [gcp] as an extra in the apache-beam requirement, the apache.beam provider's [google] extra is extended with an additional 'apache-beam[gcp]' requirement, so that whenever the provider is installed, apache-beam with the [gcp] extra is installed as well.

* Update airflow/providers/apache/beam/CHANGELOG.rst Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
* Update airflow/providers/apache/beam/CHANGELOG.rst Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
* Update airflow/providers/google/CHANGELOG.rst Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
* Update airflow/providers/google/CHANGELOG.rst Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com> (cherry picked from commit e229f35)
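For illustration, a minimal setup.py-style sketch of the dependency layout described above (the package and extra names are assumptions for this example, not the provider's actual setup code):

```python
# Illustrative sketch only -- not the actual setup.py of the apache.beam provider.
# The idea: apache-beam itself is installed without its [gcp] extra, and the
# provider's own "google" extra is the place that pulls in apache-beam[gcp],
# so installing the provider with the [google] extra still brings the GCP bits.
from setuptools import setup

setup(
    name="apache-airflow-providers-apache-beam",  # name assumed for the sketch
    install_requires=[
        "apache-beam",  # no [gcp] extra here, keeping the base install lean
    ],
    extras_require={
        # the provider's [google] extra carries the apache-beam[gcp] requirement
        "google": [
            "apache-airflow-providers-google",
            "apache-beam[gcp]",
        ],
    },
)
```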
The release manager, when reviewing providers to release, can make interactive decisions about what to do:

1) mark a certain provider as a 'doc-only' change
2) decide whether to generate documentation for the provider

In case the provider change is marked as 'doc-only', the next time providers are checked the doc-only change is not seen as a 'change' and the provider is automatically skipped. This saves time when preparing subsequent releases of providers, as all the "doc-only changes" from the previous release do not have to be re-reviewed (unless there are some new changes). (cherry picked from commit 40a2476)
Newer versions of hadolint hint about more Docker problems:

* consecutive RUN operations
* invalid labels

This PR fixes all the problems reported in our Dockerfiles by the latest hadolint and refreshes all our images used in CI and the chart so that corrected label names are included (one of the errors in all our Dockerfiles turned out to be camel-case and - in label keys, which is not valid according to the Docker label key specification). Fixes: #15544 (cherry picked from commit 6580a2c)
This change improves the process of image preparation in DockerHub and the manual version of it, in case the DockerHub automation does not work. It introduces the following changes:

* The "nightly-master" builds were failing because they tried to prepare packages without the "dev" suffix (such packages are now skipped in case a package with the same version has already been released). The "dev" suffix forces the packages to be built.
* The VERBOSE_COMMAND variable is removed to get more readable output of the script.
* Image verification is now part of the process. The images are automatically tested after they are built, and the scripts will not push the images if they do not pass the verification.
* Documentation is updated for both RC and final image preparation (the previous update did not cover RC image preparation).
* Documentation is added to explain how to manually refresh the images in DockerHub in case the nightly builds are not running for a long time.

(cherry picked from commit 7f6ddda)
Image tagging is now fully automated within the DockerHub build script, including the :<VERSION> and :latest tags. (cherry picked from commit 3d227f2)
When building images for production we use docker-context-files, where we place the packages to install. However, if those context files are not cleaned up, they unnecessarily increase the size and time needed to build the image and they invalidate the COPY . layer of the image. This PR checks that the docker-context-files folder contains just the README when the Breeze build-image command is run (for cases where images are not built from docker-context-files). Conversely, it also checks that there are some files in case the image is built with the --install-from-docker-context-files switch. This PR also adds a --cleanup-docker-context-files switch to clean up the folder automatically. The error messages also help by instructing the user what to do. (cherry picked from commit bf81d2e)
* Better description of UID/GID behaviour in image and quickstart

Following the discussion in #15579, it seems that the AIRFLOW_UID/GID parameters were not clearly explained in the Docker Quick-start guide and some users could find them confusing. This PR attempts to clarify them.

* fixup! Better description of UID/GID behaviour in image and quickstart

(cherry picked from commit 4226f64)
(cherry picked from commit 4ff05fa)
Error because `webpack` is not installed, because `yarn install --frozen-lockfile` is not run:

```
root@f5fc5cfc9a43:/opt/airflow# cd /opt/airflow/airflow/www/; yarn dev
yarn run v1.22.5
$ NODE_ENV=dev webpack --watch --colors --progress --debug --output-pathinfo --devtool eval-cheap-source-map --mode development
/bin/sh: 1: webpack: not found
error Command failed with exit code 127.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
root@f5fc5cfc9a43:/opt/airflow/airflow/www#
```

This commit adds `yarn install --frozen-lockfile` to the command, which fixes it. This was missed in https://github.com/apache/airflow/pull/13313/files (cherry picked from commit 60a3da6)
* Rename nteract-scrapbook to scrapbook * fixup! Rename nteract-scrapbook to scrapbook * Remove version pin given it's minimal version Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com> Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com> (cherry picked from commit 9ba467b)
Deprecated provider aliases (e.g. kubernetes -> cncf.kubernetes) should install the provider package (e.g. apache-airflow-providers-cncf-kubernetes) by default, not the requirements for the provider package. This behavior was accidentally broken. (cherry picked from commit fdea622)
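A minimal sketch of the intended behaviour, with assumed names (this is not Airflow's actual setup.py code):

```python
# Illustrative sketch only: a deprecated extra alias should resolve to the
# released provider *package*, not to the provider's own requirement list.
DEPRECATED_EXTRA_ALIASES = {  # mapping name assumed for illustration
    "kubernetes": "cncf.kubernetes",
}


def requirements_for_extra(extra):
    """Return the pip requirement(s) that an extra should install."""
    provider_id = DEPRECATED_EXTRA_ALIASES.get(extra, extra)
    # Install the provider distribution itself ...
    return ["apache-airflow-providers-" + provider_id.replace(".", "-")]
    # ... rather than expanding it into the provider's transitive requirements.


print(requirements_for_extra("kubernetes"))
# ['apache-airflow-providers-cncf-kubernetes']
```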
(cherry picked from commit e4deb0f)
It seems that the < 20.0 limit for gunicorn was added at some point without an actual reason. We are already using gunicorn in the 1.10 line of Airflow, so it should not be a problem to bump the version of gunicorn, especially as the 19.x line is somewhat deprecated already. This change came after the discussion in #15570 (cherry picked from commit d7a14a8)
Restored the original values here and I will create a PR in master too.
Actually - I will simply remove 1.10 support from master Breeze altogether. High time.
(cherry picked from commit e81ec7c)
Gets rid of Airflow 1.10 in Breeze and scripts/configuration. We were still occasionally using the master version of Breeze to run the 1.10 version of Airflow, but this madness should end now that we are approaching the 2.1 release. All changes in Breeze were so far ported to 1.10, but it is about time to finish it. (cherry picked from commit 57ad4af)
Moved building the airflow package to within the container, similarly to what we do with provider packages. This has the following advantages:

* a common environment is used to build airflow
* protection against leaking SECRET_* variables in CI in case third-party packages are installed
* --version-suffixes can be specified and the packages renamed according to their destination (SVN/PyPI) automatically
* no need to have node installed in the CI runner
* no need to have node installed in DockerHub
* no need to install pip/Python 3 in the DockerHub runner (currently Python 2 is still the default and it fails the build there)
* egg-info and build are always deleted before the build
* egg-info and build are cleaned up after the build

Also, following the way providers are released, the documentation is updated to change publishing Airflow to use the previously voted and renamed packages - the very same packages that were committed to SVN. This way anyone will be able to manually verify that the packages in SVN are the same as those published to PyPI, and there is no need to rebuild the packages when releasing them. (cherry picked from commit d627dfa)
* Enforce READ COMMITTED isolation when using MySQL
* Fixing up tests, removing indentation
* Fixing test

(cherry picked from commit 231d104)
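For illustration, a minimal SQLAlchemy sketch of enforcing READ COMMITTED (the connection URL and driver are placeholders; this is not Airflow's actual settings code):

```python
# Sketch only: SQLAlchemy lets you pin the transaction isolation level when the
# engine is created, which is the kind of guarantee described above for MySQL.
from sqlalchemy import create_engine

engine = create_engine(
    "mysql+mysqldb://airflow:airflow@localhost/airflow",  # placeholder URL
    isolation_level="READ COMMITTED",
)
```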
Failing on undefined variables is a good technique to avoid typos in Bash, but on macOS this is problematic: the Bash version used by default on macOS fails with an undefined-variable error when an empty array is passed - which is often needed, for example when you pass "${@}" arguments. This PR disables the undefined-variable check for macOS. (cherry picked from commit 507bca5)
Currently the traceback is not included when ``JSONFormatter`` is used (`[logging] json_format = True`). However, the default handler includes the stack trace. To include the trace currently, we need to add `json_fields = asctime, filename, lineno, levelname, message, exc_text`. This is a bigger problem when using Elasticsearch logging with:

```ini
[elasticsearch]
write_stdout = True
json_format = True
json_fields = asctime, filename, lineno, levelname, message, exc_text

[logging]
log_format = [%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s - %(exc_text)s
```

Running the following DAG with the above config won't show the trace:

```python
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.utils.dates import days_ago

with DAG(
    dag_id='example_error',
    schedule_interval=None,
    start_date=days_ago(2),
) as dag:

    def raise_error(**kwargs):
        raise Exception("I am an exception from task logs")

    task_1 = PythonOperator(
        task_id='task_1',
        python_callable=raise_error,
    )
```

Before:

```
[2021-04-17 00:11:00,152] {taskinstance.py:877} INFO - Dependencies all met for <TaskInstance: example_python_operator.print_the_context 2021-04-17T00:10:57.110189+00:00 [queued]>
...
...
[2021-04-17 00:11:00,298] {taskinstance.py:1482} ERROR - Task failed with exception
[2021-04-17 00:11:00,300] {taskinstance.py:1532} INFO - Marking task as FAILED. dag_id=example_python_operator, task_id=print_the_context, execution_date=20210417T001057, start_date=20210417T001100, end_date=20210417T001100
[2021-04-17 00:11:00,325] {local_task_job.py:146} INFO - Task exited with return code 1
```

After:

```
[2021-04-17 00:11:00,152] {taskinstance.py:877} INFO - Dependencies all met for <TaskInstance: example_python_operator.print_the_context 2021-04-17T00:10:57.110189+00:00 [queued]>
...
...
[2021-04-17 00:11:00,298] {taskinstance.py:1482} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1138, in _run_raw_task
    self._prepare_and_execute_task_with_callbacks(context, task)
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1311, in _prepare_and_execute_task_with_callbacks
    result = self._execute_task(context, task_copy)
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1341, in _execute_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python.py", line 117, in execute
    return_value = self.execute_callable()
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python.py", line 128, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/usr/local/airflow/dags/eg-2.py", line 25, in print_context
    raise Exception("I am an exception from task logs")
Exception: I am an exception from task logs
[2021-04-17 00:11:00,300] {taskinstance.py:1532} INFO - Marking task as FAILED. dag_id=example_python_operator, task_id=print_the_context, execution_date=20210417T001057, start_date=20210417T001100, end_date=20210417T001100
[2021-04-17 00:11:00,325] {local_task_job.py:146} INFO - Task exited with return code 1
```

(cherry picked from commit 99ec208)
(cherry picked from commit 32c6362)
Ensure that we use ti.queued_by_job_id when searching for pods. The queued_by_job_id is used by adopt_launched_task when updating the labels. Without this, after restarting the scheduler a third time, the scheduler does not find the pods as it is still searching for the id of the original scheduler (ti.external_executor_id) Co-Authored-By: samwedge <19414047+samwedge@users.noreply.github.com> Co-Authored-By: philip-hope <64643984+philip-hope@users.noreply.github.com> Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com> (cherry picked from commit 344e829)
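A rough sketch of the adoption lookup described above, using the Kubernetes Python client (the label key and helper shape are assumptions for illustration, not the executor's actual code):

```python
# Rough illustration only -- not the actual KubernetesExecutor code.
# The point of the fix: look pods up by the scheduler job that *queued* the
# task (ti.queued_by_job_id), the same value adopt_launched_task writes back
# into the pod labels, instead of ti.external_executor_id.
from kubernetes import client, config


def find_adoptable_pods(namespace, queued_by_job_id):
    config.load_incluster_config()  # or config.load_kube_config() locally
    kube = client.CoreV1Api()
    # The "airflow-worker" label key is an assumption in this sketch.
    return kube.list_namespaced_pod(
        namespace,
        label_selector=f"airflow-worker={queued_by_job_id}",
    ).items
```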
Currently, on_failure_callback is only called when a task finishes executing, not while it is executing. When a pod is deleted, a SIGTERM is sent to the task and the task is stopped immediately. The task is still running when it is killed and therefore on_failure_callback is not called. This PR makes sure that when a pod is marked for deletion and the task is killed, if the task has an on_failure_callback, the callback is called. Closes: #14422 (cherry picked from commit def1e7c)
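For context, a minimal example of the kind of callback this change affects (an illustrative DAG, not taken from the PR):

```python
# Illustrative DAG: with this fix the on_failure_callback also fires when the
# task is killed by a SIGTERM (e.g. its pod is deleted), not only when the
# task fails "normally" while finishing execution.
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.utils.dates import days_ago


def notify_failure(context):
    # context carries the task instance, exception, etc.
    print(f"Task {context['task_instance'].task_id} failed")


def do_work():
    ...


with DAG(
    dag_id="example_failure_callback",  # dag_id chosen for this example
    schedule_interval=None,
    start_date=days_ago(1),
) as dag:
    PythonOperator(
        task_id="work",
        python_callable=do_work,
        on_failure_callback=notify_failure,
    )
```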
…rnetesPodOperator`` (#15443) When a pod name is more than `MAX_LABEL_LEN` (63 characters), it is trimmed to 63 chars https://github.com/apache/airflow/blob/8711f90ab820ed420ef317b931e933a2062c891f/airflow/kubernetes/pod_generator.py#L470-L472 and we add a safe UUID to the `pod_id`, joined by a dot `.`. However, the regex for Kubernetes names does not allow a `-` followed by a `.`.

Valid regex:

```
^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$
```

This commit strips any hyphens at the end of the trimmed `pod_id`. (cherry picked from commit 130f9e3)
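A rough sketch of the trimming rule, with assumed function and constant names (not the actual pod_generator implementation):

```python
# Illustrative sketch of the rule described above -- not the real
# airflow.kubernetes.pod_generator code.
import uuid

MAX_LABEL_LEN = 63  # limit quoted in the description above


def make_safe_pod_id(pod_id):
    """Trim a lower-case pod id to 63 chars and append a unique suffix.

    Stripping trailing hyphens before adding ".<uuid>" avoids the invalid
    "-." sequence that the Kubernetes name regex rejects.
    """
    trimmed = pod_id[:MAX_LABEL_LEN].rstrip("-")
    return f"{trimmed}.{uuid.uuid4().hex}"


# Example: a trimmed name ending in "-" no longer produces a "-." sequence.
print(make_safe_pod_id("a" * 62 + "-" + "b" * 10))
```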
Fixes the incorrect number of queued and running slots (and therefore open slots) when there exist task instances that occupy more than 1 pool_slots. This was causing the scheduler to over-commit to a given pool, leading to a state where no further tasks could be scheduled because slots could not be freed. closes: #15399 (cherry picked from commit d7c27b8)
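A rough sketch of the accounting idea: occupied slots should be a sum of pool_slots rather than a count of task instances (the query below is illustrative, not the actual Airflow pool-stats code):

```python
# Sketch only: counting rows under-counts tasks that occupy more than one slot,
# which lets the scheduler over-commit the pool; summing pool_slots does not.
from sqlalchemy import func


def occupied_slots(session, task_instance_model, pool_name, active_states):
    """Sum pool_slots over running/queued task instances in the given pool."""
    return (
        session.query(func.sum(task_instance_model.pool_slots))
        .filter(
            task_instance_model.pool == pool_name,
            task_instance_model.state.in_(active_states),
        )
        .scalar()
        or 0
    )
```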
If Airflow is configured with the update_fab_perms config setting set to False, then the Op, User and Viewer roles are created _before_ the permission objects are written to the database, meaning that these roles did not correctly get assigned all the permissions we asked for (the missing permissions are just silently not created). Because of the "migrate to resource permission" migration this problem is not "disastrous", as most of the Permissions et al. we use are created by a migration. This changes it so that the permissions are always created/synced before we look at the roles. (Re-running sync-perm wouldn't fix this: although the second time around the Permissions will exist in the DB, we see that the Op role already has permissions and don't make any changes, assuming that the site operators made such changes.) (cherry picked from commit 1cd62b9)
Hey @ashb @kaxil, I also reviewed the 2.0.3 milestone and noticed quite a few issues that were already marked with 2.0.3. I reviewed them all and applied those that were cleanly cherry-pickable, were simple and had good test coverage. I removed the 2.0.3 milestone from some of the issues that were not easy to cherry-pick. The non-cherry-pickable ones are mostly UI-related; there are some changes in the master branch that make them more difficult to cherry-pick, so I left them for 2.1.
Closing this one as we decided to release
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.