
[DO NOT MERGE - JUST FOR APPROVAL] Prepare 2.0.3 release #15705

Closed
wants to merge 46 commits

Conversation

potiuk (Member) commented May 6, 2021


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

Daniel Standish and others added 24 commits May 6, 2021 16:10
After PROD images were added, some of the flags had two meanings

These behaved differently in the PROD image and the CI image and were a
source of confusion, especially when the start-airflow command was used.

For the PROD image, the image can be customized during image building,
and packages can be installed from .whl or .sdist packages
available in `docker-context-files`. This is used at CI and DockerHub
build time to produce images from packages that were prepared using
local sources.

The CI image is always built from local sources, but Airflow can
be removed and re-installed at runtime from PyPI.
Both Airflow and provider packages can be installed
from .whl or .sdist packages available in the dist folder. This is
used in CI to test current provider packages with an older released
Airflow (2.0.0) and to test provider packages locally.

After the change we have two sets of flags/variables:

PROD image (building image):

* install-airflow-version, install-airflow-reference,
  install-from-docker-context-files

CI image (runtime):

* use-airflow-version, use-packages-from-dist

That should avoid confusion and failures of commands such as
`start-airflow` that are used to test provider packages and
Airflow itself.

(cherry picked from commit 36ba5b6)
This build is not really needed any more; gathering stats
about quarantined builds was not a very successful experiment.

(cherry picked from commit 63bec6f)
Since 2.0.2 was released yesterday, our guides and Breeze should point
to that.

(cherry picked from commit b314c71)
* Fixes constraint generation for pypi providers

The constraints generated from the PyPI version of providers missed
the core requirements of Airflow; therefore the constraints were not
consistent with the setup.py core requirements.

Fixes: #15463
(cherry picked from commit 5da74f6)
There are a number of places where we want the current Airflow version
to appear in the docs, and Sphinx has this built in: `|version|`.

But sadly that only works for "inline text"; it doesn't work in code
blocks or inline code. This PR also adds two custom plugins that make
this work, inspired by
https://github.com/adamtheturtle/sphinx-substitution-extensions (but
entirely re-written, as that module Just Didn't Work).
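
For illustration only, a minimal sketch (not Airflow's actual plugin; the directive name and structure are assumptions) of how a custom Sphinx directive can perform the `|version|` substitution inside code blocks:

```python
# Hypothetical sketch of a Sphinx extension that substitutes |version| inside
# code blocks; the directive name and details are assumptions, not the code
# added by this PR.
from sphinx.application import Sphinx
from sphinx.directives.code import CodeBlock


class SubstitutionCodeBlock(CodeBlock):
    """A code-block directive that replaces |version| with the configured version."""

    def run(self):
        # self.content is a docutils StringList; rewrite each line in place
        # before delegating to the regular code-block rendering.
        for i, line in enumerate(self.content):
            self.content[i] = line.replace("|version|", self.env.config.version)
        return super().run()


def setup(app: Sphinx):
    app.add_directive("substitution-code-block", SubstitutionCodeBlock)
    return {"parallel_read_safe": True, "parallel_write_safe": True}
```

Docs would then use such a directive instead of a plain `code-block` wherever `|version|` should be expanded.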

(cherry picked from commit 4c8a32c)
`ou` -> `you`

(cherry picked from commit 150f225)

The Dockerfile is more "packed" and certain ARGs/ENVs are in separate
parts of it, but we save minutes in certain scenarios when the images
are built (especially when they are built in parallel, the
difference might be significant).

This change also removes some of the old, already unused CASS_DRIVER
ARGs and ENVs. They are not needed any more as the Cassandra drivers
no longer require CPython compilation.

(cherry picked from commit 043a88d)
In most cases these are the same -- the one exception is when
(re)opening an issue, in which case the actor is going to be someone
with commit rights to a repo, and we don't want the mere act of
re-opening to cause a PR to run on self-hosted infrastructure as that
would be surprising (and potentially unsafe)

(cherry picked from commit be8d2b1)
* Use Pip 21.* to install airflow officially

Pip 20.2.4 was so far the only officially supported installation
mechanism for Airflow, as there were some problems with conflicting
dependencies (which were ignored by previous versions of pip).

This change attempts to solve this by removing the [gcp] extra
from `apache-beam`, which turns out to be the major source of
the problem, as it contains requirements on old versions of the
Google client libraries (but apparently is only used for tests).

The apache.beam provider might however need the [gcp] extra
for other components, so in order not to break backwards
compatibility, another approach is used.

Instead of adding [gcp] as an extra in the apache-beam extra,
the apache.beam provider's [google] extra is extended with an
additional 'apache-beam[gcp]' requirement, so that whenever the
provider is installed, apache-beam with the [gcp] extra is installed
as well.
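
As an illustration only (a hypothetical, heavily simplified setup.py fragment, not the provider's actual packaging code), the approach amounts to something like:

```python
# Hypothetical, simplified packaging fragment: the provider's [google] extra
# pulls in apache-beam together with its own [gcp] extra, instead of adding
# [gcp] to Airflow's apache-beam extra directly.
from setuptools import setup

setup(
    name="apache-airflow-providers-apache-beam",  # illustrative only
    version="0.0.0",
    packages=[],
    extras_require={
        "google": [
            "apache-airflow-providers-google",
            "apache-beam[gcp]",
        ],
    },
)
```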

* Update airflow/providers/apache/beam/CHANGELOG.rst

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

* Update airflow/providers/apache/beam/CHANGELOG.rst

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

* Update airflow/providers/google/CHANGELOG.rst

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

* Update airflow/providers/google/CHANGELOG.rst

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
(cherry picked from commit e229f35)
The release manager, when reviewing providers to release, might
make interactive decisions about what to do:

1) mark a certain provider as a 'doc-only' change
2) decide whether to generate documentation for the provider

In case the provider change is marked as 'doc-only', the next time
providers are checked the doc-only change is not seen as a
'change' and the provider is automatically skipped.

This saves time when preparing subsequent releases of providers
as all the "doc-only changes" from the previous release do not
have to be re-reviewed (unless there are some new changes).

(cherry picked from commit 40a2476)
Newer versions of hadolint hint about more Docker problems:

* consecutive RUN operations
* invalid labels

This PR fixes all the problems reported in our Dockerfiles
by the latest hadolint and refreshes all our images used in CI
and the chart so that corrected label names are included (one of
the errors in all our Dockerfiles turned out to be camel-case
and '-' in label keys, which is not valid according to the
Docker label key specification).

Fixes: #15544
(cherry picked from commit 6580a2c)
This change improves the process of image preparation in DockerHub
and the manual version of it, in case the DockerHub automation does
not work. It introduces the following changes:

* The "nightly-master" builds were failing because they tried
  to prepare packages without the "dev" suffix (such packages
  are now skipped in case a package with the same version has
  already been released). The "dev" suffix forces the packages
  to be built.

* The VERBOSE_COMMAND variable is removed to get more readable
  output from the script.

* Image verification is now part of the process. The images are
  automatically tested after they are built and the scripts
  will not push the images if the images do not pass the
  verification.

* Documentation is updated for both RC and final image preparation
  (the previous update did not cover the RC image preparation).

* Documentation is added to explain how to manually refresh the
  images in DockerHub in case the nightly builds are not running
  for a long time.

(cherry picked from commit 7f6ddda)
The image tagging is now fully automated within the DockerHub build
script, including the :<VERSION> and :latest tags.

(cherry picked from commit 3d227f2)
When building images for production we are using docker-context-files
where we build packages to install. However, if those context files
are not cleaned up, they unnecessarily increase the size and time needed
to build the image and they invalidate the COPY . layer of the image.

This PR checks that the docker-context-files folder contains just the
README when the Breeze build-image command is run (for cases where
images are not built from docker-context-files). Conversely, it
also checks that there are some files in case the image is
built with the --install-from-docker-context-files switch.

This PR also adds a --cleanup-docker-context-files switch to
clean up the folder automatically. The error messages also
instruct the user what to do.

(cherry picked from commit bf81d2e)
* Better description of UID/GID behaviour in image and quickstart

Following the discussion in
#15579,
it seems that the AIRFLOW_UID/GID parameters were not clearly
explained in the Docker Quick-start guide and some users could
find them confusing.

This PR attempts to clarify it.

* fixup! Better description of UID/GID behaviour in image and quickstart

(cherry picked from commit 4226f64)
Error because `webpack` is not installed, because `yarn install --frozen-lockfile` is not run:

```
root@f5fc5cfc9a43:/opt/airflow# cd /opt/airflow/airflow/www/; yarn dev
yarn run v1.22.5
$ NODE_ENV=dev webpack --watch --colors --progress --debug --output-pathinfo --devtool eval-cheap-source-map -
-mode development
/bin/sh: 1: webpack: not found
error Command failed with exit code 127.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
root@f5fc5cfc9a43:/opt/airflow/airflow/www#
```

This commit adds `yarn install --frozen-lockfile` to the command, which fixes it.

This was missed in https://github.com/apache/airflow/pull/13313/files

(cherry picked from commit 60a3da6)
…nks` (#15673)

Without this change it is impossible for one of the providers to depend
upon the "dev"/current version of Airflow -- pip would instead try to
go out to PyPI to find the version (which almost certainly won't exist,
as it hasn't been released yet).

(cherry picked from commit 13faa69)
* Rename nteract-scrapbook to scrapbook

* fixup! Rename nteract-scrapbook to scrapbook

* Remove version pin given it's minimal version

Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com>

Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com>
(cherry picked from commit 9ba467b)
Deprecated provider aliases (e.g. kubernetes -> cncf.kubernetes) should
install the provider package (e.g. apache-airflow-providers-cncf-kubernetes)
by default, not the requirements for the provider package. This behavior
was accidentally broken.

(cherry picked from commit fdea622)
It seems that the < 20.0 limit for gunicorn was added at some point
in time without an actual reason. We are already using gunicorn in
the 1.10 line of Airflow, so it should not be a problem to bump the
version of gunicorn, especially since the 19.x line is somewhat
deprecated already.

This change came after the discussion in #15570.

(cherry picked from commit d7a14a8)
boring-cyborg bot added the area:dev-tools, provider:cncf-kubernetes, area:providers, area:Scheduler and provider:Apache labels May 6, 2021
potiuk marked this pull request as draft May 6, 2021 19:07
potiuk (Member Author) commented May 7, 2021

Restored the original values here and I will create a PR in master too.

potiuk (Member Author) commented May 7, 2021

Actually, I will simply remove 1.10 support from master Breeze altogether. High time.

potiuk and others added 16 commits May 8, 2021 16:42
We've started to receive deprecation warnings for Node 10 and
this PR attempts to upgrade to the recommended Node 14.

Fixes: #15713
(cherry picked from commit 46d6278)
Gets rid of Airflow 1.10 in Breeze and scripts/configuration.

We were still occasionally using the master version of Breeze to
run the 1.10 version of Airflow, but this madness should end now
that we are approaching the 2.1 release. All changes in Breeze were
so far ported to 1.10, but it is about time to finish it.

(cherry picked from commit 57ad4af)
Moved building the airflow package to within the container, similarly
to what we do with provider packages. This has the following advantages:

* common environment used to build airflow
* protection against leaking SECRET_* variables in CI in case third
  party packages are installed
* specifying --version-suffixes and renaming the packages according
  to the destination (SVN/PyPI) automatically
* no need to have node installed in CI runner
* no need to have node installed in DockerHub
* no need to install PIP/Python3 in DockerHub runner (currently
  Python2 is still default and it fails the build there)
* always deleting egg-info and build before the build
* cleaning up egg-info and build after the build

Also, following the way providers are released, the documentation
is updated to change publishing Airflow to use the previously voted
and renamed packages - the very same packages that were committed
to SVN. This way anyone will be able to manually verify that the
packages published to PyPI are the same as those in SVN, and there
is no need to rebuild the packages when releasing them.

(cherry picked from commit d627dfa)
* Enforce READ COMMITTED isolation when using mysql

* Fixing up tests, removing indentation

* Fixing test
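
As a hedged sketch of the mechanism only (not Airflow's actual settings code; the connection URL is a placeholder), READ COMMITTED isolation can be enforced on a MySQL engine via SQLAlchemy like this:

```python
# Sketch only: force READ COMMITTED isolation on a MySQL SQLAlchemy engine.
# The connection URL is a placeholder, not Airflow's real configuration.
from sqlalchemy import create_engine, text

engine = create_engine(
    "mysql+mysqldb://user:password@localhost:3306/airflow",
    isolation_level="READ COMMITTED",
)

with engine.connect() as conn:
    # On MySQL 8 the effective level is visible as @@transaction_isolation.
    print(conn.execute(text("SELECT @@transaction_isolation")).scalar())
```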

(cherry picked from commit 231d104)
Failing on undefined variables is a good technique to avoid
typos in Bash, but on macOS this is problematic, as the bash
used by default on macOS fails with an undefined variable error
when an empty array is passed - which is often needed,
for example when you pass "${@}" arguments.

This PR disables the undefined variable check on macOS.

(cherry picked from commit 507bca5)
Currently the traceback is not included when ``JSONFormatter`` is used
(`[logging] json_format = True`). However, the default handler
includes the stack trace. Currently, to include the trace, we need to
add `json_fields = asctime, filename, lineno, levelname, message, exc_text`.

This is a bigger problem when using Elasticsearch Logging with:

```ini
[elasticsearch]
write_stdout = True
json_format = True
json_fields = asctime, filename, lineno, levelname, message, exc_text

[logging]
log_format = [%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s - %(exc_text)s
```

Running the following DAG with the above config won't show the trace:

```python
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.utils.dates import days_ago

with DAG(
    dag_id='example_error',
    schedule_interval=None,
    start_date=days_ago(2),
) as dag:

    def raise_error(**kwargs):
        raise Exception("I am an exception from task logs")

    task_1 = PythonOperator(
        task_id='task_1',
        python_callable=raise_error,
    )
```

Before:

```
[2021-04-17 00:11:00,152] {taskinstance.py:877} INFO - Dependencies all met for <TaskInstance: example_python_operator.print_the_context 2021-04-17T00:10:57.110189+00:00 [queued]>
...
...
[2021-04-17 00:11:00,298] {taskinstance.py:1482} ERROR - Task failed with exception
[2021-04-17 00:11:00,300] {taskinstance.py:1532} INFO - Marking task as FAILED. dag_id=example_python_operator, task_id=print_the_context, execution_date=20210417T001057, start_date=20210417T001100, end_date=20210417T001100
[2021-04-17 00:11:00,325] {local_task_job.py:146} INFO - Task exited with return code 1
```

After:

```
[2021-04-17 00:11:00,152] {taskinstance.py:877} INFO - Dependencies all met for <TaskInstance: example_python_operator.print_the_context 2021-04-17T00:10:57.110189+00:00 [queued]>
...
...
[2021-04-17 00:11:00,298] {taskinstance.py:1482} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1138, in _run_raw_task
    self._prepare_and_execute_task_with_callbacks(context, task)
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1311, in _prepare_and_execute_task_with_callbacks
    result = self._execute_task(context, task_copy)
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1341, in _execute_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python.py", line 117, in execute
    return_value = self.execute_callable()
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python.py", line 128, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/usr/local/airflow/dags/eg-2.py", line 25, in print_context
    raise Exception("I am an exception from task logs")
Exception: I am an exception from task logs
[2021-04-17 00:11:00,300] {taskinstance.py:1532} INFO - Marking task as FAILED. dag_id=example_python_operator, task_id=print_the_context, execution_date=20210417T001057, start_date=20210417T001100, end_date=20210417T001100
[2021-04-17 00:11:00,325] {local_task_job.py:146} INFO - Task exited with return code 1
```

(cherry picked from commit 99ec208)
Closes: #15374
This pull request follows #14776.

Clearing a subdag with Downstream+Recursive does not automatically set the state of the parent dag so that the downstream parent tasks can execute.

(cherry picked from commit a4211e2)
Ensure that we use ti.queued_by_job_id when searching for pods. The queued_by_job_id is used by
adopt_launched_task when updating the labels. Without this, after restarting the scheduler
a third time, the scheduler does not find the pods as it is still searching for the id of
the original scheduler (ti.external_executor_id)

Co-Authored-By: samwedge <19414047+samwedge@users.noreply.github.com>
Co-Authored-By: philip-hope <64643984+philip-hope@users.noreply.github.com>
Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>
(cherry picked from commit 344e829)
Currently, on_failure_callback is only called when a task finishes
executing, not while it is executing. When a pod is deleted, a SIGTERM is
sent to the task and the task is stopped immediately. The task is
still running when it is killed and therefore on_failure_callback
is not called.

This PR makes sure that when a pod is marked for deletion and the
task is killed, if the task has on_failure_callback, the callback
is called.
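
For reference, attaching such a callback looks like this (a minimal illustrative DAG in the style of the earlier example; the task and callback names are made up):

```python
# Minimal illustrative DAG: with this fix, notify_failure also runs when the
# task receives SIGTERM because its pod was deleted, not only on ordinary
# task failure.
import time

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.utils.dates import days_ago


def notify_failure(context):
    print(f"Task {context['task_instance'].task_id} failed")


def long_running_task():
    time.sleep(3600)  # long enough that the pod could be deleted mid-run


with DAG(
    dag_id='example_failure_callback',
    schedule_interval=None,
    start_date=days_ago(2),
) as dag:
    task = PythonOperator(
        task_id='might_be_killed',
        python_callable=long_running_task,
        on_failure_callback=notify_failure,
    )
```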

Closes: #14422

(cherry picked from commit def1e7c)
closes: #13700

This should address the root cause by removing any task_ids that are not in the subset DAG from the `used_group_ids`

(cherry picked from commit 1e425fe)
…rnetesPodOperator`` (#15443)

When a pod name is more than `MAX_LABEL_LEN` (63 characters), it is trimmed to 63 chars

https://github.com/apache/airflow/blob/8711f90ab820ed420ef317b931e933a2062c891f/airflow/kubernetes/pod_generator.py#L470-L472

and we add a safe UUID to the `pod_id`, joined by a dot `.`. However, the regex
for Kubernetes names does not allow a `-` followed by a `.`.

Valid Regex:

```
^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$
```

This commit strips any hyphens at the end of the trimmed `pod_id`.
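
A hedged sketch of the resulting sanitization (illustrative names only; not the actual pod_generator code):

```python
# Illustrative sketch of the fix: trim the pod id and strip trailing hyphens so
# that appending ".<uuid>" still matches the Kubernetes name regex.
import re
import uuid

MAX_LABEL_LEN = 63
K8S_NAME_RE = re.compile(
    r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$"
)


def make_unique_pod_id(pod_id: str) -> str:
    trimmed = pod_id[:MAX_LABEL_LEN].rstrip("-")  # avoid "-" right before the "."
    return f"{trimmed}.{uuid.uuid4().hex}"


pod_id = "a" * 62 + "-extra"  # trimming at 63 chars would end on a hyphen
assert K8S_NAME_RE.match(make_unique_pod_id(pod_id))
```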

(cherry picked from commit 130f9e3)
Missed a case in (#15443) where `.` can be followed by another `.`.

(cherry picked from commit 1e66ce8)
Fixes the incorrect number of queued and running slots, and therefore open slots, when there exist task instances that occupy > 1 pool_slots. This was causing the scheduler to over-commit to a given pool, and a subsequent state where no further tasks can be scheduled because slots cannot be freed.
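
To illustrate the arithmetic (made-up numbers):

```python
# Illustrative arithmetic only: occupied slots must be the sum of each task
# instance's pool_slots, not the count of running/queued task instances.
pool_size = 5
running = {"task_a": 1, "task_b": 3}        # task_id -> pool_slots

wrong_occupied = len(running)               # 2: counts task instances
correct_occupied = sum(running.values())    # 4: sums pool_slots

open_slots = pool_size - correct_occupied   # 1 slot actually free
print(wrong_occupied, correct_occupied, open_slots)
```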

closes: #15399
(cherry picked from commit d7c27b8)
If Airflow is configured with the update_fab_perms config setting set to
False, then the Op, User and Viewer roles are created _before_ the
permission objects are written to the database, meaning that these roles
do not correctly get assigned all the permissions we asked for (the
missing permissions are just silently not created).

Because of the "migrate to resource permission" migration this problem
is not "disastrous", as most of the Permissions et al. we use are
created by a migration.

This changes it so that the permissions are always created/synced before
we look at the roles.

(Re-running sync-perm wouldn't fix this, as although the second time
around the Permissions will exist in the DB, we see that Op role already
has permissions and don't make any changes, assuming that the site
operators made such changes.)

(cherry picked from commit 1cd62b9)
potiuk (Member Author) commented May 9, 2021

Hey @ashb @kaxil, I also reviewed the 2.0.3 milestone and noticed quite a few issues that were already marked with 2.0.3. I reviewed them all and applied those that were cleanly cherry-pickable, were simple and had good test coverage.

I removed the 2.0.3 milestone from some of the issues that were not easy to cherry-pick. The non-cherry-pickable ones are mostly UI-related; there are some changes in the master branch that make them more difficult to cherry-pick, so I left them for 2.1.

potiuk (Member Author) commented May 9, 2021

BTW, I ran `./airflow-github compare 2.0.3` and these are the results:

The two "unmerged" ones are actually closed and solved by other issues (but I updated them in the changelog).

Screenshot from 2021-05-09 19-50-02

potiuk (Member Author) commented May 14, 2021

Closing this one as we decided to release 2.1 instead.

potiuk closed this May 14, 2021
potiuk deleted the v2-0-test branch November 17, 2023 16:30
potiuk restored the v2-0-test branch November 17, 2023 16:31
potiuk deleted the v2-0-test branch November 21, 2023 09:38
Labels
area:dev-tools, area:providers, area:Scheduler, full tests needed, provider:cncf-kubernetes