Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removes pip download when installing from local packages #13422

Conversation

potiuk
Copy link
Member

@potiuk potiuk commented Jan 1, 2021

This PR improves building production image from local packages,
in preparation for moving provider requirements out of setup.cfg.

Previously pip download step was executed in the CI scripts
in order to download all the packages that were needed. However
this had two problems:

  1. PIP download was executed outside of Dockerfile in CI scripts
    which means that any change to requirements there could not
    be executed in 'workflow_run' event - because main branch version
    of CI scripts is used there. We want to add extra requirements
    when installing airflow so in order to be able to change
    it, those requirements should be added in Dockerfile.
    This will be done in the follow-up Removes provider-imposed requirements from setup.cfg #13409 PR.

  2. Packages downloaded with PIP download have a "file" version
    rather than regular == version when you run pip freeze/check.
    This looks weird and while you can figure out the version
    from file name, when you pip install them, they look
    much more normal. The airflow package and provider package
    will still get the "file" form but this is ok because we are
    building those packages from sources and they are not yet
    available in PyPI.

Example:

adal==1.2.5
aiohttp==3.7.3
alembic==1.4.3
amqp==2.6.1
apache-airflow @ file:///docker-context-files/apache_airflow-2.1.0.dev0-py3-none-any.whl
apache-airflow-providers-amazon @ file:///docker-context-files/apache_airflow_providers_amazon-1.0.0-py3-none-any.whl
apache-airflow-providers-celery @ file:///docker-context-files/apache_airflow_providers_celery-1.0.0-py3-none-any.whl
...

With this PR, we do not pip download all packages, but instead
we prepare airflow + providers packages as .whl files and
install them from there (all the dependencies are installed
from PyPI)


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@potiuk
Copy link
Member Author

potiuk commented Jan 1, 2021

I will need to merge this one before making #13409 works

@github-actions
Copy link

github-actions bot commented Jan 1, 2021

The Workflow run is cancelling this PR. Building images for the PR has failed. Follow the the workflow link to check the reason.

@potiuk potiuk force-pushed the remove-pip-download-when-building-prod-image branch 3 times, most recently from b5b516b to c57e24b Compare January 1, 2021 08:08
@github-actions
Copy link

github-actions bot commented Jan 1, 2021

The Workflow run is cancelling this PR. Building images for the PR has failed. Follow the the workflow link to check the reason.

@potiuk potiuk force-pushed the remove-pip-download-when-building-prod-image branch from c57e24b to a1ba2d2 Compare January 1, 2021 12:31
@github-actions
Copy link

github-actions bot commented Jan 1, 2021

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@potiuk potiuk force-pushed the remove-pip-download-when-building-prod-image branch from a1ba2d2 to 7144b35 Compare January 1, 2021 13:17
AIRFLOW_EXTRAS=${AIRFLOW_EXTRAS/mysql,}; \
fi; \
if [[ ${INSTALL_FROM_PYPI} == "true" ]]; then \
if [[ "${UPGRADE_TO_NEWER_DEPENDENCIES}" != "false" ]]; then \
pip install --user "${AIRFLOW_INSTALLATION_METHOD}[${AIRFLOW_EXTRAS}]${AIRFLOW_INSTALL_VERSION}" \
--upgrade --upgrade-strategy eager; \
pip install --upgrade "pip==${AIRFLOW_PIP_VERSION}"; \
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we resore PIP to the version specified (eager upgrade will upgrade it)

else \
pip install --user "${AIRFLOW_INSTALLATION_METHOD}[${AIRFLOW_EXTRAS}]${AIRFLOW_INSTALL_VERSION}" \
pip install --upgrade --upgrade-strategy only-if-needed \
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now - we add "only-if needed " strategy here in case someone changes setup.py requirements and the pre-installed versions no longer fit in those requirements.

fi; \
fi; \
if [[ ${INSTALL_FROM_DOCKER_CONTEXT_FILES} == "true" ]]; then \
reinstalling_apache_airflow_packages=$(ls /docker-context-files/apache?airflow*.{whl,tar.gz} 2>/dev/null || true); \
Copy link
Member Author

@potiuk potiuk Jan 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting one. Instead of pip download as we did before, we want to install all dependencies here for airflow packages and follow the constraints we already have (unless we upgrade to newer dependencies -then we do not use constraints).

This is the case where we haave only "airflow" + "airflow packages" build from sources to install (which happens on CI).

fi; \
fi ; \
# All the others we want to reinstall as-is, without dependencies \
reinstalling_other_packages=$(ls /docker-context-files/*.{whl,tar.gz} 2>/dev/null | \
Copy link
Member Author

@potiuk potiuk Jan 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But for all other packages, we want to just install whatever we have there without dependencies. If the packages are result of pip download, this serves the purpose of 'offline installation" mechanism where you download all packages and store them in artifact registry and 'vet' and use them for installation.

@@ -325,7 +325,8 @@ RUN if [[ ${INSTALL_FROM_PYPI} == "true" ]]; then \
pip install -e ".[${AIRFLOW_EXTRAS}]" --upgrade --upgrade-strategy eager; \
pip install --upgrade "pip==${AIRFLOW_PIP_VERSION}"; \
else \
pip install -e ".[${AIRFLOW_EXTRAS}]" --upgrade --upgrade-strategy only-if-needed; \
pip install -e ".[${AIRFLOW_EXTRAS}]" --upgrade --upgrade-strategy only-if-needed\
--constraint "${AIRFLOW_CONSTRAINTS_LOCATION}"; \
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to stick to the constraints here as well (unless setup.py requirements change)

RUN if [[ ${INSTALL_FROM_DOCKER_CONTEXT_FILES} != "true" ]]; then \
if ls /docker-context-files/*.{whl,tar.gz} 1> /dev/null 2>&1; then \
pip install --no-deps /docker-context-files/*.{whl,tar.gz}; \
# hadolint ignore=SC2086, SC2010
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same 'docker-file-context' installation for CI image. It might be useful only in case we want to test CI image built from packages, good to keep that option.

# Cleanup dist and docker-context-files folders
mkdir -pv "${AIRFLOW_SOURCES}/dist"
mkdir -pv "${AIRFLOW_SOURCES}/docker-context-files"
rm -f "${AIRFLOW_SOURCES}/dist/"*.{whl,tar.gz}
rm -f "${AIRFLOW_SOURCES}/docker-context-files/"*.{whl,tar.gz}

runs::run_pip_download
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not need to run pip download now. This is good because this code is run from master, so previously we could not change the requirements for that pip download. In the #13409 we are going to add per-image requirements and since they will be in the images, not scripts, they are safe to run in workflow_run

@potiuk
Copy link
Member Author

potiuk commented Jan 1, 2021

I think this one should be good to go! I need to merge it before #13409

@potiuk potiuk force-pushed the remove-pip-download-when-building-prod-image branch 3 times, most recently from 26df514 to ee773de Compare January 1, 2021 19:38
@potiuk
Copy link
Member Author

potiuk commented Jan 1, 2021

It's green! seems that the "reverse lookup disabling" solves the problem and we are all green!

AIRFLOW_EXTRAS=${AIRFLOW_EXTRAS/mysql,}; \
fi; \
if [[ ${INSTALL_FROM_PYPI} == "true" ]]; then \
if [[ "${UPGRADE_TO_NEWER_DEPENDENCIES}" != "false" ]]; then \
pip install --user "${AIRFLOW_INSTALLATION_METHOD}[${AIRFLOW_EXTRAS}]${AIRFLOW_INSTALL_VERSION}" \
--upgrade --upgrade-strategy eager; \
pip install --upgrade "pip==${AIRFLOW_PIP_VERSION}"; \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add this limitation to the previous command so that this update never happens?

            pip install --user "${AIRFLOW_INSTALLATION_METHOD}[${AIRFLOW_EXTRAS}]${AIRFLOW_INSTALL_VERSION}" \
                            --constraint <<(echo  "pip==${AIRFLOW_PIP_VERSION}") \
                            --upgrade --upgrade-strategy eager; \

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What update do you suggest?

Copy link
Member Author

@potiuk potiuk Jan 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to leave it as is unless you have better proposal

Copy link
Member

@mik-laj mik-laj Jan 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the first comment, I added an example. To show my idea better, I prepared the entire commit. Does this solve this problem?

From cc68b6ba19a4bc11e5aed3ec01f3e4f86f76f745 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kamil=20Bregu=C5=82a?= <kamil.bregula@polidea.com>
Date: Sat, 2 Jan 2021 01:16:03 +0100
Subject: Add constraints to prevent upgrade of pip

---
 Dockerfile | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/Dockerfile b/Dockerfile
index 81737a15b..b72d859dc 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -255,13 +255,13 @@ RUN if [[ ${INSTALL_MYSQL_CLIENT} != "true" ]]; then \
     if [[ ${INSTALL_FROM_PYPI} == "true" ]]; then \
         if [[ "${UPGRADE_TO_NEWER_DEPENDENCIES}" != "false" ]]; then \
             pip install --user "${AIRFLOW_INSTALLATION_METHOD}[${AIRFLOW_EXTRAS}]${AIRFLOW_INSTALL_VERSION}" \
+                --constraint <(echo "pip==${AIRFLOW_PIP_VERSION}") \
                 --upgrade --upgrade-strategy eager; \
-            pip install --upgrade "pip==${AIRFLOW_PIP_VERSION}"; \
         else \
             pip install --upgrade --upgrade-strategy only-if-needed \
                 --user "${AIRFLOW_INSTALLATION_METHOD}[${AIRFLOW_EXTRAS}]${AIRFLOW_INSTALL_VERSION}" \
+                --constraint <(echo "pip==${AIRFLOW_PIP_VERSION}") \
                 --constraint "${AIRFLOW_CONSTRAINTS_LOCATION}"; \
-            pip install --upgrade "pip==${AIRFLOW_PIP_VERSION}"; \
         fi; \
     fi; \
     if [[ ${INSTALL_FROM_DOCKER_CONTEXT_FILES} == "true" ]]; then \
@@ -270,12 +270,13 @@ RUN if [[ ${INSTALL_MYSQL_CLIENT} != "true" ]]; then \
         if [[ "${reinstalling_apache_airflow_packages}" != "" ]]; then \
             if [[ "${UPGRADE_TO_NEWER_DEPENDENCIES}" != "false" ]]; then \
                 pip install --force-reinstall --upgrade --upgrade-strategy eager \
+                    --constraint <(echo "pip==${AIRFLOW_PIP_VERSION}") \
                     --user ${reinstalling_apache_airflow_packages}; \
-                pip install --upgrade "pip==${AIRFLOW_PIP_VERSION}"; \
             else \
                 pip install --force-reinstall --upgrade --upgrade-strategy only-if-needed \
-                    --user ${reinstalling_apache_airflow_packages} --constraint "${AIRFLOW_CONSTRAINTS_LOCATION}"; \
-                pip install --upgrade "pip==${AIRFLOW_PIP_VERSION}"; \
+                    --user ${reinstalling_apache_airflow_packages} \
+                    --constraint <(echo "pip==${AIRFLOW_PIP_VERSION}") \
+                    --constraint "${AIRFLOW_CONSTRAINTS_LOCATION}"; \
             fi; \
         fi ; \
         # All the others we want to reinstall as-is, without dependencies \
@@ -287,12 +288,14 @@ RUN if [[ ${INSTALL_MYSQL_CLIENT} != "true" ]]; then \
     fi; \
     if [[ -n "${ADDITIONAL_PYTHON_DEPS}" ]]; then \
         if [[ "${UPGRADE_TO_NEWER_DEPENDENCIES}" != "false" ]]; then \
-            pip install --user ${ADDITIONAL_PYTHON_DEPS} --upgrade --upgrade-strategy eager; \
-            pip install --upgrade "pip==${AIRFLOW_PIP_VERSION}"; \
+            pip install --user ${ADDITIONAL_PYTHON_DEPS} \
+                --constraint <(echo "pip==${AIRFLOW_PIP_VERSION}") \
+                --upgrade --upgrade-strategy eager; \
         else \
-            pip install --user ${ADDITIONAL_PYTHON_DEPS} --upgrade --upgrade-strategy only-if-needed \
+            pip install --user ${ADDITIONAL_PYTHON_DEPS} \
+                --upgrade --upgrade-strategy only-if-needed \
+                --constraint <(echo "pip==${AIRFLOW_PIP_VERSION}") \
                 --constraint "${AIRFLOW_CONSTRAINTS_LOCATION}"; \
-            pip install --upgrade "pip==${AIRFLOW_PIP_VERSION}"; \
         fi; \
     fi; \
     find /root/.local/ -name '*.pyc' -print0 | xargs -0 rm -r || true ; \
-- 
2.28.0

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might not solve it. I have no 100% certainly but --constraint is not really binding (at least for pip 20.2.4). When you are upgrading with eager strategy, PIP often installs dependencies which are not within constraints. Constraint is merely a "hint" which does not have to be followed (and especially with 'eager' upgrade strategy.

I think I prefer to explicitly force pip version by installing it explicitly after.

This PR improves building production image from local packages,
in preparation for moving provider requirements out of setup.cfg.

Previously `pip download` step was executed in the CI scripts
in order to download all the packages that were needed. However
this had two problems:

1) PIP download was executed outside of Dockerfile in CI scripts
   which means that any change to requirements there could not
   be executed in 'workflow_run' event - because main branch version
   of CI scripts is used there. We want to add extra requirements
   when installing airflow so in order to be able to change
   it, those requirements should be added in Dockerfile.
   This will be done in the follow-up apache#13409 PR.

2) Packages downloaded with PIP download have a "file" version
   rather than regular == version when you run pip freeze/check.
   This looks weird and while you can figure out the version
   from file name, when you `pip install` them, they look
   much more normal. The airflow package and provider package
   will still get the "file" form but this is ok because we are
   building those packages from sources and they are not yet
   available in PyPI.

Example:

  adal==1.2.5
  aiohttp==3.7.3
  alembic==1.4.3
  amqp==2.6.1
  apache-airflow @ file:///docker-context-files/apache_airflow-2.1.0.dev0-py3-none-any.whl
  apache-airflow-providers-amazon @ file:///docker-context-files/apache_airflow_providers_amazon-1.0.0-py3-none-any.whl
  apache-airflow-providers-celery @ file:///docker-context-files/apache_airflow_providers_celery-1.0.0-py3-none-any.whl
  ...

With this PR, we do not `pip download` all packages, but instead
we prepare airflow + providers packages as .whl files and
install them from there (all the dependencies are installed
from PyPI)
@potiuk potiuk force-pushed the remove-pip-download-when-building-prod-image branch from ee773de to 583b049 Compare January 2, 2021 00:52
@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Jan 2, 2021
@github-actions
Copy link

github-actions bot commented Jan 2, 2021

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@potiuk potiuk merged commit e436883 into apache:master Jan 2, 2021
@potiuk potiuk deleted the remove-pip-download-when-building-prod-image branch January 2, 2021 10:16
@potiuk potiuk added this to the Airflow 2.0.1 milestone Jan 2, 2021
potiuk added a commit to PolideaInternal/airflow that referenced this pull request Jan 2, 2021
In the latest change apache#13422 change in the way product images are
prepared removed extras from installed airflow - thus caused
failing production image verification check.

This change restores extras when airflow is installed from packages
potiuk added a commit that referenced this pull request Jan 2, 2021
In the latest change #13422 change in the way product images are
prepared removed extras from installed airflow - thus caused
failing production image verification check.

This change restores extras when airflow is installed from packages
kaxil pushed a commit that referenced this pull request Jan 21, 2021
This PR improves building production image from local packages,
in preparation for moving provider requirements out of setup.cfg.

Previously `pip download` step was executed in the CI scripts
in order to download all the packages that were needed. However
this had two problems:

1) PIP download was executed outside of Dockerfile in CI scripts
   which means that any change to requirements there could not
   be executed in 'workflow_run' event - because main branch version
   of CI scripts is used there. We want to add extra requirements
   when installing airflow so in order to be able to change
   it, those requirements should be added in Dockerfile.
   This will be done in the follow-up #13409 PR.

2) Packages downloaded with PIP download have a "file" version
   rather than regular == version when you run pip freeze/check.
   This looks weird and while you can figure out the version
   from file name, when you `pip install` them, they look
   much more normal. The airflow package and provider package
   will still get the "file" form but this is ok because we are
   building those packages from sources and they are not yet
   available in PyPI.

Example:

  adal==1.2.5
  aiohttp==3.7.3
  alembic==1.4.3
  amqp==2.6.1
  apache-airflow @ file:///docker-context-files/apache_airflow-2.1.0.dev0-py3-none-any.whl
  apache-airflow-providers-amazon @ file:///docker-context-files/apache_airflow_providers_amazon-1.0.0-py3-none-any.whl
  apache-airflow-providers-celery @ file:///docker-context-files/apache_airflow_providers_celery-1.0.0-py3-none-any.whl
  ...

With this PR, we do not `pip download` all packages, but instead
we prepare airflow + providers packages as .whl files and
install them from there (all the dependencies are installed
from PyPI)

(cherry picked from commit e436883)
kaxil pushed a commit that referenced this pull request Jan 21, 2021
In the latest change #13422 change in the way product images are
prepared removed extras from installed airflow - thus caused
failing production image verification check.

This change restores extras when airflow is installed from packages

(cherry picked from commit 3a73110)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:dev-tools full tests needed We need to run full set of tests for this PR to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants