Optimize caching of installed packages in CI build (#37315)
Some of the recent changes in handling conflicting dependencies
broke the optimization of installing dependencies from the branch tip.

The optimisation works by first installing packages from the branch
tip, so that they are pre-installed (and cached in a Docker layer) and
the final installation step with pyproject.toml takes very little
time, even if pyproject.toml has changed.
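A dry-run sketch of the two stages may make the layering easier to follow. It assumes a final step that installs from a local checkout with `pip install ".[extras]" --constraint ...`; the exact final command used by the Airflow Dockerfiles is not part of this commit. The default values are hypothetical placeholders, and the hypothetical `run()` helper only prints each command instead of executing it.

```shell
#!/usr/bin/env bash
# Dry-run illustration of the two-stage install (not the actual Airflow
# build scripts). The defaults below are hypothetical placeholders.
set -euo pipefail

: "${AIRFLOW_REPO:=apache/airflow}"
: "${AIRFLOW_BRANCH:=main}"
: "${AIRFLOW_EXTRAS:=postgres}"
: "${AIRFLOW_CONSTRAINTS_LOCATION:=constraints.txt}"

# Print the command instead of running it.
run() { printf '%s\n' "$*"; }

# Stage 1, in its own Docker layer: install what the branch tip needs.
# The layer stays cached as long as neither this step nor an earlier
# layer changes.
stage_branch_tip() {
  run pip install \
    "https://github.com/${AIRFLOW_REPO}/archive/${AIRFLOW_BRANCH}.tar.gz#egg=apache-airflow[${AIRFLOW_EXTRAS}]"
}

# Stage 2, after pyproject.toml is copied in (assumed command): packages
# from stage 1 that already satisfy the pins are not reinstalled.
stage_final() {
  run pip install ".[${AIRFLOW_EXTRAS}]" --constraint "${AIRFLOW_CONSTRAINTS_LOCATION}"
}

stage_branch_tip
stage_final
```

Only the second stage depends on pyproject.toml, which is why editing it invalidates just the (short) final step.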

The problem was that when the branch tip and the constraints
conflicted, the installation failed, so no packages were installed in
the "branch tip" layer, which effectively removed the cache.
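The failure mode can be simulated without pip. In this sketch `fake_pip_install` is a hypothetical stand-in that, as the commit message describes, installs nothing when resolution fails; the trailing `|| true` (as on the removed `--constraint` line) hides that failure, so the build carries on with an empty layer.

```shell
#!/usr/bin/env bash
# Simulation only: fake_pip_install is a hypothetical stand-in for pip.
# It decides on the whole package set first and installs nothing at all
# if that resolution fails.

installed=()   # packages that ended up in the "branch tip" layer

fake_pip_install() {
  local conflict=$1
  shift
  if [[ $conflict == yes ]]; then
    echo "ERROR: ResolutionImpossible (simulated)" >&2
    return 1             # resolution failed: nothing is installed
  fi
  installed+=("$@")      # resolution succeeded: every package is installed
}

# Old step: constraints conflict with the branch tip, "|| true" hides it.
fake_pip_install yes alembic flask sqlalchemy 2>/dev/null || true
echo "old step: exit status $?, packages in layer: ${#installed[@]}"

# New step: no constraints, so there is nothing for them to conflict with.
fake_pip_install no alembic flask sqlalchemy
echo "new step: packages in layer: ${#installed[@]}"
```

The old step reports exit status 0 with an empty layer, which is exactly why the lost cache went unnoticed.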

This change fixes it: the branch-tip installation no longer uses
constraints, so the two can never conflict and the cache will never
be empty. The cache may contain different versions of some packages,
but the vast majority should be the same as in the constraints, so the
following installation step should reuse most of the already
installed packages.
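How much of the cached layer the constrained step can reuse can be estimated by comparing the layer's `pip freeze` output with the constraints file. This is a minimal sketch, assuming both files contain plain `name==version` lines; the package lists are invented for illustration and are not real Airflow constraints. Only exact matches count as reusable, because pip keeps an installed package that satisfies its pin and reinstalls it otherwise.

```shell
#!/usr/bin/env bash
# Estimate how many pinned packages the final, constrained install can
# reuse from the branch-tip layer. The data below is made up.
set -euo pipefail

# Print how many exact "name==version" lines appear in both files.
count_reusable_pins() {
  comm -12 <(sort "$1") <(sort "$2") | wc -l | tr -d ' '
}

cached=$(mktemp)
constraints=$(mktemp)
trap 'rm -f "$cached" "$constraints"' EXIT

# The unconstrained branch-tip install picked a newer sqlalchemy...
printf '%s\n' alembic==1.13.1 attrs==23.2.0 flask==2.2.5 sqlalchemy==1.4.51 > "$cached"
# ...while the constraints pin the previous release.
printf '%s\n' alembic==1.13.1 attrs==23.2.0 flask==2.2.5 sqlalchemy==1.4.50 > "$constraints"

reused=$(count_reusable_pins "$cached" "$constraints")
total=$(wc -l < "$constraints" | tr -d ' ')
echo "reused ${reused} of ${total} pinned packages"
```

Here only sqlalchemy would be reinstalled in the final step; the other three packages come straight from the cached layer.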
potiuk authored Feb 11, 2024
1 parent 9a10085 commit 90a650d
Showing 3 changed files with 18 additions and 12 deletions.
10 changes: 6 additions & 4 deletions Dockerfile
@@ -447,14 +447,16 @@ function install_airflow_dependencies_from_branch_tip() {
if [[ ${INSTALL_POSTGRES_CLIENT} != "true" ]]; then
AIRFLOW_EXTRAS=${AIRFLOW_EXTRAS/postgres,}
fi
# Install latest set of dependencies using constraints. In case constraints were upgraded and there
# are conflicts, this might fail, but it should be fixed in the following installation steps
# Install latest set of dependencies - without constraints. This is to download a "base" set of
# dependencies that we can cache and reuse when installing airflow using constraints and latest
# pyproject.toml in the next step (when we install regular airflow).
set -x
pip install --root-user-action ignore \
${ADDITIONAL_PIP_INSTALL_FLAGS} \
"https://github.com/${AIRFLOW_REPO}/archive/${AIRFLOW_BRANCH}.tar.gz#egg=apache-airflow[${AIRFLOW_EXTRAS}]" \
--constraint "${AIRFLOW_CONSTRAINTS_LOCATION}" || true
"https://github.com/${AIRFLOW_REPO}/archive/${AIRFLOW_BRANCH}.tar.gz#egg=apache-airflow[${AIRFLOW_EXTRAS}]"
common::install_pip_version
# Uninstall airflow to keep only the dependencies. In the future when planned https://github.com/pypa/pip/issues/11440
# is implemented in pip we might be able to use this flag and skip the remove step.
pip freeze | grep apache-airflow-providers | xargs pip uninstall --yes 2>/dev/null || true
set +x
echo
10 changes: 6 additions & 4 deletions Dockerfile.ci
@@ -407,14 +407,16 @@ function install_airflow_dependencies_from_branch_tip() {
if [[ ${INSTALL_POSTGRES_CLIENT} != "true" ]]; then
AIRFLOW_EXTRAS=${AIRFLOW_EXTRAS/postgres,}
fi
# Install latest set of dependencies using constraints. In case constraints were upgraded and there
# are conflicts, this might fail, but it should be fixed in the following installation steps
# Install latest set of dependencies - without constraints. This is to download a "base" set of
# dependencies that we can cache and reuse when installing airflow using constraints and latest
# pyproject.toml in the next step (when we install regular airflow).
set -x
pip install --root-user-action ignore \
${ADDITIONAL_PIP_INSTALL_FLAGS} \
"https://github.com/${AIRFLOW_REPO}/archive/${AIRFLOW_BRANCH}.tar.gz#egg=apache-airflow[${AIRFLOW_EXTRAS}]" \
--constraint "${AIRFLOW_CONSTRAINTS_LOCATION}" || true
"https://github.com/${AIRFLOW_REPO}/archive/${AIRFLOW_BRANCH}.tar.gz#egg=apache-airflow[${AIRFLOW_EXTRAS}]"
common::install_pip_version
# Uninstall airflow to keep only the dependencies. In the future when planned https://github.com/pypa/pip/issues/11440
# is implemented in pip we might be able to use this flag and skip the remove step.
pip freeze | grep apache-airflow-providers | xargs pip uninstall --yes 2>/dev/null || true
set +x
echo
10 changes: 6 additions & 4 deletions scripts/docker/install_airflow_dependencies_from_branch_tip.sh
@@ -46,14 +46,16 @@ function install_airflow_dependencies_from_branch_tip() {
if [[ ${INSTALL_POSTGRES_CLIENT} != "true" ]]; then
AIRFLOW_EXTRAS=${AIRFLOW_EXTRAS/postgres,}
fi
# Install latest set of dependencies using constraints. In case constraints were upgraded and there
# are conflicts, this might fail, but it should be fixed in the following installation steps
# Install latest set of dependencies - without constraints. This is to download a "base" set of
# dependencies that we can cache and reuse when installing airflow using constraints and latest
# pyproject.toml in the next step (when we install regular airflow).
set -x
pip install --root-user-action ignore \
${ADDITIONAL_PIP_INSTALL_FLAGS} \
"https://github.com/${AIRFLOW_REPO}/archive/${AIRFLOW_BRANCH}.tar.gz#egg=apache-airflow[${AIRFLOW_EXTRAS}]" \
--constraint "${AIRFLOW_CONSTRAINTS_LOCATION}" || true
"https://github.com/${AIRFLOW_REPO}/archive/${AIRFLOW_BRANCH}.tar.gz#egg=apache-airflow[${AIRFLOW_EXTRAS}]"
common::install_pip_version
# Uninstall airflow to keep only the dependencies. In the future when planned https://github.com/pypa/pip/issues/11440
# is implemented in pip we might be able to use this flag and skip the remove step.
pip freeze | grep apache-airflow-providers | xargs pip uninstall --yes 2>/dev/null || true
set +x
echo
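The cleanup line that follows the install in all three files, `pip freeze | grep apache-airflow-providers | xargs pip uninstall --yes`, can be dry-run by swapping `pip uninstall` for `echo`. The freeze lines below are made-up sample output. As a side note, with no matching lines GNU xargs still runs the command once with no arguments, and pip then exits with an error because no requirement was given; the trailing `2>/dev/null || true` in the original line keeps that from failing the build.

```shell
#!/usr/bin/env bash
# Dry run of: pip freeze | grep apache-airflow-providers | xargs pip uninstall --yes
# sample_freeze prints made-up "pip freeze" output, and "echo" stands in
# for the real uninstall so the sketch runs without pip.
set -euo pipefail

sample_freeze() {
  printf '%s\n' \
    "alembic==1.13.1" \
    "apache-airflow-providers-common-sql==1.10.0" \
    "apache-airflow-providers-postgres==5.10.0" \
    "psycopg2-binary==2.9.9"
}

# Keep only provider lines and pass them all to a single uninstall command.
uninstall_cmd=$(sample_freeze | grep apache-airflow-providers | xargs echo pip uninstall --yes)
echo "$uninstall_cmd"
```

Only the provider packages are selected; their dependencies (here alembic and psycopg2-binary) stay in the layer for the next step to reuse.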
