-
Notifications
You must be signed in to change notification settings - Fork 14.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AIRFLOW-4115] Multi-staging Aiflow Docker image [Step 1/3] #4936
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,100 @@ | ||
# | ||
# Licensed to the Apache Software Foundation (ASF) under one | ||
# or more contributor license agreements. See the NOTICE file | ||
# distributed with this work for additional information | ||
# regarding copyright ownership. The ASF licenses this file | ||
# to you under the Apache License, Version 2.0 (the | ||
# "License"); you may not use this file except in compliance | ||
# with the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, | ||
# software distributed under the License is distributed on an | ||
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
# KIND, either express or implied. See the License for the | ||
# specific language governing permissions and limitations | ||
# under the License. | ||
|
||
# NOTE! This docker ignore uses recommended technique | ||
# Where everything is excluded by default and you deliberately | ||
# Add only those directories/files you need. This is very useful | ||
# To make sure that Docker context is always the same on any machine | ||
# So that generated files are not accidentally added to the context | ||
# This allows Docker's `COPY .` to behave in predictable way | ||
|
||
# Ignore everything | ||
** | ||
|
||
# Allow only these directories | ||
!airflow | ||
!dags | ||
!dev | ||
#!/docs - we do not need docs in the Docker! | ||
!licenses | ||
!scripts | ||
!tests | ||
|
||
!.coveragerc | ||
!.rat-excludes | ||
!LICENSE | ||
!MANIFEST.in | ||
!NOTICE | ||
|
||
# Avoid triggering context change on README change (new companies using Airflow) | ||
# So please do not uncomment this line ;) | ||
# !README.md | ||
|
||
# Setup/version configuration | ||
!setup.cfg | ||
!setup.py | ||
|
||
# Entrypoint script | ||
!scripts/docker/entrypoint.sh | ||
|
||
# Now - ignore unnecessary files inside allowed directories | ||
# This goes after the allowed directories | ||
|
||
# Git version is dynamically generated | ||
airflow/git_version | ||
|
||
# Exclude static www files generated bu NPM | ||
airflow/www/static/coverage | ||
airflow/www/static/dist | ||
airflow/www/node_modules | ||
|
||
# Exclude link to docs | ||
airflow/www/static/docs | ||
|
||
# Exclude python generated files | ||
**/__pycache__/ | ||
**/*.py[cod] | ||
**/*$py.class | ||
**/.pytest_cache/ | ||
**/env/ | ||
**/build/ | ||
**/develop-eggs/ | ||
**/dist/ | ||
**/downloads/ | ||
**/eggs/ | ||
**/.eggs/ | ||
**/lib/ | ||
**/lib64/ | ||
**/parts/ | ||
**/sdist/ | ||
**/var/ | ||
**/wheels/ | ||
**/*.egg-info/ | ||
**/.installed.cfg | ||
**/*.egg | ||
|
||
# Exclude temporary vi files | ||
**/*~ | ||
|
||
# Exclude output files | ||
**/*.out | ||
**/hive_scratch_dir/ | ||
|
||
# Exclude auto-generated Finder files on Mac OS | ||
**/.DS_Store | ||
potiuk marked this conversation as resolved.
Show resolved
Hide resolved
|
||
**/Thumbs.db |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
# | ||
# Licensed to the Apache Software Foundation (ASF) under one | ||
# or more contributor license agreements. See the NOTICE file | ||
# distributed with this work for additional information | ||
# regarding copyright ownership. The ASF licenses this file | ||
# to you under the Apache License, Version 2.0 (the | ||
# "License"); you may not use this file except in compliance | ||
# with the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, | ||
# software distributed under the License is distributed on an | ||
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
# KIND, either express or implied. See the License for the | ||
# specific language governing permissions and limitations | ||
# under the License. | ||
# | ||
ignored: | ||
- DL3006 | ||
- DL3008 | ||
- DL3005 | ||
- DL3013 |
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -13,46 +13,185 @@ | |||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||||||
# See the License for the specific language governing permissions and | ||||||
# limitations under the License. | ||||||
# | ||||||
# WARNING: THIS DOCKERFILE IS NOT INTENDED FOR PRODUCTION USE OR DEPLOYMENT. | ||||||
# | ||||||
# Base image for the whole Docker file | ||||||
ARG APT_DEPS_IMAGE="airflow-apt-deps" | ||||||
ARG PYTHON_BASE_IMAGE="python:3.6-slim" | ||||||
############################################################################################################ | ||||||
# This is base image with APT dependencies needed by Airflow. It is based on a python slim image | ||||||
# Parameters: | ||||||
# PYTHON_BASE_IMAGE - base python image (python:x.y-slim) | ||||||
############################################################################################################ | ||||||
FROM ${PYTHON_BASE_IMAGE} as airflow-apt-deps | ||||||
|
||||||
SHELL ["/bin/bash", "-o", "pipefail", "-e", "-u", "-x", "-c"] | ||||||
potiuk marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
|
||||||
# Need to repeat the empty argument here otherwise it will not be set for this stage | ||||||
# But the default value carries from the one set before FROM | ||||||
ARG PYTHON_BASE_IMAGE | ||||||
ENV PYTHON_BASE_IMAGE=${PYTHON_BASE_IMAGE} | ||||||
|
||||||
ARG AIRFLOW_VERSION="2.0.0.dev0" | ||||||
ENV AIRFLOW_VERSION=$AIRFLOW_VERSION | ||||||
|
||||||
# Print versions | ||||||
RUN echo "Base image: ${PYTHON_BASE_IMAGE}" | ||||||
RUN echo "Airflow version: ${AIRFLOW_VERSION}" | ||||||
|
||||||
# Make sure noninteractie debian install is used and language variab1les set | ||||||
ENV DEBIAN_FRONTEND=noninteractive LANGUAGE=C.UTF-8 LANG=C.UTF-8 LC_ALL=C.UTF-8 \ | ||||||
LC_CTYPE=C.UTF-8 LC_MESSAGES=C.UTF-8 | ||||||
|
||||||
# By increasing this number we can do force build of all dependencies | ||||||
ARG DEPENDENCIES_EPOCH_NUMBER="1" | ||||||
# Increase the value below to force renstalling of all dependencies | ||||||
ENV DEPENDENCIES_EPOCH_NUMBER=${DEPENDENCIES_EPOCH_NUMBER} | ||||||
|
||||||
# Install curl and gnupg2 - needed to download nodejs in next step | ||||||
RUN apt-get update \ | ||||||
&& apt-get install -y --no-install-recommends \ | ||||||
curl gnupg2 \ | ||||||
&& apt-get autoremove -yqq --purge \ | ||||||
&& apt-get clean \ | ||||||
&& rm -rf /var/lib/apt/lists/* | ||||||
|
||||||
|
||||||
# Install basic apt dependencies | ||||||
RUN curl -sL https://deb.nodesource.com/setup_10.x | bash - \ | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since we are splitting out to multi-stage images it would be nice to node and the asset build stage as it's own step so that in the final image we have
No node, no node modules in final image that way. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But that would prevent you from re-running npm ci/npm run prod when you work with mounted sources. It's not really acceptable for local development with the CI image. I think it's good to keep it - at the very least - for the CI image. I can think about some optional step to build it as another stage for "Airflow" image - but I think it's not really good idea. I will think a bit more about it, but as I see it, it would complicate the whole Dockerfile and image a lot with little gain. I'd rather leave that in for now. Especially that we have bright future. I think it's a good candidate for upcoming changes in Docker - especially Buildkit. There is a new, experimental feature with mount type in RUN command. It could be used for that purpose. Secret type seems especially interesting as it prevents the layer generated by a RUN command to ever appear in the final image (it will be present during the build): I already saw people commenting that they would use secret for that very purpose you described. It was foreseen for other purpose (to keep secretes from leaking) but maybe in the final buildkit release there will be a new type "ignored" or smth. :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fair, on CI this makes sense. |
||||||
&& apt-get update \ | ||||||
&& apt-get install -y --no-install-recommends \ | ||||||
# Packages to install \ | ||||||
libsasl2-dev freetds-bin build-essential sasl2-bin \ | ||||||
libsasl2-2 libsasl2-dev libsasl2-modules \ | ||||||
default-libmysqlclient-dev apt-utils curl rsync netcat locales \ | ||||||
freetds-dev libkrb5-dev libssl-dev libffi-dev libpq-dev git \ | ||||||
nodejs gosu sudo \ | ||||||
&& apt-get autoremove -yqq --purge \ | ||||||
&& apt-get clean \ | ||||||
&& rm -rf /var/lib/apt/lists/* | ||||||
|
||||||
RUN adduser airflow \ | ||||||
&& echo "airflow ALL=(ALL) NOPASSWD: ALL" > /etc/sudoers.d/airflow \ | ||||||
&& chmod 0440 /etc/sudoers.d/airflow | ||||||
|
||||||
############################################################################################################ | ||||||
# This is the target image - it installs PIP and NPN dependencies including efficient caching | ||||||
# mechanisms - it might be used to build the bare airflow build or CI build | ||||||
# Parameters: | ||||||
# APT_DEPS_IMAGE - image with APT dependencies. It might either be base deps image with airflow | ||||||
# dependencies or CI deps image that contains also CI-required dependencies | ||||||
############################################################################################################ | ||||||
FROM ${APT_DEPS_IMAGE} as main | ||||||
|
||||||
SHELL ["/bin/bash", "-o", "pipefail", "-e", "-u", "-x", "-c"] | ||||||
|
||||||
WORKDIR /opt/airflow | ||||||
|
||||||
RUN echo "Airflow version: ${AIRFLOW_VERSION}" | ||||||
|
||||||
ARG APT_DEPS_IMAGE | ||||||
ENV APT_DEPS_IMAGE=${APT_DEPS_IMAGE} | ||||||
|
||||||
ARG AIRFLOW_HOME=/opt/airflow | ||||||
ENV AIRFLOW_HOME=${AIRFLOW_HOME} | ||||||
|
||||||
FROM python:3.6-slim | ||||||
SHELL ["/bin/bash", "-xc"] | ||||||
RUN mkdir -pv ${AIRFLOW_HOME} \ | ||||||
&& chown -R airflow.airflow ${AIRFLOW_HOME} | ||||||
|
||||||
ENV AIRFLOW_HOME=/usr/local/airflow | ||||||
ARG AIRFLOW_DEPS="all" | ||||||
ARG PYTHON_DEPS="" | ||||||
ARG BUILD_DEPS="freetds-dev libkrb5-dev libssl-dev libffi-dev libpq-dev git" | ||||||
ARG APT_DEPS="libsasl2-dev freetds-bin build-essential default-libmysqlclient-dev apt-utils curl rsync netcat locales" | ||||||
# Increase the value here to force reinstalling Apache Airflow pip dependencies | ||||||
ARG PIP_DEPENDENCIES_EPOCH_NUMBER="1" | ||||||
ENV PIP_DEPENDENCIES_EPOCH_NUMBER=${PIP_DEPENDENCIES_EPOCH_NUMBER} | ||||||
|
||||||
ENV PATH="$HOME/.npm-packages/bin:$PATH" | ||||||
# Optimizing installation of Cassandra driver | ||||||
# Speeds up building the image - cassandra driver without CYTHON saves around 10 minutes | ||||||
ARG CASS_DRIVER_NO_CYTHON="1" | ||||||
# Build cassandra driver on multiple CPUs | ||||||
ARG CASS_DRIVER_BUILD_CONCURRENCY="8" | ||||||
|
||||||
RUN set -euxo pipefail \ | ||||||
&& apt update \ | ||||||
&& if [ -n "${APT_DEPS}" ]; then apt install -y $APT_DEPS; fi \ | ||||||
&& curl -sL https://deb.nodesource.com/setup_10.x | bash - \ | ||||||
&& apt update \ | ||||||
&& apt install -y nodejs \ | ||||||
&& apt autoremove -yqq --purge \ | ||||||
&& apt clean | ||||||
ENV CASS_DRIVER_BUILD_CONCURRENCY=${CASS_DRIVER_BUILD_CONCURRENCY} | ||||||
ENV CASS_DRIVER_NO_CYTHON=${CASS_DRIVER_NO_CYTHON} | ||||||
potiuk marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
|
||||||
COPY . /opt/airflow/ | ||||||
# By default PIP install is run without cache to make image smaller | ||||||
ARG PIP_NO_CACHE_DIR="true" | ||||||
ENV PIP_NO_CACHE_DIR=${PIP_NO_CACHE_DIR} | ||||||
RUN echo "Pip no cache dir: ${PIP_NO_CACHE_DIR}" | ||||||
|
||||||
# PIP version used to install dependencies | ||||||
ARG PIP_VERSION="19.0.1" | ||||||
ENV PIP_VERSION=${PIP_VERSION} | ||||||
RUN echo "Pip version: ${PIP_VERSION}" | ||||||
|
||||||
RUN pip install --upgrade pip==${PIP_VERSION} | ||||||
|
||||||
# Airflow sources change frequently but dependency onfiguration won't change that often | ||||||
# We copy setup.py and other files needed to perform setup of dependencies | ||||||
# This way cache here will only be invalidated if any of the | ||||||
# version/setup configuration change but not when airflow sources change | ||||||
COPY --chown=airflow:airflow setup.py /opt/airflow/setup.py | ||||||
COPY --chown=airflow:airflow setup.cfg /opt/airflow/setup.cfg | ||||||
|
||||||
COPY --chown=airflow:airflow airflow/version.py /opt/airflow/airflow/version.py | ||||||
COPY --chown=airflow:airflow airflow/__init__.py /opt/airflow/airflow/__init__.py | ||||||
COPY --chown=airflow:airflow airflow/bin/airflow /opt/airflow/airflow/bin/airflow | ||||||
|
||||||
# Airflow Extras installed | ||||||
ARG AIRFLOW_EXTRAS="all" | ||||||
ENV AIRFLOW_EXTRAS=${AIRFLOW_EXTRAS} | ||||||
RUN echo "Installing with extras: ${AIRFLOW_EXTRAS}." | ||||||
|
||||||
# First install only dependencies but no Apache Airflow itself | ||||||
# This way regular changes in sources of Airflow will not trigger reinstallation of all dependencies | ||||||
# And this Docker layer will be reused between builds. | ||||||
RUN pip install --no-use-pep517 -e ".[${AIRFLOW_EXTRAS}]" | ||||||
|
||||||
COPY --chown=airflow:airflow airflow/www/package.json /opt/airflow/airflow/www/package.json | ||||||
COPY --chown=airflow:airflow airflow/www/package-lock.json /opt/airflow/airflow/www/package-lock.json | ||||||
|
||||||
WORKDIR /opt/airflow/airflow/www | ||||||
RUN npm install \ | ||||||
&& npm run prod | ||||||
|
||||||
# Install necessary NPM dependencies (triggered by changes in package-lock.json) | ||||||
RUN gosu airflow npm ci | ||||||
|
||||||
COPY --chown=airflow:airflow airflow/www/ /opt/airflow/airflow/www/ | ||||||
|
||||||
# Package NPM for production | ||||||
RUN gosu airflow npm run prod | ||||||
|
||||||
WORKDIR /opt/airflow | ||||||
RUN set -euxo pipefail \ | ||||||
&& apt update \ | ||||||
&& if [ -n "${BUILD_DEPS}" ]; then apt install -y $BUILD_DEPS; fi \ | ||||||
&& if [ -n "${PYTHON_DEPS}" ]; then pip install --no-cache-dir ${PYTHON_DEPS}; fi \ | ||||||
&& pip install --no-cache-dir --upgrade pip==19.0.1 \ | ||||||
&& pip install --no-cache-dir --no-use-pep517 -e .[$AIRFLOW_DEPS] \ | ||||||
&& apt purge --auto-remove -yqq $BUILD_DEPS \ | ||||||
&& apt autoremove -yqq --purge \ | ||||||
&& apt clean | ||||||
|
||||||
WORKDIR $AIRFLOW_HOME | ||||||
RUN mkdir -p $AIRFLOW_HOME | ||||||
COPY scripts/docker/entrypoint.sh /entrypoint.sh | ||||||
ENTRYPOINT ["/entrypoint.sh"] | ||||||
|
||||||
# Always apt-get update/upgrade here to get latest dependencies before | ||||||
# we redo pip install | ||||||
RUN apt-get update \ | ||||||
&& apt-get upgrade -y --no-install-recommends \ | ||||||
&& apt-get clean \ | ||||||
&& rm -rf /var/lib/apt/lists/* | ||||||
|
||||||
# Cache for this line will be automatically invalidated if any | ||||||
# of airflow sources change | ||||||
COPY --chown=airflow:airflow . /opt/airflow/ | ||||||
|
||||||
# Always add-get update/upgrade here to get latest dependencies before | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Typo:
Suggested change
|
||||||
# we redo pip install | ||||||
RUN apt-get update \ | ||||||
&& apt-get upgrade -y --no-install-recommends \ | ||||||
&& apt-get clean \ | ||||||
&& rm -rf /var/lib/apt/lists/* | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Having this after copying source files seems out of place/not where I would have put this. Is there a reason this is in this image rather than, say before the first There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The main reason was to make sure we always use latest dependencies and update frequently. I think that originated from our original (long time ago :) ) discussion with @Fokko - he was concerned that we will be "stuck" with some older version of apt-get dependencies, rather than latest ones. There must be some trigger to invalidate such apt-get and we can play a bit with it. I also don't find it particularly good to run apt-get upgrade after adding sources. I'd rather do it after copying "setup.py" and related files - this way we would always upgrade to latest apt dependencies together with updates to pip dependencies (which makes perfect sense). This will be less frequent on one hand but also more stable (less risk that build starts failing because some dependency-triggered issue independent of code change). Similarly as in case of PIP dependencies that gives IMHO the best of both worlds - fairly frequent, automated update of dependencies and relative stability - the dependencies will always be "fixed" between a change in setup.py - so no more broken builds "out-of-the-blue". Paired with regularly daily-cron-triggered fresh rebuild from the scratch, I think we might have perfect solution that detects dependency-triggered errors quickly but does not stop contributors from working and running Travis CI tests. I think we are aligned on that @ashb and I am happy to move it there. @Fokko - what do you think? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should we move this to before the |
||||||
|
||||||
# Additional python deps to install | ||||||
ARG ADDITIONAL_PYTHON_DEPS="" | ||||||
|
||||||
RUN if [ -n "${ADDITIONAL_PYTHON_DEPS}" ]; then \ | ||||||
pip install ${ADDITIONAL_PYTHON_DEPS}; \ | ||||||
fi | ||||||
|
||||||
USER airflow | ||||||
|
||||||
WORKDIR ${AIRFLOW_HOME} | ||||||
|
||||||
COPY --chown=airflow:airflow ./scripts/docker/entrypoint.sh /entrypoint.sh | ||||||
|
||||||
EXPOSE 8080 | ||||||
ENTRYPOINT ["/usr/local/bin/dumb-init", "--", "/entrypoint.sh"] | ||||||
CMD ["--help"] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
#!/usr/bin/env bash | ||
|
||
# | ||
# Licensed to the Apache Software Foundation (ASF) under one or more | ||
# contributor license agreements. See the NOTICE file distributed with | ||
# this work for additional information regarding copyright ownership. | ||
# The ASF licenses this file to You under the Apache License, Version 2.0 | ||
# (the "License"); you may not use this file except in compliance with | ||
# the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
# | ||
set -xeuo pipefail | ||
|
||
MY_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )" | ||
|
||
cd ${MY_DIR}/../../ | ||
|
||
|
||
docker run -v $(pwd)/Dockerfile:/root/Dockerfile -v $(pwd)/.hadolint.yaml:/root/.hadolint.yaml \ | ||
-w /root hadolint/hadolint /bin/hadolint Dockerfile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why ignore everything and then add a few things back in? The list of things we dont want in isn't that big.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have already given a talk on that actually I plan to write blog post :).
It's a practice I now absolutely recommend. I read it in many places as recommendation (for example here: https://youknowfordevs.com/2018/12/07/getting-control-of-your-dockerignore-files.html). But I experienced the effect of it in airflow. Besides longer time to build, this is a problem of cache invalidation when you do 'COPY .' .
The thing is that in this case the whole context has to match when you run rebuild of the image and if you add everything by default then every single file generated or added by accident in your source dir will cause context invalidation.
We already generate a lot of stuff in the sources (node_modules, 'static' and a number of others). When you do local development you often run stuff (such as document generation) locally and they introduce a lot of garbage in your context and invalidate the cache (thus pretty much every single time you restart dockerfile from the COPY . step no matter if the actual sources changed or not.
Moreover - you are not able to foresee what other stuff will be generated in the sources in the future. So no matter how hard you try and exclude everything you do not want, somebody few months from now might add a single generated file (or likely directory) that will cause subsequent frequent cache invalidations. Therefore "exclude everything" and then "add what you need" works much better. There is no chance you forget to add something important then - because your tests will fail.
This is actually the reason I now deliberately build Docker image and run tests from the binary image rather than using sources mounted (as it was before). This way your full docker image is tested. Previously some files could be ignored on build but then mounted from sources - which is quite bad practice - because it does not test the Docker image but some hybrid of Docker image and mounted sources (which are not necessarily the same).
One - hugely important additional thing - making context as minimal as possible is a huge time saver for local builds. What happens under the hood when Docker build runs, the whole context is compressed/packed and sent to the engine rather than used from local sources. This is the reason why Docker build sometimes pauses for many seconds until build starts - if you see it, it usually means you have not done good job in excluding some generated binaries. If you don't ignore such generated files (especially node_modules which are huge/lots of files) then your context might grow very large (and uncontrollably).
Excluding ** is all but guarantee that the context will not grow accidentally and uncontrollably.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I worry that by excluding ** we will miss new files is all.
You also have a number of duplicates in this file
https://github.com/apache/airflow/blob/55d4afc354905cc78f2c8f2033caa6a7f8d5c66e/.dockerignore#L30
https://github.com/apache/airflow/blob/55d4afc354905cc78f2c8f2033caa6a7f8d5c66e/.dockerignore#L50-L53
There's no need for both right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And if we are excluding everything then do we need these
https://github.com/apache/airflow/blob/55d4afc354905cc78f2c8f2033caa6a7f8d5c66e/.dockerignore#L75-L107
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct with the duplicates. I reviewed and removed it.
There is little worry that we will forget something - the tests will catch it (and it should rarely be needed - adding a new folder or file at the top level should be really rare event)
And yes - you need those extra specs because they will exclude what should not be part of those included dirs above. The pattern I follow is (and I saw it suggested by several people):
I feel very strong about it - having to debug the context changes (and figuring out what the hell changed to trigger copying of the files again) for many hours. Example: only after a while I realised that there is a generated folder in docs which is _build next to _template (which is not generated, thought the name kind of suggest it). And just generating the docs locally triggered rebuild of Dockerfile (that's why I did not include the whole docs folder :) ). Otherwise I would have to specifically exclude that particular folder (if I knew that I have to do it). I really think adding stuff deliberately to Docker context makes much more sense.
Alternative is to move ALL generated content to specific directory, and exclude that directory - but this is not always possible (node_modules) and from my experience people will not "stick" to it - they will start generating random files in different folders. Of course we are not protected if someone starts generating "xyz" file/folder inside "airflow" folder. But it's rather unlikely someone will do something that crazy. I think generating something in sources is rather bad practice and you do it only if you cannot do it otherwise (that's one of the parts I don't like about python - .pyc files are generated next to actual source files. It reminds me of times when I used RCS versioning system and we had source.c,v next to every file.... BRRR)
But to be honest - It was much bigger problem and I eventually introduced checksums for important files (setup/docker) to check if we need to run docker build at all.
But still I think it makes it much more resilient for future changes that might produce uncontrollable and unpredictable context "growth".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can give some more examples where things could go south. When I run unit tests from the IDE, the logs by default are written in the working directory - which is by default airflow's root folder. They are now ignored with .gitignore but when docker build is executed the logs are potentially big and should not be sent as part of the context. It's safer to exclude those by default.