Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add quick start for Airflow on Docker #13660

Merged
merged 17 commits into from
Jan 26, 2021

Conversation

mik-laj
Copy link
Member

@mik-laj mik-laj commented Jan 14, 2021

Hello,

Adds a docker-compose file that contains a simple sample configuration that allows you to quickly run Airflow in a Docker environment. This is a part of #8605 (comment), but much simpler because .... novice users don't care at all about their environment, but need to get all components up and running quickly. In the next steps, we can work on adding wizards that will allow you to generate your own docker-compose file, but I think it will require some dependent documentation and this is not needed in this section of the documentation yet.

Part of: #8605


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@github-actions
Copy link

The PR is likely ready to be merged. No tests are needed as no important environment files, nor python files were modified by it. However, committers might decide that full test matrix is needed and add the 'full tests needed' label. Then you should rebase it to the latest master or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the okay to merge It's ok to merge this PR as it does not require more tests label Jan 14, 2021
@github-actions
Copy link

The Workflow run is cancelling this PR. Building images for the PR has failed. Follow the the workflow link to check the reason.

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great but the docker compose does not work on Linux and it requires very recent version of docker-compose which is not explained nor checked (and error messages are difficult to understand in case of older versions used). Unfortunately the most popular ( and our main "target" Linux distribution does not use that version of docker-compose so we need to find a way to make the users at least be aware of the version requirement and direct them download page..

@github-actions github-actions bot removed the okay to merge It's ok to merge this PR as it does not require more tests label Jan 14, 2021
Copy link
Member

@feluelle feluelle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if we should use a .env file to populate some docker-compose values (which is passed to docker-compose automatically if in same dir). For example I did it like this: https://github.com/feluelle/finance-data-builder/blob/main/docker-compose.yml#L42-L46

And then I have a simple .env file like the following:

DBT_POSTGRES_HOST=fdb_dbt_db
DBT_POSTGRES_USER=dbt
DBT_POSTGRES_PASSWORD=dbt
DBT_POSTGRES_DB=dbt
DBT_POSTGRES_PORT=5432

AIRFLOW_POSTGRES_HOST=fdb_airflow_db
AIRFLOW_POSTGRES_USER=airflow
AIRFLOW_POSTGRES_PASSWORD=airflow
AIRFLOW_POSTGRES_DB=airflow
AIRFLOW_POSTGRES_PORT=5432

AIRFLOW_USER=airflow
AIRFLOW_PASSWORD=airflow

With this it is quite easy to change the access to the container and the airflow user.

docs/apache-airflow/start/docker-compose.yaml Outdated Show resolved Hide resolved
docs/apache-airflow/start/docker.rst Outdated Show resolved Hide resolved
@mik-laj
Copy link
Member Author

mik-laj commented Jan 18, 2021

With this it is quite easy to change the access to the container and the airflow user.

I would like this example to be simple and only use one file. This is a quick start for beginners, so we shouldn't use too complex syntax and require many steps. I will try to think of something to be able to introduce basic parameterization without adding extra steps to the instruction.

@potiuk
Copy link
Member

potiuk commented Jan 18, 2021

I would like this example to be simple and only use one file. This is a quick start for beginners, so we shouldn't use too complex syntax and require many steps. I will try to think of something to be able to introduce basic parameterization without adding extra steps to the instruction.

Agree with @mik-laj -> if we have a single file to use, this is soooo much easier to share and use via gists and other similar mechanisms, or simple links to file in the repo. I wish it was easy to do in our Dockerfiles, but this is rather complex there and you usually needs all the "surrounding" stuff like scripts, empty dirs etc.

@mik-laj mik-laj force-pushed the quick-start-docker branch from 2968044 to 664614d Compare January 24, 2021 07:35
@mik-laj
Copy link
Member Author

mik-laj commented Jan 24, 2021

I am wondering if we should use a .env file to populate some docker-compose values (which is passed to docker-compose automatically if in same dir).

I added a few parameters to the docker-compose file, but I was very careful not to complicate this file.
I only added these parameters in a few cases:

  • If it was necessary to ensure compatibility with Linux, (AIRFLOW_UID, AIRFLOW_GID),
  • if I thought it could be a really frequently changed option (AIRFLOW_IMAGE_NAME),
  • if it is a good example of using parameterization (_AIRFLOW_WWW_USER_USERNAME, _AIRFLOW_WWW_USER_PASSWORD)

Each variable has a default value defined so the .env file is optional. I also didn't define variables for internal details, e.g. database passwords, because Docker provides network isolation.

@mik-laj mik-laj requested a review from potiuk January 24, 2021 08:46
@mik-laj mik-laj force-pushed the quick-start-docker branch from ee8aba0 to fe736be Compare January 24, 2021 09:40
@potiuk
Copy link
Member

potiuk commented Jan 24, 2021

Looks good but I added a few fixups:

  1. Missing .gitignore entries for logs/plugins/env/dags
  2. The DOCKER_COMPOSE_ARGS are not needed IMHO . The docker compose run will allocate terminal as needed and you can disable it with -T flag. Also interactive mode works out-of-the-box with docker-compose-run. removed them
  3. I removed airflow from docker-compose run in airflow.sh. It is not neeed, it does not work with airfllow 2.0 image and it does not allow to run the two useful python and bash commands
  4. changed ${*} into "${@}" (that's the proper way of passing parameters containing spaces potentially)
  5. I added examples with "python/bash" commands.

@potiuk
Copy link
Member

potiuk commented Jan 24, 2021

Feel fre to modify them if you feel it can be done better.

@potiuk
Copy link
Member

potiuk commented Jan 24, 2021

And it works now on linux nicely :)

@github-actions
Copy link

The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest master or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the okay to merge It's ok to merge this PR as it does not require more tests label Jan 24, 2021
in reality it is NOT a separate process, but run within the Scheduler.

* The **Worker(s)** are separate processes which also interact with the other components of the Airflow architecture and
the metadata repository.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are workers required in case of every executor?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We support only celery executor in this guide.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may have missed it but I could not found where we say it. In fact we say something totally different than CeleryExecutor:

Primarily intended for development use, the basic Airflow architecture with the Local and Sequential executors is an excellent starting point for understanding the architecture of Apache Airflow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh. And that's not my text. I only moved this paragraph from another place, but I will check it more carefully.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the very beginning of this section it says this is the architecture for LocalExecutor and SeqentialExecutor, and both of these executors use a separate process to run the DAG code, so everything is correct.

Primarily intended for development use, the basic Airflow architecture with the Local and Sequential executors is an
excellent starting point for understanding the architecture of Apache Airflow.


airflow-init_1 | Upgrades done
airflow-init_1 | Admin user airflow created
airflow-init_1 | 2.1.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the version? Why 2.1.0?

Further down we mention the version again for example apache/airflow:2.0.0. This should be the same, or?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be Airflow 2.0.1, but it is not unreleased yet. At this point, I used this version because I was running the image from the master branch where we have version 2.1.0.dev, then deleted the suffix ".dev".

version = '2.1.0.dev0'

I am concerned that none of the released images works, as we have not yet voted on any release that includes the latest changes.
#13728
I will try to unify it so that there is one version used in both places.

Copy link
Member

@potiuk potiuk Jan 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2.0.0 works with my fixup but without creating admin user

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added extra checks for docs build: 341c781

Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
@mik-laj mik-laj merged commit ffb472c into apache:master Jan 26, 2021
@mik-laj mik-laj deleted the quick-start-docker branch January 26, 2021 12:45
kaxil pushed a commit that referenced this pull request Jan 27, 2021
Co-authored-by: Felix Uellendall <feluelle@users.noreply.github.com>
Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
(cherry picked from commit ffb472c)
@kaxil kaxil added this to the Airflow 2.0.1 milestone Jan 27, 2021
kaxil pushed a commit that referenced this pull request Feb 4, 2021
Co-authored-by: Felix Uellendall <feluelle@users.noreply.github.com>
Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
(cherry picked from commit ffb472c)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:dev-tools kind:documentation okay to merge It's ok to merge this PR as it does not require more tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants