Add quick start for Airflow on Docker #13660
Conversation
The PR is likely ready to be merged. No tests are needed, as no important environment files or Python files were modified by it. However, committers might decide that the full test matrix is needed and add the 'full tests needed' label. In that case, you should rebase it to the latest master or amend the last commit of the PR, and push it with --force-with-lease.
The Workflow run is cancelling this PR. Building images for the PR has failed. Follow the workflow link to check the reason.
This is great, but the docker-compose setup does not work on Linux, and it requires a very recent version of docker-compose, which is neither explained nor checked (and the error messages are difficult to understand when an older version is used). Unfortunately, the most popular (and our main "target") Linux distribution does not ship that version of docker-compose, so we need to find a way to at least make users aware of the version requirement and direct them to the download page.
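One way to surface the requirement early would be a small pre-flight check before starting the stack. A minimal sketch, assuming the quick start documents a concrete minimum version (the `1.27.0` value below is only a placeholder, not the official requirement):

```bash
#!/usr/bin/env bash
# Sketch of a pre-flight check; MIN_VERSION is a placeholder, not the
# officially required docker-compose version.
MIN_VERSION="1.27.0"
CURRENT_VERSION="$(docker-compose version --short)"
if [ "$(printf '%s\n' "$MIN_VERSION" "$CURRENT_VERSION" | sort -V | head -n1)" != "$MIN_VERSION" ]; then
    echo "docker-compose ${CURRENT_VERSION} is too old, please install ${MIN_VERSION} or newer." >&2
    echo "See https://docs.docker.com/compose/install/" >&2
    exit 1
fi
```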
I am wondering if we should use a .env file to populate some docker-compose values (the file is passed to docker-compose automatically if it is in the same dir). For example, I did it like this: https://github.com/feluelle/finance-data-builder/blob/main/docker-compose.yml#L42-L46
And then I have a simple .env file like the following:
DBT_POSTGRES_HOST=fdb_dbt_db
DBT_POSTGRES_USER=dbt
DBT_POSTGRES_PASSWORD=dbt
DBT_POSTGRES_DB=dbt
DBT_POSTGRES_PORT=5432
AIRFLOW_POSTGRES_HOST=fdb_airflow_db
AIRFLOW_POSTGRES_USER=airflow
AIRFLOW_POSTGRES_PASSWORD=airflow
AIRFLOW_POSTGRES_DB=airflow
AIRFLOW_POSTGRES_PORT=5432
AIRFLOW_USER=airflow
AIRFLOW_PASSWORD=airflow
With this it is quite easy to change the access to the container and the airflow user.
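For context, the compose file then picks these values up through variable substitution, along these lines (an illustrative fragment, not the exact lines from the linked repository):

```yaml
# Illustrative fragment: the values from the .env file above are injected
# via docker-compose variable substitution.
services:
  airflow-db:
    image: postgres:13
    environment:
      POSTGRES_USER: ${AIRFLOW_POSTGRES_USER}
      POSTGRES_PASSWORD: ${AIRFLOW_POSTGRES_PASSWORD}
      POSTGRES_DB: ${AIRFLOW_POSTGRES_DB}
```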
I would like this example to be simple and only use one file. This is a quick start for beginners, so we shouldn't use overly complex syntax or require many steps. I will try to think of a way to introduce basic parameterization without adding extra steps to the instructions.
Agree with @mik-laj -> if we have a single file to use, it is so much easier to share and use via gists and other similar mechanisms, or via simple links to the file in the repo. I wish it were easy to do in our Dockerfiles, but it is rather complex there and you usually need all the "surrounding" stuff like scripts, empty dirs, etc.
TODO: I would like to add optional sections on how to use the CLI with wrappers. In some cases this will allow us to forget that everything is running in Docker.
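One possible shape for such a wrapper, assuming the compose file exposes a service that can run arbitrary airflow commands (the service name below is an assumption, not the one from this PR):

```bash
#!/usr/bin/env bash
# Hypothetical "airflow" wrapper: forwards any CLI call into a throw-away
# container, so e.g. `./airflow.sh dags list` behaves like a local CLI.
# The service name "airflow-worker" is illustrative only.
docker-compose run --rm airflow-worker airflow "$@"
```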
Co-authored-by: Felix Uellendall <feluelle@users.noreply.github.com>
(force-pushed from 2968044 to 664614d)
I added a few parameters to the docker-compose file, but I was very careful not to complicate this file.
Each variable has a default value defined, so the .env file is optional. I also didn't define variables for internal details, e.g. database passwords, because Docker provides network isolation.
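The pattern in question is compose variable substitution with fallback defaults, roughly like this (variable names are assumptions for illustration, not necessarily the ones used in the PR):

```yaml
# Each parameter falls back to a default, so the stack also starts when no
# .env file is present; overriding the variables in .env changes the setup.
x-airflow-common:
  &airflow-common
  image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.0.0}

services:
  airflow-webserver:
    <<: *airflow-common
    ports:
      - "${AIRFLOW_WEBSERVER_PORT:-8080}:8080"
```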
(force-pushed from ee8aba0 to fe736be)
Looks good, but I added a few fixups:
Feel free to modify them if you feel it can be done better.
And it works nicely on Linux now :)
The PR is likely OK to be merged with just a subset of tests for the default Python and database versions, without running the full test matrix, because it does not modify the core of Airflow. If the committers decide that the full test matrix is needed, they will add the 'full tests needed' label. Then you should rebase to the latest master or amend the last commit of the PR, and push it with --force-with-lease.
in reality it is NOT a separate process, but run within the Scheduler.
* The **Worker(s)** are separate processes which also interact with the other components of the Airflow architecture and the metadata repository.
Are workers required for every executor?
We support only the CeleryExecutor in this guide.
I may have missed it, but I could not find where we say that. In fact, we say something totally different from CeleryExecutor:
Primarily intended for development use, the basic Airflow architecture with the Local and Sequential executors is an excellent starting point for understanding the architecture of Apache Airflow.
Ahh. And that's not my text. I only moved this paragraph from another place, but I will check it more carefully.
At the very beginning of this section it says this is the architecture for LocalExecutor and SequentialExecutor, and both of these executors use a separate process to run the DAG code, so everything is correct.
Primarily intended for development use, the basic Airflow architecture with the Local and Sequential executors is an excellent starting point for understanding the architecture of Apache Airflow.
docs/apache-airflow/start/docker.rst (Outdated)
airflow-init_1 | Upgrades done
airflow-init_1 | Admin user airflow created
airflow-init_1 | 2.1.0
Is this the version? Why 2.1.0? Further down we mention the version again, for example apache/airflow:2.0.0. These should be the same, shouldn't they?
It should be Airflow 2.0.1, but it has not been released yet. At this point, I used this version because I was running the image from the master branch, where we have version 2.1.0.dev, and then deleted the ".dev" suffix.
Line 41 in f473ca7: version = '2.1.0.dev0'
I am concerned that none of the released images works, as we have not yet voted on any release that includes the latest changes.
#13728
I will try to unify it so that there is one version used in both places.
2.0.0 works with my fixup, but without creating the admin user.
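For reference, the admin user can still be created by hand in that case, for example (the service name is an assumption, and the credentials are just the sample values from this thread):

```bash
# Manually create the admin user if the init step does not do it.
# "airflow-webserver" is an assumed service name; credentials are samples.
docker-compose run --rm airflow-webserver airflow users create \
    --username airflow \
    --password airflow \
    --firstname Anonymous \
    --lastname Admin \
    --role Admin \
    --email admin@example.org
```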
I added extra checks for the docs build: 341c781
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Co-authored-by: Felix Uellendall <feluelle@users.noreply.github.com>
Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
(cherry picked from commit ffb472c)
Hello,
Adds a docker-compose file that contains a simple sample configuration, allowing you to quickly run Airflow in a Docker environment. This is a part of #8605 (comment), but much simpler, because novice users don't care at all about their environment; they just need to get all components up and running quickly. In the next steps, we can work on adding wizards that will let you generate your own docker-compose file, but I think that will require some dependent documentation, which is not needed in this section of the documentation yet.
Part of: #8605
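For orientation, here is a heavily trimmed sketch of what such a docker-compose file looks like for a CeleryExecutor setup (service names and settings are illustrative, not the exact file added by this PR):

```yaml
# Trimmed, illustrative sketch of a CeleryExecutor stack; not the exact file
# proposed in this PR.
version: '3'
x-airflow-common:
  &airflow-common
  image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.0.0}
  environment:
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
  depends_on:
    - postgres
    - redis

services:
  postgres:
    image: postgres:13
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow

  redis:
    image: redis:latest

  airflow-webserver:
    <<: *airflow-common
    command: webserver
    ports:
      - "8080:8080"

  airflow-scheduler:
    <<: *airflow-common
    command: scheduler

  airflow-worker:
    <<: *airflow-common
    command: celery worker
```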
Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes, please leave a note in UPDATING.md.