Currently DAGs are retrieving sensitive information from several secrets.py files. This pattern is meant to keep sensitive data out of the git repository (but beware: there are some secrets lying around in the git history of this repo), but it is not very friendly for setting up new environments (dev or otherwise). For example: when creating a new dev environment locally, I have to create several of these secrets files with dummy values just to be able to get airflow up and running.
Also, this pattern of retrieving the secrets is not very dev friendly:
from myfile import secrets

var1 = secrets["secret1"]
This forces me to create the secrets.py file, AND create a secrets dictionary inside it, AND create the secret1 key in that dict with some value. Otherwise I cannot get airflow to run. And I have to repeat this for every DAG, even if I'm only interested in working on a single one.
A more flexible strategy is proposed in the 12factor app's section on configuration. Basically it is recommended that this information be kept in the environment and not in custom python files.
In this case it would mean that, instead of several secrets.py files each holding a bunch of dictionaries with keys and strings as values, there would be several environment variables, one for each secret variable. Airflow even facilitates using this pattern for things like database connections via its connections feature.
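As an illustration of that feature (the connection id and URI below are only placeholders, not an existing connection in this repo), Airflow can pick up a connection directly from an AIRFLOW_CONN_<CONN_ID> environment variable:

# in the environment (or an env file, see below)
AIRFLOW_CONN_MY_POSTGRES=postgres://user:password@localhost:5432/mydb

# in a DAG or hook
from airflow.hooks.base_hook import BaseHook  # airflow.hooks.base in Airflow 2.x

conn = BaseHook.get_connection("my_postgres")  # resolved from the env var above
print(conn.host, conn.login)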
The pattern of retrieving the secrets can also be made more flexible:
import os

# defaults to None if SECRET1 does not exist in the environment
var1 = os.getenv("SECRET1")

# optionally you can specify some sensible default too
# var1 = os.getenv("SECRET1", "some_default_value")
The snippet above allows me to define just the secrets that I want to use, and the code will not blow up (immediately, at least) if the other secrets are not defined.
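If a particular DAG really does need a value to be present, a small helper (hypothetical, just a sketch of the idea) can make the failure explicit instead of silently passing None around:

import os

def require_env(name):
    # hypothetical helper: fail loudly if a required secret is missing
    value = os.getenv(name)
    if value is None:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

var1 = require_env("SECRET1")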
As for the definition of the environment variables, they can be kept in a single file, which can be specific to each env, for example dev.env, staging.env, production.env. This file can be something like:
# dev.env
SECRET1=my_secret
SECRET2=other_secret
The contents of the file can then be exported to the environment using:
set -o allexport
source dev.env
set +o allexport
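Alternatively, for local development in Python, something like the python-dotenv package (not currently a project dependency, just a suggestion) can load the same file:

from dotenv import load_dotenv  # pip install python-dotenv
import os

load_dotenv("dev.env")  # reads KEY=value lines from the file into os.environ
var1 = os.getenv("SECRET1")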
Or, if using docker, the docker run command supports an --env-file argument where we can specify the file.
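For example (the image name and command here are placeholders, not our actual setup):

# loads every variable from dev.env into the container's environment
docker run --env-file dev.env my-airflow-image webserver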
These env files would not usually be kept in the code repository, with the possible exception of the dev file, which might make sense to keep in the repo if it facilitates developers' setup and does not contain any truly sensitive information (for example, if it only uses local database credentials).
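One way to enforce that convention (just a sketch of the .gitignore rules):

# .gitignore
# keep env files out of the repo by default...
*.env
# ...but optionally re-include the dev file, if it holds no real secrets
!dev.env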