-
Notifications
You must be signed in to change notification settings - Fork 14.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AIRFLOW-3743] Unify different methods of working out AIRFLOW_HOME #4705
Conversation
I first noticed this due to differing settings between the config file and the AIRFLOW_HOME environment variable, meaning that the webserver_config.py would get written to one, but read from the other! |
I've got an import loop to sort out too (causing test failures, but only on python 3 curiously) |
f56b73d
to
7793c52
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please find my two cents.
category=DeprecationWarning, | ||
) | ||
else: | ||
AIRFLOW_HOME = conf.get('core', 'airflow_home') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of having this line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double-checking on this: so your intention is that if the user is still using airflow_home
in .cfg file, only a warning will be given (no hard stoping) and the value in .cfg file will be used eventually?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was trying to make this change as easy as possible - warn but carry on using the setting (for 99% of people it will be the same)
Though the 1 case I know that this will break is the Puckel docker container. So perhaps if the two settings are different I should throw an exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or throw an exception once conf.has_option('core', 'AIRFLOW_HOME') == True
?
Personally I think it's better to make it as explicit as possible, and having a clean cut.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Throwing an exception in this case would be a breaking change (as this was in the default template, so every install would and up stopping working) which I am trying to avoid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, it’s not really necessary to have an if-else here to check whether AIRFLOW_HOME is in env? The two warning messages are not that different to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ashb , I had a double-check on this. Let me correct myself for this:
- I agree it's necessary to have the if-else
if 'AIRFLOW_HOME' in os.environ:
insideif conf.has_option('core', 'AIRFLOW_HOME'):
, and decide ifAIRFLOW_HOME
should be based on the value in .cfg file. - But I don't think it's necessary to have the warning messages for two times in the code. The two warning messages here are very similar to each other, and either of them will be invoked anyway. We can have something like
if conf.has_option('core', 'AIRFLOW_HOME'):
warnings.warn('<warning msg>', category=DeprecationWarning,)
if 'AIRFLOW_HOME' not in os.environ:
AIRFLOW_HOME = conf.get('core', 'airflow_home')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed it to:
if conf.has_option('core', 'AIRFLOW_HOME'):
msg = (
'Specifying both AIRFLOW_HOME environment variable and airflow_home '
'in the config file is deprecated. Please use only the AIRFLOW_HOME '
'environment variable and remove the config file entry.'
)
if 'AIRFLOW_HOME' in os.environ:
warnings.warn(msg, category=DeprecationWarning,)
elif conf.get('core', 'airflow_home') == AIRFLOW_HOME:
warnings.warn(
'Specifying airflow_home in the config file is deprecated. As you '
'have left it at the default value you should remove the setting '
'from your airflow.cfg and suffer no change in behaviour.',
category=DeprecationWarning,
)
else:
AIRFLOW_HOME = conf.get('core', 'airflow_home')
warnings.warn(msg, category=DeprecationWarning,)
I thought about adding an extra case do say which value it was
8b1576d
to
a29c991
Compare
@XD-DENG PTAnotherL :) |
a29c991
to
46f2c58
Compare
variable if you need to use a non default value for this. | ||
|
||
(Since this setting is used to calculate what config file to load, it is not | ||
possible to keep just the config option) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ashb I'm wondering what will happen if I have both env variables AIRFLOW_HOME
and AIRFLOW__CORE__AIRFLOW_HOME
, but set to different values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To the code the second one is the same as setting it in the config file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having re-read this: this is the very problem I ran in to that caused me to issue the PR. If the are set to different values then the webserver_config.py would be looked for in one, but other things from the other.
Specifically the problem I had was that webserver_config.py was written to one, but read from the other! So airflow create_user
etc targeted an sqlite://:memory:
DB!
|
||
from sqlalchemy import create_engine, exc | ||
from sqlalchemy.orm import scoped_session, sessionmaker | ||
from sqlalchemy.pool import NullPool | ||
|
||
from airflow import configuration as conf | ||
from airflow.configuration import conf, AIRFLOW_HOME, WEBSERVER_CONFIG # NOQA F401 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If WEBSERVER_CONFIG
is not used here in airflow/settings.py
, why do we have to import it? Possibly I missed something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's so we can do from airflow.settings import WEVSERVER_CONFIG
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some basic questions... some of them are just because I didn't understand rather than feedback :)
c1b50fd
to
8a710b5
Compare
There were a few ways of getting the AIRFLOW_HOME directory used throughout the code base, giving possibly conflicting answer if they weren't kept in sync: - the AIRFLOW_HOME environment variable - core/airflow_home from the config - settings.AIRFLOW_HOME - configuration.AIRFLOW_HOME Since the home directory is used to compute the default path of the config file to load, specifying the home directory Again in the config file didn't make any sense to me, and I have deprecated that. This commit makes everything in the code base use `settings.AIRFLOW_HOME` as the source of truth, and deprecates the core/airflow_home config option. There was an import cycle form settings -> logging_config -> module_loading -> settings that needed to be broken on Python 2 - so I have moved all adjusting of sys.path in to the settings module (This issue caused me a problem where the RBAC UI wouldn't work as it didn't find the right webserver_config.py)
8a710b5
to
9ea1616
Compare
Codecov Report
@@ Coverage Diff @@
## master #4705 +/- ##
==========================================
- Coverage 75.76% 75.74% -0.03%
==========================================
Files 458 458
Lines 29856 29857 +1
==========================================
- Hits 22620 22614 -6
- Misses 7236 7243 +7
Continue to review full report at Codecov.
|
…4705) There were a few ways of getting the AIRFLOW_HOME directory used throughout the code base, giving possibly conflicting answer if they weren't kept in sync: - the AIRFLOW_HOME environment variable - core/airflow_home from the config - settings.AIRFLOW_HOME - configuration.AIRFLOW_HOME Since the home directory is used to compute the default path of the config file to load, specifying the home directory Again in the config file didn't make any sense to me, and I have deprecated that. This commit makes everything in the code base use `settings.AIRFLOW_HOME` as the source of truth, and deprecates the core/airflow_home config option. There was an import cycle form settings -> logging_config -> module_loading -> settings that needed to be broken on Python 2 - so I have moved all adjusting of sys.path in to the settings module (This issue caused me a problem where the RBAC UI wouldn't work as it didn't find the right webserver_config.py)
…pache#4705) There were a few ways of getting the AIRFLOW_HOME directory used throughout the code base, giving possibly conflicting answer if they weren't kept in sync: - the AIRFLOW_HOME environment variable - core/airflow_home from the config - settings.AIRFLOW_HOME - configuration.AIRFLOW_HOME Since the home directory is used to compute the default path of the config file to load, specifying the home directory Again in the config file didn't make any sense to me, and I have deprecated that. This commit makes everything in the code base use `settings.AIRFLOW_HOME` as the source of truth, and deprecates the core/airflow_home config option. There was an import cycle form settings -> logging_config -> module_loading -> settings that needed to be broken on Python 2 - so I have moved all adjusting of sys.path in to the settings module (This issue caused me a problem where the RBAC UI wouldn't work as it didn't find the right webserver_config.py)
…pache#4705) There were a few ways of getting the AIRFLOW_HOME directory used throughout the code base, giving possibly conflicting answer if they weren't kept in sync: - the AIRFLOW_HOME environment variable - core/airflow_home from the config - settings.AIRFLOW_HOME - configuration.AIRFLOW_HOME Since the home directory is used to compute the default path of the config file to load, specifying the home directory Again in the config file didn't make any sense to me, and I have deprecated that. This commit makes everything in the code base use `settings.AIRFLOW_HOME` as the source of truth, and deprecates the core/airflow_home config option. There was an import cycle form settings -> logging_config -> module_loading -> settings that needed to be broken on Python 2 - so I have moved all adjusting of sys.path in to the settings module (This issue caused me a problem where the RBAC UI wouldn't work as it didn't find the right webserver_config.py)
…pache#4705) There were a few ways of getting the AIRFLOW_HOME directory used throughout the code base, giving possibly conflicting answer if they weren't kept in sync: - the AIRFLOW_HOME environment variable - core/airflow_home from the config - settings.AIRFLOW_HOME - configuration.AIRFLOW_HOME Since the home directory is used to compute the default path of the config file to load, specifying the home directory Again in the config file didn't make any sense to me, and I have deprecated that. This commit makes everything in the code base use `settings.AIRFLOW_HOME` as the source of truth, and deprecates the core/airflow_home config option. There was an import cycle form settings -> logging_config -> module_loading -> settings that needed to be broken on Python 2 - so I have moved all adjusting of sys.path in to the settings module (This issue caused me a problem where the RBAC UI wouldn't work as it didn't find the right webserver_config.py)
Make sure you have checked all steps below.
Jira
https://issues.apache.org/jira/browse/AIRFLOW-3743
Description
There were a few ways of getting the AIRFLOW_HOME directory used
throughout the code base, giving possibly conflicting answer if they
weren't kept in sync:
Since the home directory is used to compute the default path of the
config file to load, specifying the home directory Again in the config
file didn't make any sense to me, and I have deprecated that.
This commit makes everything in the code base use
settings.AIRFLOW_HOME
as the source of truth, and deprecates thecore/airflow_home config option.
(This issue caused me a problem where the RBAC UI wouldn't work as it
didn't find the right webserver_config.py)
Tests
Existing tests pass,.
Commits
Documentation
Code Quality
flake8