[AIRFLOW-3743] Unify different methods of working out AIRFLOW_HOME #4705

Merged: 1 commit merged into apache:master from unify_airflow_home on Mar 25, 2019

ashb
Member

@ashb ashb commented Feb 13, 2019

Make sure you have checked all steps below.

Jira

https://issues.apache.org/jira/browse/AIRFLOW-3743

Description

There were a few ways of getting the AIRFLOW_HOME directory used
throughout the code base, giving possibly conflicting answers if they
weren't kept in sync:

  • the AIRFLOW_HOME environment variable
  • core/airflow_home from the config
  • settings.AIRFLOW_HOME
  • configuration.AIRFLOW_HOME

Since the home directory is used to compute the default path of the
config file to load, specifying the home directory again in the config
file didn't make any sense to me, and I have deprecated that.

This commit makes everything in the code base use
settings.AIRFLOW_HOME as the source of truth, and deprecates the
core/airflow_home config option.

(This issue caused me a problem where the RBAC UI wouldn't work as it
didn't find the right webserver_config.py)
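
A minimal sketch of the single-source-of-truth idea described above, assuming the home directory comes only from the AIRFLOW_HOME environment variable and the config file path is derived from it. The helper names and the `~/airflow` fallback are assumptions made for illustration, not code taken from this PR.

```python
import os


def resolve_airflow_home(environ=None):
    """Return the home directory from the environment (hypothetical helper).

    Assumes a fallback of ~/airflow when AIRFLOW_HOME is not set.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("AIRFLOW_HOME", "~/airflow")
    return os.path.abspath(os.path.expanduser(raw))


def default_config_path(home):
    """Derive the config file path from the home directory.

    Because the path depends on the home directory, the home directory has
    to be known before any config file can be read.
    """
    return os.path.join(home, "airflow.cfg")


# Resolve once, then let everything else reuse these two values.
AIRFLOW_HOME = resolve_airflow_home({"AIRFLOW_HOME": "/opt/airflow"})
AIRFLOW_CONFIG = default_config_path(AIRFLOW_HOME)
```

Resolving the value once and importing it everywhere else is what removes the chance of two code paths disagreeing.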

Tests

Existing tests pass.

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.
    • All the public functions and the classes in the PR contain docstrings that explain what it does

Code Quality

  • Passes flake8

@ashb
Member Author

ashb commented Feb 13, 2019

I first noticed this due to differing settings between the config file and the AIRFLOW_HOME environment variable, meaning that the webserver_config.py would get written to one, but read from the other!

@ashb
Member Author

ashb commented Feb 13, 2019

I've got an import loop to sort out too (causing test failures, but curiously only on Python 3)

@ashb ashb force-pushed the unify_airflow_home branch 4 times, most recently from f56b73d to 7793c52 Compare February 14, 2019 15:24
Member

@XD-DENG XD-DENG left a comment

Please find my two cents.

            category=DeprecationWarning,
        )
    else:
        AIRFLOW_HOME = conf.get('core', 'airflow_home')
Member

What's the purpose of having this line?

Member

Double-checking on this: so your intention is that if the user is still using airflow_home in the .cfg file, only a warning will be given (no hard stop) and the value in the .cfg file will still be used in the end?

Member Author

Yes, I was trying to make this change as easy as possible - warn but carry on using the setting (for 99% of people it will be the same)

Though the one case I know of that this will break is the Puckel Docker container. So perhaps if the two settings are different I should throw an exception.

Member

Or throw an exception once conf.has_option('core', 'AIRFLOW_HOME') == True?

Personally I think it's better to make it as explicit as possible and to have a clean cut.

Member Author

Throwing an exception in this case would be a breaking change (this option was in the default template, so every install would end up no longer working), which I am trying to avoid.

Member

In this case, is it really necessary to have an if-else here to check whether AIRFLOW_HOME is in the environment? The two warning messages don't look that different to me.

Member

Hi @ashb, I double-checked this. Let me correct myself:

  • I agree it's necessary to have the if-else on `'AIRFLOW_HOME' in os.environ` inside `if conf.has_option('core', 'AIRFLOW_HOME'):`, to decide whether AIRFLOW_HOME should be based on the value in the .cfg file.
  • But I don't think it's necessary to have the warning message twice in the code. The two warning messages here are very similar to each other, and one of them will be issued either way. We can have something like:
if conf.has_option('core', 'AIRFLOW_HOME'):
    warnings.warn('<warning msg>', category=DeprecationWarning,)

    if 'AIRFLOW_HOME' not in os.environ:
        AIRFLOW_HOME = conf.get('core', 'airflow_home')

Member Author

I've changed it to:

if conf.has_option('core', 'AIRFLOW_HOME'):
    msg = (
        'Specifying both AIRFLOW_HOME environment variable and airflow_home '
        'in the config file is deprecated. Please use only the AIRFLOW_HOME '
        'environment variable and remove the config file entry.'
    )
    if 'AIRFLOW_HOME' in os.environ:
        warnings.warn(msg, category=DeprecationWarning,)
    elif conf.get('core', 'airflow_home') == AIRFLOW_HOME:
        warnings.warn(
            'Specifying airflow_home in the config file is deprecated. As you '
            'have left it at the default value you should remove the setting '
            'from your airflow.cfg and suffer no change in behaviour.',
            category=DeprecationWarning,
        )
    else:
        AIRFLOW_HOME = conf.get('core', 'airflow_home')
        warnings.warn(msg, category=DeprecationWarning,)

I thought about adding an extra case to say which value it was.
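
A self-contained sketch of the three-way branching quoted above, useful for checking which value wins in each case. Plain dicts stand in for `os.environ` and for the `[core]` section of the config, and the function name, parameter names and shortened warning texts are assumptions made so the example runs without Airflow installed.

```python
import warnings


def resolve_home(current_home, environ, config):
    """Return the home directory after applying the deprecated config entry.

    `current_home` is the value already computed from the environment (or
    its fallback); `environ` and `config` are hypothetical dict stand-ins.
    """
    home = current_home
    if "airflow_home" in config:
        if "AIRFLOW_HOME" in environ:
            # The environment variable wins; the config entry only warns.
            warnings.warn("airflow_home in the config file is deprecated",
                          category=DeprecationWarning)
        elif config["airflow_home"] == current_home:
            # The entry matches the default, so removing it changes nothing.
            warnings.warn("airflow_home is at its default; remove it",
                          category=DeprecationWarning)
        else:
            # No environment variable: keep honouring the config value.
            home = config["airflow_home"]
            warnings.warn("airflow_home in the config file is deprecated",
                          category=DeprecationWarning)
    return home


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Only the config file sets a non-default value, so it is still used.
    result = resolve_home("/home/me/airflow", {}, {"airflow_home": "/srv/airflow"})
```

Every branch warns, but only the last one changes the value, which is what keeps existing installs working.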

@ashb ashb force-pushed the unify_airflow_home branch 2 times, most recently from 8b1576d to a29c991 Compare March 22, 2019 17:30
@ashb
Member Author

ashb commented Mar 22, 2019

@XD-DENG PTAnotherL :)

variable if you need to use a non default value for this.

(Since this setting is used to calculate what config file to load, it is not
possible to keep just the config option)
Member

@ashb I'm wondering what will happen if I have both environment variables, AIRFLOW_HOME and AIRFLOW__CORE__AIRFLOW_HOME, set to different values?

Member Author

As far as the code is concerned, the second one is the same as setting it in the config file.
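
A minimal sketch of why the second variable behaves like a config-file entry, assuming the `AIRFLOW__{SECTION}__{KEY}` naming convention for environment overrides. The two helper functions and the dict used for file values are hypothetical and only illustrate the lookup order.

```python
def config_env_var_name(section, key):
    """Build the override variable name for a config option.

    Assumes the AIRFLOW__{SECTION}__{KEY} convention.
    """
    return "AIRFLOW__{}__{}".format(section.upper(), key.upper())


def get_option(section, key, environ, file_values):
    """Look up an option, letting the environment override the file.

    `file_values` is a hypothetical dict of {(section, key): value}.
    """
    env_name = config_env_var_name(section, key)
    if env_name in environ:
        return environ[env_name]
    return file_values.get((section, key))


environ = {
    "AIRFLOW_HOME": "/opt/airflow",
    "AIRFLOW__CORE__AIRFLOW_HOME": "/usr/local/airflow",
}
# The lookup answers as if airflow_home had been written under [core],
# independently of the plain AIRFLOW_HOME variable.
value = get_option("core", "airflow_home", environ, {})
```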

Member Author

Having re-read this: this is the very problem I ran into that caused me to open the PR. If they are set to different values, then webserver_config.py would be looked for in one directory, but other things in the other.

Specifically the problem I had was that webserver_config.py was written to one, but read from the other! So airflow create_user etc targeted an sqlite://:memory: DB!
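
The mismatch can be reproduced in a few lines. This is an illustrative sketch only: the two directory values are made up, and which code path used which source is not stated here, so the variable names simply label the two sources.

```python
import os


def webserver_config_path(home):
    """Path of webserver_config.py under a given home directory."""
    return os.path.join(home, "webserver_config.py")


# Hypothetical values: the environment variable and the config entry
# (or AIRFLOW__CORE__AIRFLOW_HOME) point at different directories.
home_from_env = "/opt/airflow"
home_from_config = "/usr/local/airflow"

written_to = webserver_config_path(home_from_env)
read_from = webserver_config_path(home_from_config)

# The writer and the reader disagree, so the reader never sees the file
# that was written and falls back to whatever defaults it has.
paths_disagree = written_to != read_from
```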


from sqlalchemy import create_engine, exc
from sqlalchemy.orm import scoped_session, sessionmaker
from sqlalchemy.pool import NullPool

from airflow import configuration as conf
from airflow.configuration import conf, AIRFLOW_HOME, WEBSERVER_CONFIG # NOQA F401
Member

@XD-DENG XD-DENG Mar 23, 2019

If WEBSERVER_CONFIG is not used here in airflow/settings.py, why do we have to import it? Possibly I missed something?

Member Author

It's so we can do from airflow.settings import WEBSERVER_CONFIG
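
A small sketch of the re-export pattern being described: a name is imported into one module purely so that callers can import it from there, and the linter's unused-import warning is silenced. The module names `demo_configuration` and `demo_settings` are hypothetical in-memory stand-ins, not Airflow modules.

```python
import sys
import types

# Stand-in for the module that actually defines the value.
config_mod = types.ModuleType("demo_configuration")
config_mod.WEBSERVER_CONFIG = "/opt/airflow/webserver_config.py"
sys.modules["demo_configuration"] = config_mod

# Stand-in for the module that re-exports it. The import looks unused
# inside this module, hence the noqa marker for flake8's F401 check.
settings_mod = types.ModuleType("demo_settings")
exec(
    "from demo_configuration import WEBSERVER_CONFIG  # noqa: F401",
    settings_mod.__dict__,
)
sys.modules["demo_settings"] = settings_mod

# Callers can now import the name from the re-exporting module.
from demo_settings import WEBSERVER_CONFIG
```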

Member

@kaxil kaxil left a comment

Some basic questions... some of them are just because I didn't understand rather than feedback :)

@ashb ashb force-pushed the unify_airflow_home branch 4 times, most recently from c1b50fd to 8a710b5 Compare March 24, 2019 13:36
There were a few ways of getting the AIRFLOW_HOME directory used
throughout the code base, giving possibly conflicting answers if they
weren't kept in sync:

- the AIRFLOW_HOME environment variable
- core/airflow_home from the config
- settings.AIRFLOW_HOME
- configuration.AIRFLOW_HOME

Since the home directory is used to compute the default path of the
config file to load, specifying the home directory again in the config
file didn't make any sense to me, and I have deprecated that.

This commit makes everything in the code base use
`settings.AIRFLOW_HOME` as the source of truth, and deprecates the
core/airflow_home config option.

There was an import cycle from settings -> logging_config ->
module_loading -> settings that needed to be broken on Python 2 - so I
have moved all adjusting of sys.path into the settings module

(This issue caused me a problem where the RBAC UI wouldn't work as it
didn't find the right webserver_config.py)
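
The commit message says all adjusting of sys.path was moved into the settings module. A minimal sketch of what centralising that step might look like follows; the helper name `prepare_syspath` and the folder names `dags`, `config` and `plugins` are assumptions for illustration, not taken from the diff.

```python
import os
import sys


def prepare_syspath(home, path=None):
    """Add home-relative folders to the import path (hypothetical helper).

    Doing this in one place means other modules do not each need to
    import the settings just to extend sys.path themselves.
    """
    path = sys.path if path is None else path
    for name in ("dags", "config", "plugins"):
        folder = os.path.join(home, name)
        if folder not in path:
            # Skip folders already present so repeated calls are harmless.
            path.append(folder)
    return path


# A separate list is passed here so the example leaves sys.path untouched.
search_path = prepare_syspath("/opt/airflow", path=[])
```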
@codecov-io

Codecov Report

Merging #4705 into master will decrease coverage by 0.02%.
The diff coverage is 81.25%.

@@            Coverage Diff             @@
##           master    #4705      +/-   ##
==========================================
- Coverage   75.76%   75.74%   -0.03%     
==========================================
  Files         458      458              
  Lines       29856    29857       +1     
==========================================
- Hits        22620    22614       -6     
- Misses       7236     7243       +7

Impacted Files                                 Coverage Δ
airflow/utils/module_loading.py                100% <ø> (ø) ⬆️
airflow/utils/dag_processing.py                59.4% <0%> (-0.18%) ⬇️
airflow/utils/log/file_processor_handler.py    86.11% <100%> (ø) ⬆️
airflow/lineage/__init__.py                    96.55% <100%> (-0.06%) ⬇️
airflow/bin/cli.py                             67% <100%> (ø) ⬆️
airflow/__init__.py                            95.83% <100%> (-0.17%) ⬇️
airflow/plugins_manager.py                     86.91% <100%> (+0.07%) ⬆️
airflow/settings.py                            84.25% <100%> (+1.64%) ⬆️
airflow/www/app.py                             97.41% <100%> (-0.03%) ⬇️
airflow/logging_config.py                      97.5% <100%> (-0.07%) ⬇️
... and 3 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d911538...9ea1616. Read the comment docs.

@ashb ashb merged commit 1c43cde into apache:master Mar 25, 2019
@ashb ashb deleted the unify_airflow_home branch March 25, 2019 11:10
ashb added a commit that referenced this pull request Mar 25, 2019
cthenderson pushed a commit to cthenderson/apache-airflow that referenced this pull request Apr 16, 2019
andriisoldatenko pushed a commit to andriisoldatenko/airflow that referenced this pull request Jul 26, 2019
wmorris75 pushed a commit to modmed/incubator-airflow that referenced this pull request Jul 29, 2019
6 participants