Proposal: use standard imports for operators/hooks #1238
Comments
+1
+1 also:
but:
@mistercrunch excellent point. I think we can take advantage of the fact that the objects imported into
Sounds perfect!
It works! see: jlowin@973295c
Yep I currently do this for
Currently, Airflow uses a relatively complex import mechanism to import hooks and operators without polluting the namespace with submodules. I would like to propose that Airflow abandon that system and use standard Python importing.
Here are a few major reasons why I think the current system has run its course.
Polluting namespace
The biggest advantage of the current system, as I understand it, is that only Operators appear in the `airflow.operators` namespace; the submodules that actually contain them do not. For example, although the class is defined as `airflow.operators.python_operator.PythonOperator`, only `PythonOperator` appears in the `airflow.operators` namespace; `python_operator` does not.

I think keeping the namespace flat like this was helpful when Airflow was a smaller project, but as the number of hooks/operators grows, and especially as the `contrib` hooks/operators grow, I'd argue that namespacing is a good thing. It provides structure and organization, and opportunities for documentation (through module docstrings).

In fact, I'd argue that the current namespace is itself getting quite polluted: the only way to know what's available is to use something like IPython tab-completion to browse an alphabetical list of Operator names, or to open the source file and grok the import definition (which no one installing from PyPI is likely to do).
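As a rough illustration of the documentation point, the sketch below writes a throwaway package to a temporary directory and uses the standard library's `pkgutil.iter_modules` to list each submodule next to its module docstring, which a flat namespace of re-exported classes cannot offer. The package name `demo_operators` and its two submodules are hypothetical stand-ins, not Airflow's real layout.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path

# Hypothetical package contents: each submodule carries its own docstring.
FILES = {
    "__init__.py": '"""Demo operators package."""\n',
    "bash_operator.py": (
        '"""Operators that run shell commands."""\n\n'
        "class BashOperator:\n    pass\n"
    ),
    "python_operator.py": (
        '"""Operators that call Python functions."""\n\n'
        "class PythonOperator:\n    pass\n"
    ),
}


def describe_submodules(package_name):
    """Map each submodule of an importable package to its docstring."""
    package = importlib.import_module(package_name)
    summary = {}
    for info in pkgutil.iter_modules(package.__path__):
        module = importlib.import_module(f"{package_name}.{info.name}")
        summary[info.name] = (module.__doc__ or "").strip()
    return summary


def summarize_demo_package(package_name="demo_operators"):
    """Write the demo package to a temp dir, import it, and describe it."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / package_name
        pkg_dir.mkdir()
        for filename, source in FILES.items():
            (pkg_dir / filename).write_text(source)
        importlib.invalidate_caches()
        sys.path.insert(0, tmp)
        try:
            return describe_submodules(package_name)
        finally:
            sys.path.remove(tmp)
            # Forget the throwaway modules so later calls start fresh.
            for name in [m for m in sys.modules
                         if m == package_name or m.startswith(package_name + ".")]:
                del sys.modules[name]


if __name__ == "__main__":
    for name, doc in sorted(summarize_demo_package().items()):
        print(f"{name}: {doc}")
```

The same loop works on any regular package, so a docs page or a CLI could list what is available without anyone reading the `__init__.py`.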
Conditional imports
A second advantage of the current system is that any module that fails to import is silently ignored. This makes it easy to have optional dependencies. For example, if someone doesn't have `boto` installed, then they don't have an `S3Hook` either. The same goes for `HiveOperator`.

Again, as Airflow grows and matures, I think this is a little too magic. If my environment is missing a dependency, I want to hear about it.
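Two common ways to keep a dependency optional while still failing loudly are a guarded `try`/`except ImportError` at module level, and deferring the import until the hook is actually used (the approach this issue attributes to `HiveOperator`/`HiveHook`). The sketch below shows both; `ExampleS3Hook` and the module name `some_optional_sdk` are hypothetical placeholders, not Airflow or boto APIs, and the dependency is assumed not to be installed.

```python
import importlib

# Alternative 1: guard the import at module level and record availability.
try:
    import some_optional_sdk  # noqa: F401  (placeholder; assumed not installed)
    HAS_OPTIONAL_SDK = True
except ImportError:
    HAS_OPTIONAL_SDK = False


class ExampleS3Hook:
    """Hypothetical hook whose third-party dependency is imported lazily."""

    # Placeholder name for an optional dependency (assumed not installed).
    dependency = "some_optional_sdk"

    def get_conn(self):
        """Import the dependency on first use, failing loudly if it is missing."""
        try:
            return importlib.import_module(self.dependency)
        except ImportError as err:
            raise ImportError(
                f"{type(self).__name__} requires the {self.dependency!r} package; "
                "install it to use this hook"
            ) from err


if __name__ == "__main__":
    hook = ExampleS3Hook()  # constructing the hook never needs the dependency
    try:
        hook.get_conn()
    except ImportError as exc:
        print(exc)
```

Either way the class can be imported everywhere, and the user only sees an error, with the missing package named, when they actually need it.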
On the other hand, the `contrib` namespace sort of depends on this: we don't want users to have to install every single dependency. So I propose that contrib modules all live in their submodules: `from airflow.contrib.operators.my_operator import MyOperator`. As mentioned previously, having structure and namespacing is a good thing as the project gets more complex.

Other ways to handle this include putting "non-standard" dependencies inside the operator/hook rather than at module level (see `HiveOperator`/`HiveHook`), so the class can be imported even when the dependency is missing and only fails when it is actually used. Another is judicious use of `try`/`except ImportError`. The simplest is to make people import things explicitly from submodules.

Operator dependencies
Right now, operators can't depend on each other if they aren't in the same file. This is for the simple reason that there is no guarantee on what order the operators will be loaded. It all comes down to which dictionary key gets loaded first. One day Operator B could be loaded after Operator A; the next day it might be loaded before. Consequently, A and B can't depend on each other. Worse, if a user makes two operators that do depend on each other, they won't get an error message when one fails to import.
For contrib modules in particular, this is sort of killer.
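To make the ordering problem concrete, here is a deliberately simplified model, not Airflow's actual loader: each module's source is executed into one shared namespace in a given order, and any failure is swallowed, mimicking the silent skipping described above. The module names and sources are hypothetical. `BOperator` subclasses `AOperator`, so it only loads if `a_operator` happens to run first. Before Python 3.7, dict iteration order was not guaranteed, so which order you got could vary.

```python
# Simplified, hypothetical model of order-dependent, error-swallowing loading.
SOURCES = {
    "a_operator": "class AOperator:\n    pass\n",
    # b_operator relies on AOperator already being in the shared namespace.
    "b_operator": "class BOperator(AOperator):\n    pass\n",
}


def load_in_order(order):
    """Exec each module's source into one namespace, silently skipping failures."""
    namespace = {}
    for module_name in order:
        try:
            exec(SOURCES[module_name], namespace)
        except Exception:
            # The failure is swallowed: no error message reaches the user.
            pass
    return sorted(name for name in namespace if name.endswith("Operator"))


if __name__ == "__main__":
    print(load_in_order(["a_operator", "b_operator"]))  # ['AOperator', 'BOperator']
    print(load_in_order(["b_operator", "a_operator"]))  # ['AOperator']
```

With standard imports, `b_operator.py` would simply start with `from .a_operator import AOperator`, and Python's import system loads `a_operator` first no matter what order anything else is listed in; if that import fails, the user gets a traceback instead of a missing class.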
Ease of use
It's hard to set up imports for a new operator. The dictionary-based import instructions aren't obvious to new users, and errors are silently dismissed, which makes debugging difficult.
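The debugging cost is easy to reproduce with a hypothetical, simplified stand-in for a dictionary-driven loader. `load_from_spec` and its `{module: [names]}` format are illustrative only, not Airflow's helper, and it runs against the standard library's `json` module so it executes anywhere. A one-character typo in the spec just leaves the name missing, whereas the equivalent explicit import fails immediately with an `ImportError` naming the culprit.

```python
import importlib


def load_from_spec(spec):
    """Hypothetical loader: import {module: [names]}, silently skipping failures."""
    namespace = {}
    for module_name, names in spec.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # a missing module vanishes without a trace
        for name in names:
            try:
                namespace[name] = getattr(module, name)
            except AttributeError:
                pass  # so does a misspelled name
    return namespace


def explicit_import_error():
    """Return the error message produced by the equivalent explicit import."""
    try:
        from json import dumpz  # noqa: F401  (deliberate typo for "dumps")
    except ImportError as err:
        return str(err)
    return None


if __name__ == "__main__":
    print(sorted(load_from_spec({"json": ["dumps", "dumpz"]})))
    print(explicit_import_error())
```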
Identity
Surprisingly, `airflow.operators.SubDagOperator != airflow.operators.subdag_operator.SubDagOperator`. See #1168.

Proposal
Use standard Python importing for hooks/operators/etc.
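A minimal sketch of what that looks like, built as a throwaway package on disk so it can run anywhere; the package name `demo_airflow` and its contents are hypothetical stand-ins. The submodule declares `__all__`, the package `__init__.py` re-exports the class with a plain relative import, and both import paths yield the very same class object. For contrast, the sketch also loads the same file a second time under another module name with `importlib.util`, which produces a distinct, unequal class. That is one way an identity surprise like the one in #1168 can arise; whether it is the exact cause there is an assumption.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path

OPERATOR_SRC = (
    '"""Operators that call Python functions."""\n'
    '__all__ = ["PythonOperator"]\n\n'
    "class PythonOperator:\n"
    "    pass\n"
)
# Standard, explicit re-export in the operators package's __init__.py.
INIT_SRC = "from .python_operator import PythonOperator\n"


def check_identity(pkg="demo_airflow"):
    """Return (same_via_standard_imports, same_after_double_load)."""
    with tempfile.TemporaryDirectory() as tmp:
        ops_dir = Path(tmp) / pkg / "operators"
        ops_dir.mkdir(parents=True)
        (Path(tmp) / pkg / "__init__.py").write_text("")
        (ops_dir / "__init__.py").write_text(INIT_SRC)
        (ops_dir / "python_operator.py").write_text(OPERATOR_SRC)
        importlib.invalidate_caches()
        sys.path.insert(0, tmp)
        try:
            operators = importlib.import_module(f"{pkg}.operators")
            submodule = importlib.import_module(f"{pkg}.operators.python_operator")
            same_standard = operators.PythonOperator is submodule.PythonOperator

            # Load the same file again under another name: a second module
            # object, and therefore a second, unequal PythonOperator class.
            spec = importlib.util.spec_from_file_location(
                "python_operator_copy", str(ops_dir / "python_operator.py")
            )
            duplicate = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(duplicate)
            same_double = operators.PythonOperator is duplicate.PythonOperator
        finally:
            sys.path.remove(tmp)
            for name in [m for m in sys.modules
                         if m == pkg or m.startswith(pkg + ".")]:
                del sys.modules[name]
    return same_standard, same_double


if __name__ == "__main__":
    print(check_identity())  # (True, False)
```

Because standard imports go through `sys.modules`, each module is executed once and every import path sees the same class.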
- `__init__.py` files use straightforward, standard Python imports.
- Operators are importable as `airflow.operators.OperatorName` or `airflow.operators.operator_module.OperatorName`.
- Contrib operators are importable only as `airflow.contrib.operators.operator_module.OperatorName`, in order to manage dependencies.
- Operator and hook authors use `__all__` to define their module's exports.

Possibly delete namespace afterward
- In `operators/__init__.py`, run a function at the end of the file that deletes all modules from the namespace, leaving only the Operators. This keeps the namespace clear but lets people use familiar import mechanisms.

Possibly use an import function to handle `ImportError` gracefully
- Change `import_module_attrs` to take one module name at a time instead of a dictionary.
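The two optional ideas above could look roughly like this. Both functions are hypothetical sketches: this `import_module_attrs` uses the one-module-at-a-time signature the proposal suggests and is not Airflow's existing helper, and `delete_submodules` assumes it is called as `delete_submodules(globals())` at the end of `operators/__init__.py`, where standard imports such as `from .python_operator import PythonOperator` leave the submodule bound in the package namespace. A missing module is logged with the underlying error rather than silently ignored, and a misspelled attribute still raises.

```python
import importlib
import logging
import types

log = logging.getLogger(__name__)


def import_module_attrs(namespace, module_name, attrs):
    """Hypothetical helper: import attrs from ONE module into namespace.

    A missing module (e.g. an uninstalled optional dependency) is logged
    instead of silently ignored; a misspelled attribute raises AttributeError.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError as err:
        log.warning("Skipping %s: %s", module_name, err)
        return []
    for attr in attrs:
        namespace[attr] = getattr(module, attr)
    return list(attrs)


def delete_submodules(namespace):
    """Hypothetical cleanup: drop module objects, keeping classes and dunders."""
    for name, value in list(namespace.items()):
        if isinstance(value, types.ModuleType) and not name.startswith("__"):
            del namespace[name]


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    ns = {}
    import_module_attrs(ns, "json", ["dumps", "loads"])
    import_module_attrs(ns, "some_optional_sdk", ["Client"])  # logs a warning
    ns["json"] = importlib.import_module("json")  # stand-in for a submodule binding
    delete_submodules(ns)
    print(sorted(ns))  # ['dumps', 'loads']
```

Deleting the names only tidies the package's attributes; the modules stay in `sys.modules`, so explicit imports from the submodules keep working.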