Improve Airflow parser functionality #2418
Thanks for making a pull request to Elyra! To try out this branch on binder, follow this link:
A few thoughts about the
LGTM
This looks really good. There's a general question/comment about how the "teardown" fixture is declared (and used), but everything else is fairly specific (and minor).
Note: I don't really have time to fully test this out, but I'm hoping others are able to spend some time on that.
Co-authored-by: Kevin Bates <kbates4@gmail.com>
Thanks Kiersten, these changes look good.
Fixes portions of #2414
Supports #2437 & #2438
Functionality Summary:
What changes were proposed in this pull request?
This PR refactors the AirflowComponentParser to use ast.parse so that it can cover additional parsing cases cleanly. The general flow is the same as before: the parser takes the contents of a file, parses it for all its classes, then parses each class for its properties (i.e. init arguments). What is different is the method by which it does those things. Rather than using regexes, ast.parse is used to get the abstract syntax tree (AST) of the file. This provides the class definitions and their corresponding init function (including argument objects) and docstring for further parsing. Annoyingly, ast.parse produces slightly different data structures for different Python versions (Python 3.7 and lower vs. Python 3.8+); the relevant cases are currently handled to the best of my knowledge.
The class definitions are filtered to ensure that they extend (directly or indirectly) from BaseOperator. For operators that extend from a class that is not defined in the same file, it is sufficient to check the import statement for operators defined in either airflow.providers.*.operators. or airflow.operators.* (see this link for details).
As before, the classes are looped through to provide a list of properties for each one (represented by the init argument objects and any additional information gleaned from the operator class docstring parsed earlier). Note that regexes are still used to search the class docstring for 1. a description of the property, and 2. any type information available in the docstring (which is often more verbose, and therefore more helpful, than the type information parsed from the AST).
This PR doesn't (yet) touch any of the processor classes.
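To make the flow above concrete, here is a minimal, standard-library-only sketch of the same idea. It is not the PR's AirflowComponentParser: the sample module, the helper names (find_operator_classes, init_arguments, docstring_info), and the ':param'/':type' regexes are assumptions for illustration. It leans on ast.literal_eval and ast.get_docstring, which accept both the pre-3.8 node types (ast.Str, ast.Num) and ast.Constant, as one possible way to smooth over the version differences mentioned above.

```python
import ast
import re
from typing import Any, Dict, List, Set, Tuple

# Hypothetical sample operator module (illustration only, not from the PR).
SAMPLE = '''
from airflow.models import BaseOperator
from airflow.operators.bash import BashOperator

class HelloOperator(BaseOperator):
    """
    Says hello.

    :param name: Who to greet
    :type name: str
    """
    def __init__(self, name, greeting="hello", **kwargs):
        super().__init__(**kwargs)

class FriendlyHelloOperator(HelloOperator):
    """Indirect subclass with no __init__ of its own."""

class LoudBashOperator(BashOperator):
    def __init__(self, volume=11, **kwargs):
        super().__init__(**kwargs)

class NotAnOperator:
    def __init__(self, x):
        self.x = x
'''

# Assumed module patterns: airflow.operators[.*] and airflow.providers.<...>.operators[.*]
OPERATOR_MODULE = re.compile(r"^airflow\.(operators|providers\..+\.operators)(\..+)?$")


def base_names(cls: ast.ClassDef) -> List[str]:
    """Simple base-class names, e.g. 'BaseOperator' or 'bash.BashOperator' -> 'BashOperator'."""
    names = []
    for base in cls.bases:
        if isinstance(base, ast.Name):
            names.append(base.id)
        elif isinstance(base, ast.Attribute):
            names.append(base.attr)
    return names


def imported_operator_names(tree: ast.Module) -> Set[str]:
    """Names bound by 'from <airflow operator module> import X' statements."""
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom) and node.module and OPERATOR_MODULE.match(node.module):
            names.update(alias.asname or alias.name for alias in node.names)
    return names


def find_operator_classes(tree: ast.Module) -> Dict[str, ast.ClassDef]:
    """Keep only classes that extend BaseOperator directly or indirectly."""
    classes = {node.name: node for node in tree.body if isinstance(node, ast.ClassDef)}
    known = {"BaseOperator"} | imported_operator_names(tree)
    operators: Dict[str, ast.ClassDef] = {}
    changed = True
    while changed:  # repeat until no new subclasses appear, so class order doesn't matter
        changed = False
        for name, cls in classes.items():
            if name not in operators and any(b in known for b in base_names(cls)):
                operators[name] = cls
                known.add(name)
                changed = True
    return operators


def literal_or_none(node: Any) -> Any:
    """literal_eval handles ast.Constant (3.8+) as well as ast.Str/ast.Num (3.7 and lower)."""
    if node is None:
        return None
    try:
        return ast.literal_eval(node)
    except (ValueError, TypeError):
        return None  # non-literal default, e.g. a name or a function call


def init_arguments(cls: ast.ClassDef) -> Dict[str, Any]:
    """Map __init__ arguments (minus self, *args, **kwargs) to literal defaults.

    None stands for "no default", a default of None, or a non-literal default.
    """
    for item in cls.body:
        if isinstance(item, ast.FunctionDef) and item.name == "__init__":
            # posonlyargs only exists on Python 3.8+; getattr keeps this working on 3.7.
            positional = list(getattr(item.args, "posonlyargs", [])) + list(item.args.args)
            # Defaults line up with the *last* len(defaults) positional parameters.
            pad = [None] * (len(positional) - len(item.args.defaults))
            defaults = pad + list(item.args.defaults)
            result = {p.arg: literal_or_none(d) for p, d in zip(positional[1:], defaults[1:])}
            for p, d in zip(item.args.kwonlyargs, item.args.kw_defaults):
                result[p.arg] = literal_or_none(d)  # kw_defaults holds None when absent
            return result
    return {}  # no __init__ defined in this class body


def docstring_info(cls: ast.ClassDef) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Regex the class docstring for ':param x: ...' descriptions and ':type x: ...' types."""
    doc = ast.get_docstring(cls) or ""
    descriptions = dict(re.findall(r":param (\w+):[ \t]*(.+)", doc))
    types = dict(re.findall(r":type (\w+):[ \t]*(.+)", doc))
    return descriptions, types


if __name__ == "__main__":
    for name, cls in find_operator_classes(ast.parse(SAMPLE)).items():
        print(name, init_arguments(cls), *docstring_info(cls))
```

Matching on the simple base-class name keeps the sketch short, but it means an aliased or re-exported BaseOperator would be missed; the fixed-point loop is there so that an in-file subclass is picked up regardless of where it appears in the file.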
Some discussion points:
Do we want to handle cases where arguments are of 'non-standard' data types (and if so, how)? For example, the PythonOperator specifies an argument of type 'python callable', which will default to string. This will be addressed in a follow-up PR, as it falls under the umbrella of 'new functionality' and is already tracked in Address shortcomings in the Airflow parser and processor code #2414.
Similarly, are there cases where an operator may not have an init function but we still want to include it as a Component? (Again, see some of the classes defined in python_operator.py.) This is different from the case where init does not have any arguments. I propose that we handle this issue in another PR; it is essentially the same case as determining how to support operators that derive from other operator classes, which is already tracked in Address shortcomings in the Airflow parser and processor code #2414.
Left to do:
ast.parse (and/or an even larger section)
How was this pull request tested?
ast.parse behavior can vary slightly between versions.
One additional change to the tests is a new fixture that tears down the test component catalog instance created in certain tests (e.g., test_modify_component_catalogs). This way, if a test fails before the test catalog can be removed, tests that run later are not affected.
Developer's Certificate of Origin 1.1
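The teardown fixture described above fits pytest's yield-fixture pattern: code after the yield runs once the test finishes, whether it passed or failed. The sketch below assumes pytest is installed; the in-memory store, the catalog name, and the helper names are hypothetical stand-ins, not Elyra's metadata API or the PR's actual fixture.

```python
import pytest


class FakeCatalogStore:
    """Hypothetical in-memory stand-in for the component catalog metadata store."""

    def __init__(self):
        self.instances = {}

    def create(self, name, config):
        self.instances[name] = config

    def remove(self, name):
        self.instances.pop(name, None)  # no error if the instance is already gone


store = FakeCatalogStore()
TEST_CATALOG = "new_test_catalog"  # hypothetical instance name


def remove_test_catalog():
    """Idempotent cleanup, safe whether or not the test got far enough to create the catalog."""
    store.remove(TEST_CATALOG)


@pytest.fixture
def component_catalog_teardown():
    yield  # the test body runs here
    # Teardown phase: runs even if the test raised, so later tests start clean.
    remove_test_catalog()


def test_modify_component_catalogs(component_catalog_teardown):
    store.create(TEST_CATALOG, {"paths": ["a.py"]})
    store.instances[TEST_CATALOG]["paths"].append("b.py")
    # If this assertion fails, the fixture still removes the test catalog.
    assert store.instances[TEST_CATALOG]["paths"] == ["a.py", "b.py"]
```

Requesting the fixture by name in the test signature is what activates it, so the test body itself does no cleanup. request.addfinalizer is an alternative way to register the same kind of cleanup.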