Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve naming conflict in case of multiple Airflow operators defined in single file #2399

Closed

Conversation

kiersten-stokes
Copy link
Member

@kiersten-stokes kiersten-stokes commented Jan 18, 2022

After several recent PRs that refactored the component catalog and parser functionality, a bug was introduced that does not persist a Component object in the case where multiple operators are present in the same file.

What changes were proposed in this pull request?

When parsing Airflow operators, a Component object is created for each operator defined in that file, and a component id is assigned. This component id is passed through from the relevant component catalog connector (recall that this class 1.) reads in the full python file, and 2.) builds a unique id based on the catalog connector type and the relevant unique-ifying keys as defined by the connector). This means that if multiple operators exist in a file, they are all getting assigned the same component id on init.

Component objects are added to the component cache keyed by their id. This results in the most recently parsed operator in a given file overwriting all previous operators also parsed from that file.

This PR appends the operator classname string to the id in the case of multiple operators defined in the same file. The id for a Component that is one of several in a single file would now have the format: <catalog-connector-type>:<hash-of-relevant-keys>:<class_name_of_operator>. Note that this would not require migration, as all our existing supported example operators have a 1:1 class:file ratio.

A related change is also included that prevents helper classes from being parsed as operators. Classes are only parsed if the class from which they derive ends in the phrase 'Operator'. See lines 96-102 in this PR.

How was this pull request tested?

  • Add a test case for this scenario

Developer's Certificate of Origin 1.1

   By making a contribution to this project, I certify that:

   (a) The contribution was created in whole or in part by me and I
       have the right to submit it under the Apache License 2.0; or

   (b) The contribution is based upon previous work that, to the best
       of my knowledge, is covered under an appropriate open source
       license and I have the right under that license to submit that
       work with modifications, whether created in whole or in part
       by me, under the same open source license (unless I am
       permitted to submit under a different license), as indicated
       in the file; or

   (c) The contribution was provided directly to me by some other
       person who certified (a), (b) or (c) and I have not modified
       it.

   (d) I understand and agree that this project and the contribution
       are public and that a record of the contribution (including all
       personal information I submit with it, including my sign-off) is
       maintained indefinitely and may be redistributed consistent with
       this project or the open source license(s) involved.

@elyra-bot
Copy link

elyra-bot bot commented Jan 18, 2022

Thanks for making a pull request to Elyra!

To try out this branch on binder, follow this link: Binder

@kiersten-stokes kiersten-stokes changed the title Resolve naming conflict in case of multiple operators Resolve naming conflict in case of multiple Airflow operators defined in single file Jan 18, 2022
@kiersten-stokes kiersten-stokes marked this pull request as draft January 18, 2022 22:29
@kiersten-stokes kiersten-stokes marked this pull request as ready for review January 19, 2022 18:11
@kiersten-stokes
Copy link
Member Author

Test test_airflow_component_parser:test_modify_component_catalogs is expected to fail right now because it access a resource via url that has been modified in this PR

@akchinSTC akchinSTC self-requested a review January 19, 2022 22:10
@ptitzler
Copy link
Member

I'm wondering if it's worthwhile to fix the old code base if we are going to switch to AST-based parsing in 3.6 ...

@akchinSTC akchinSTC marked this pull request as draft January 24, 2022 22:47
@kiersten-stokes
Copy link
Member Author

Closing as this functionality is now covered by #2418

@kiersten-stokes kiersten-stokes deleted the airflow-naming-bug branch April 1, 2022 15:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants