[HOLD] Add Airflow provider package catalog connector #2416

Closed
wants to merge 56 commits into from

ptitzler (Member) commented Jan 24, 2022

This PR adds a catalog connector for Apache Airflow provider packages. Connector instances require the user to configure a download URL for the Apache Airflow provider package that is installed in the cluster.

Requires #2418, #2409
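The PR does not show how the connector processes the configured URL, so the following is only a minimal sketch under stated assumptions. It assumes the provider package is a wheel (a plain zip archive) whose operator modules follow the layout of the example path airflow/providers/ssh/operators/ssh.py. The names download_wheel, list_operator_modules and OPERATOR_GLOB are hypothetical and are not taken from this PR.

```python
import io
import urllib.request
import zipfile
from fnmatch import fnmatchcase

# Hypothetical pattern for operator modules inside an Airflow provider wheel.
# In fnmatchcase, '*' also matches '/', so nested providers such as
# airflow/providers/microsoft/azure/operators/*.py are matched as well.
OPERATOR_GLOB = "airflow/providers/*/operators/*.py"


def download_wheel(url: str, timeout: float = 60.0) -> bytes:
    """Fetch the provider package archive from the user-configured URL."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def list_operator_modules(wheel_bytes: bytes) -> list[str]:
    """Return the sorted operator module paths contained in a wheel archive."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as archive:
        return sorted(
            name
            for name in archive.namelist()
            if fnmatchcase(name, OPERATOR_GLOB)
            # package markers contain no operator classes
            and not name.endswith("/__init__.py")
        )
```

Because a wheel is an ordinary zip file, the archive can be inspected in memory without installing it. Each returned path could then be read from the archive and passed to a component parser.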

What changes were proposed in this pull request?

  • Add a new catalog-connectors directory to the repository root; its airflow subdirectory contains the newly introduced connector
  • Update the Makefile to include a new lint-connectors target, which is also added as a dependency of the lint target
  • Add a new lint-connectors task to GitHub's build.yaml

Notes:

  • The connector is not included in the Elyra release process and needs to be published independently, as necessary.
  • The connector uses the provider archive name (e.g. apache_airflow_providers_ssh; see the next note) and the Python file name (e.g. airflow/providers/ssh/operators/ssh.py) as hash keys, which identify operators in the palette internally.
  • The archive version string (e.g. 2.3.0-py3-none-any) is currently not part of the key, to avoid versioning conflicts. For example, assume user A adds operators from archive ...2.3.0-py3-none-any to the Elyra deployment and creates a pipeline using some of those operators. User B then adds operators from an older archive, such as ...2.2.0-py3-none-any. If the full archive name were used in the key, user B would not be able to run pipelines that user A created (and vice versa), because the keys would not match (pseudo code):
    "apache_airflow_providers_ssh-2.3.0-py3-none-any.whl:airflow/providers/ssh/operators/ssh.py:SSHOperator" != "apache_airflow_providers_ssh-2.2.0-py3-none-any.whl:airflow/providers/ssh/operators/ssh.py:SSHOperator"
    
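A version-independent key can be sketched as follows. This sketch assumes the wheel file naming convention from PEP 427, {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl, in which the distribution component never contains a hyphen. distribution_name and operator_key are hypothetical helpers. The colon-separated format only mirrors the pseudo code in the note on archive version strings and is not necessarily the connector's actual key format.

```python
def distribution_name(wheel_filename: str) -> str:
    """Return the distribution component of a wheel file name.

    Splitting on the first '-' drops the version and the compatibility tags
    (e.g. '2.3.0-py3-none-any'), because the distribution component of a
    wheel file name never contains '-'.
    """
    if not wheel_filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {wheel_filename!r}")
    return wheel_filename.split("-", 1)[0]


def operator_key(wheel_filename: str, module_path: str, class_name: str) -> str:
    """Build a palette key that ignores the archive version (hypothetical)."""
    return ":".join((distribution_name(wheel_filename), module_path, class_name))


# Users A and B added different versions of the same provider package.
newer = operator_key(
    "apache_airflow_providers_ssh-2.3.0-py3-none-any.whl",
    "airflow/providers/ssh/operators/ssh.py",
    "SSHOperator",
)
older = operator_key(
    "apache_airflow_providers_ssh-2.2.0-py3-none-any.whl",
    "airflow/providers/ssh/operators/ssh.py",
    "SSHOperator",
)
assert newer == older
print(newer)  # apache_airflow_providers_ssh:airflow/providers/ssh/operators/ssh.py:SSHOperator
```

Dropping the version lets both users resolve to the same palette entry. The likely trade-off is that operators whose parameters changed between provider versions can no longer be distinguished by their keys.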

How was this pull request tested?

  • Install the connector from source, as documented in the connector's README
  • Enable the connector as documented in the connector's README, specifying one provider package
  • Open the Visual Pipeline Editor (VPE) for Airflow
  • Expand the palette (core Airflow operators should be displayed)
  • Add and configure operators
  • Export the pipeline and review the generated DAG
  • Run the pipeline

Unit testing included the providers listed in this discussion thread

Notes:

  • There are unresolved issues in the Elyra Airflow component parser that need to be addressed before the Airflow 1.10.15 package can be used (see #2418, Improve Airflow parser functionality).
  • The connector should already support Airflow 2.x packages, but these have not been tested because Elyra does not support Airflow 2.x.

Developer's Certificate of Origin 1.1

   By making a contribution to this project, I certify that:

   (a) The contribution was created in whole or in part by me and I
       have the right to submit it under the Apache License 2.0; or

   (b) The contribution is based upon previous work that, to the best
       of my knowledge, is covered under an appropriate open source
       license and I have the right under that license to submit that
       work with modifications, whether created in whole or in part
       by me, under the same open source license (unless I am
       permitted to submit under a different license), as indicated
       in the file; or

   (c) The contribution was provided directly to me by some other
       person who certified (a), (b) or (c) and I have not modified
       it.

   (d) I understand and agree that this project and the contribution
       are public and that a record of the contribution (including all
       personal information I submit with it, including my sign-off) is
       maintained indefinitely and may be redistributed consistent with
       this project or the open source license(s) involved.

kiersten-stokes and others added 30 commits September 24, 2021 16:13
Speed up reading of component definition by parallelizing 
associated method calls for each path in a registry and some
minor refactoring of component registry related functions
ptitzler marked this pull request as draft January 24, 2022 23:48

elyra-bot commented Jan 24, 2022

Thanks for making a pull request to Elyra!

To try out this branch on binder, follow this link: Binder

ptitzler added the kind:enhancement, platform: pipeline-Airflow, and status:Work in Progress labels Jan 24, 2022
ptitzler removed the status:Work in Progress label Jan 28, 2022
ptitzler marked this pull request as ready for review January 28, 2022 23:37
ptitzler added this to the 3.6.0 milestone Jan 28, 2022
ptitzler changed the title from Add Airflow provider package catalog connector to [HOLD] Add Airflow provider package catalog connector Jan 31, 2022
ptitzler (Member, Author) commented Feb 1, 2022

Clarification based on today's discussion in the dev meeting. The proposed PR implementation is based on the following assumptions:

  • the connector requires its own life cycle, meaning updates can be published without the need for a new Elyra release
  • the connector is treated as an optional Elyra component that is only installed when users deploy Elyra with Apache Airflow pipeline support

ptitzler (Member, Author) commented Feb 2, 2022

Will redo

Labels: kind:enhancement, platform: pipeline-Airflow