[HOLD] Add airflow package catalog connector #2409

Closed
wants to merge 61 commits into from

@ptitzler ptitzler commented Jan 21, 2022

This PR adds a catalog connector for Apache Airflow packages. Connector instances (of which there should typically be only one) require the user to configure a download URL for the Apache Airflow package that is used in the cluster.

Requires #2418

What changes were proposed in this pull request?

  • Add new catalog-connectors directory to the repository root, containing in the airflow subdirectory the newly introduced connector
  • Update Makefile to include new lint-connectors target, which was also added as a dependency to the lint task
  • Add new lint-connectors task to GitHub's build.yaml
  • Update installation topic in documentation
  • Add connector to extras install dependency in setup.py (The connector is also installed if one runs pip install elyra[all])

Notes:

  • The connector is not included in the Elyra release process and needs to be published independently, as necessary.

  • The connector declares the file name (e.g. operators/bash_operator.py) as the hash key, which is used internally to identify operators in the palette. The archive name, e.g. apache_airflow-1.10.15-py2.py3-none-any.whl, is currently not part of the key, to avoid potential versioning issues. For example, assume user A adds operators from archive apache_airflow-1.10.15-py2.py3-none-any.whl to the Elyra deployment and creates a pipeline using some of the operators. User B adds operators from an older archive, such as apache_airflow-1.10.12-py2.py3-none-any.whl. If we were to include the archive name as-is in the key, user B would not be able to run pipelines that user A created (and vice versa) because (pseudo code)

    "apache_airflow-1.10.15-py2.py3-none-any.whl:operators/bash_operator.py:BashOperator" != "apache_airflow-1.10.12-py2.py3-none-any.whl:operators/bash_operator.py:BashOperator"
    

    We need to decide whether the implemented behavior is sufficient (archive version numbers are completely ignored, even though this might lead to issues if the loaded operator signatures in Elyra's pipeline editor are significantly different from those of the operators that are installed in the Airflow cluster) or if semver support is required. The latter could be accomplished by using only parts of the archive name as key, e.g. by omitting/masking minor and patch version numbers. This does require, though, that archive names follow a consistent naming pattern to allow for the extraction of version strings.
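
To make the semver option more concrete, here is a minimal sketch of a key that masks the minor and patch version numbers. It is not part of this PR: the function name `masked_operator_key` and the `archive:file:class` key layout (borrowed from the pseudo code above) are assumptions for illustration only, and the sketch assumes the archive name starts with `<distribution>-<version>-`.

```python
def masked_operator_key(archive_name: str, file_name: str, class_name: str) -> str:
    """Return a palette key that keeps only the major version of the archive.

    Hypothetical helper; the connector in this PR does not include the
    archive name in the key at all.
    """
    # Wheel file names are hyphen-separated; the first two fields are the
    # distribution name and the version.
    fields = archive_name.split("-")
    if len(fields) < 2:
        raise ValueError(f"Unexpected archive name: {archive_name!r}")
    distribution, version = fields[0], fields[1]
    # Mask minor and patch numbers, e.g. "1.10.15" becomes "1.x".
    major = version.split(".")[0]
    return f"{distribution}-{major}.x:{file_name}:{class_name}"


key_a = masked_operator_key(
    "apache_airflow-1.10.15-py2.py3-none-any.whl",
    "operators/bash_operator.py",
    "BashOperator",
)
key_b = masked_operator_key(
    "apache_airflow-1.10.12-py2.py3-none-any.whl",
    "operators/bash_operator.py",
    "BashOperator",
)
print(key_a == key_b)
```

With such a key, the pipelines of users A and B in the example above would resolve to the same operator, while operators from a 2.x archive would still get a distinct key.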

How was this pull request tested?

  • Install connector from source, as documented in the connector's README
  • Enable the connector as documented in the connector's README
  • Review the installation topic in the 'getting started' guide

Notes:

  • There are unresolved Elyra Airflow component parser issues that need to be addressed before the Airflow 1.10.15 package can be used.
  • The connector should already support Airflow 2.x packages but they have not been tested because Elyra does not support Airflow 2.x.

Developer's Certificate of Origin 1.1

   By making a contribution to this project, I certify that:

   (a) The contribution was created in whole or in part by me and I
       have the right to submit it under the Apache License 2.0; or

   (b) The contribution is based upon previous work that, to the best
       of my knowledge, is covered under an appropriate open source
       license and I have the right under that license to submit that
       work with modifications, whether created in whole or in part
       by me, under the same open source license (unless I am
       permitted to submit under a different license), as indicated
       in the file; or

   (c) The contribution was provided directly to me by some other
       person who certified (a), (b) or (c) and I have not modified
       it.

   (d) I understand and agree that this project and the contribution
       are public and that a record of the contribution (including all
       personal information I submit with it, including my sign-off) is
       maintained indefinitely and may be redistributed consistent with
       this project or the open source license(s) involved.

kiersten-stokes and others added 30 commits September 24, 2021 16:13
Speed up reading of component definition by parallelizing 
associated method calls for each path in a registry and some
minor refactoring of component registry related functions
@ptitzler ptitzler added platform: pipeline-Airflow Related to usage of Apache Airflow as pipeline runtime and removed status:Work in Progress Development in progress. A PR tagged with this label is not review ready unless stated otherwise. labels Jan 24, 2022
@ptitzler
Member Author

> We need to decide whether the implemented behavior is sufficient [...] or if semver support is required. The latter could be accomplished by using only parts of the archive name as key

Wheel package names appear to be consistent (random selection of releases):

  • apache_airflow-1.10.12-py2.py3-none-any.whl
  • apache_airflow-1.10.15-py2.py3-none-any.whl
  • apache_airflow-2.0.2-py3-none-any.whl
  • apache_airflow-2.1.2-py3-none-any.whl
  • apache_airflow-2.2.2-py3-none-any.whl
  • apache_airflow-2.2.3-py3-none-any.whl
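
The consistency of these names can be checked mechanically. The sketch below is an illustration only, not connector code: it matches each name against the wheel file name convention (`{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`) and extracts the version. The names `WHEEL_NAME` and `extract_version` are hypothetical, and the version parsing assumes plain numeric release versions without pre-release suffixes.

```python
import re

# Pattern for the wheel file name convention; the build tag is optional
# and, if present, starts with a digit.
WHEEL_NAME = re.compile(
    r"^(?P<distribution>[^-]+)-(?P<version>[^-]+)"
    r"(?:-(?P<build>\d[^-]*))?"
    r"-(?P<python>[^-]+)-(?P<abi>[^-]+)-(?P<platform>[^-]+)\.whl$"
)


def extract_version(archive_name):
    """Return the version of a wheel archive as a tuple of integers."""
    match = WHEEL_NAME.match(archive_name)
    if match is None:
        raise ValueError(f"Not a valid wheel file name: {archive_name!r}")
    # Assumes purely numeric version segments such as "1.10.15".
    return tuple(int(part) for part in match.group("version").split("."))


ARCHIVES = [
    "apache_airflow-1.10.12-py2.py3-none-any.whl",
    "apache_airflow-1.10.15-py2.py3-none-any.whl",
    "apache_airflow-2.0.2-py3-none-any.whl",
    "apache_airflow-2.1.2-py3-none-any.whl",
    "apache_airflow-2.2.2-py3-none-any.whl",
    "apache_airflow-2.2.3-py3-none-any.whl",
]

versions = [extract_version(name) for name in ARCHIVES]
print(versions)
```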

@akchinSTC akchinSTC changed the title Add airflow package catalog connector [HOLD] Add airflow package catalog connector Jan 26, 2022
Co-authored-by: Martha Cryan <martha.cryan@ibm.com>
@ptitzler
Member Author

ptitzler commented Feb 1, 2022

Clarification based on today's discussion in the dev meeting. The proposed PR implementation is based on the following assumptions:

  • the connector requires its own life cycle, meaning updates can be published without the need for a new Elyra release
  • the connector is treated as an optional Elyra component that is only installed when users deploy Elyra with Apache Airflow pipeline support

@ptitzler
Member Author

ptitzler commented Feb 2, 2022

Will redo.

Labels
kind:enhancement New feature or request platform: pipeline-Airflow Related to usage of Apache Airflow as pipeline runtime
2 participants