ahmadtfarhan (Contributor) commented Mar 6, 2025


New Provider for Gremlin (Apache TinkerPop)

Summary

This PR introduces a new provider for Apache TinkerPop. The provider is designed to facilitate connections to graph databases supporting Gremlin, such as Azure Cosmos DB and Amazon Neptune.

Background

Previously, I developed a custom hook to query data from Azure Cosmos DB using the Gremlin graph query language. In this PR, I’ve consolidated that functionality into a full provider.

Changes

Provider Implementation:

  • The new GremlinHook establishes a connection using the Gremlin Python Client.
  • It constructs the connection URI based on the Airflow Connection’s properties and extra parameters.
  • The hook exposes a run(query: str) method that submits raw Gremlin query strings and returns the results.
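
The hook behaviour listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the provider's actual code: `build_gremlin_uri`, `GremlinHookSketch`, `FakeClient`, and the default scheme, port, and path are all hypothetical names and values chosen for the example. The `submit(...).all().result()` chain mirrors the shape of the gremlin-python driver client, which is replaced here by a stand-in so the snippet runs without a server.

```python
from dataclasses import dataclass
from typing import Any, Callable


def build_gremlin_uri(host: str, port: int = 8182, scheme: str = "ws", path: str = "gremlin") -> str:
    """Assemble a WebSocket URI from connection-style fields (hypothetical helper)."""
    return f"{scheme}://{host}:{port}/{path.lstrip('/')}"


class _FakeFuture:
    """Stand-in for the future returned by ResultSet.all()."""

    def __init__(self, value: list) -> None:
        self._value = value

    def result(self) -> list:
        return self._value


class _FakeResultSet:
    """Stand-in for the result set returned by Client.submit()."""

    def __init__(self, value: list) -> None:
        self._value = value

    def all(self) -> _FakeFuture:
        return _FakeFuture(self._value)


class FakeClient:
    """Stand-in client (assumption): records submitted queries, returns canned data."""

    def __init__(self, uri: str) -> None:
        self.uri = uri
        self.submitted = []

    def submit(self, query: str) -> _FakeResultSet:
        self.submitted.append(query)
        return _FakeResultSet(["v1", "v2"])

    def close(self) -> None:
        pass


@dataclass
class GremlinHookSketch:
    host: str
    port: int = 8182
    # A real hook would build the driver client here; the factory is injected for illustration.
    client_factory: Callable[[str], Any] = FakeClient

    def run(self, query: str) -> list:
        """Submit a raw Gremlin query string and return all results as a list."""
        client = self.client_factory(build_gremlin_uri(self.host, self.port))
        try:
            return client.submit(query).all().result()
        finally:
            client.close()


hook = GremlinHookSketch(host="localhost")
print(hook.run("g.V()"))
```

Injecting the client factory keeps the URI logic and the query path testable without a running graph database.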

Operator:

  • The GremlinOperator instantiates the GremlinHook with a given connection ID and query, then calls run(query) to execute the query.
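
The delegation described above might look something like the sketch below. It does not import Airflow and is not the provider's operator: `GremlinOperatorSketch`, `StubHook`, and the parameter names `task_id`, `gremlin_conn_id`, and `hook_factory` are assumptions made for illustration.

```python
from typing import Any, Callable, List


class StubHook:
    """Stand-in for the hook: records the query instead of contacting a server."""

    def __init__(self, conn_id: str) -> None:
        self.conn_id = conn_id
        self.queries: List[str] = []

    def run(self, query: str) -> List[Any]:
        self.queries.append(query)
        return [f"result of {query!r} via {self.conn_id}"]


class GremlinOperatorSketch:
    """Holds a connection ID and a query; builds the hook only at execution time."""

    def __init__(
        self,
        task_id: str,
        gremlin_conn_id: str,
        query: str,
        hook_factory: Callable[[str], Any] = StubHook,  # hypothetical injection point
    ) -> None:
        self.task_id = task_id
        self.gremlin_conn_id = gremlin_conn_id
        self.query = query
        self.hook_factory = hook_factory

    def execute(self, context: dict) -> List[Any]:
        # Instantiate the hook with the connection ID, then delegate to run(query).
        hook = self.hook_factory(self.gremlin_conn_id)
        return hook.run(self.query)


op = GremlinOperatorSketch(task_id="count_vertices", gremlin_conn_id="gremlin_default", query="g.V().count()")
print(op.execute(context={}))
```

Creating the hook inside `execute` rather than in the constructor means no connection work happens while the operator is merely being defined.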

Testing:

  • Unit tests have been added to validate the URI construction, client instantiation, and query execution.
  • Manual tests confirmed that raw queries using Client.submit() successfully return results from Cosmos DB using Gremlin.
  • An integration test was added and run on CI, validating the use of Gremlin Server with basic data insertion and querying.
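
A query-execution unit test of the kind mentioned above could be written along these lines. This is only a sketch: `run_query` is a hypothetical function under test, and the mocked `submit().all().result()` chain is an assumption about how the client is called, not a copy of the tests in this PR.

```python
import unittest
from unittest import mock


def run_query(client, query):
    """Hypothetical function under test: submit the query and wait for all results."""
    return client.submit(query).all().result()


class TestRunQuery(unittest.TestCase):
    def test_submits_raw_query_and_returns_results(self):
        # MagicMock stands in for the driver client, so no server is needed.
        client = mock.MagicMock()
        client.submit.return_value.all.return_value.result.return_value = [{"id": "1"}]

        self.assertEqual(run_query(client, "g.V()"), [{"id": "1"}])
        # The raw query string should reach submit() unchanged, exactly once.
        client.submit.assert_called_once_with("g.V()")
```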

Testing & Validation

  • I ran a sample DAG on the main branch.
  • The provider has been validated locally by submitting queries such as g.V() and verifying that the expected results are returned from Azure Cosmos DB.


potiuk (Member) commented Apr 23, 2025

Approved. Let me run it one more time. Sorry to keep you waiting, @ahmadtfarhan.

potiuk (Member) commented Apr 23, 2025

Very, very good job with the integration, docs and the change

ahmadtfarhan (Contributor, Author) commented

> Very, very good job with the integration, docs and the change

Thanks!

ahmadtfarhan (Contributor, Author) commented

> Approved. Let me run it one more time. Sorry to keep you waiting @ahmadtfarhan

No worries! Let me know how it goes.

@potiuk potiuk merged commit 5349d09 into apache:main Apr 24, 2025
94 of 95 checks passed
spmallette commented

great to see this merged - thanks everyone!

prabhusneha pushed a commit to astronomer/airflow that referenced this pull request Apr 25, 2025
* tests passed

* narrow down CI

* fix host

* add drill for testing

* add docker inspect

* create a function

* minor changes

* change the inspect

* add status condition

* change host to zeros

* rewrite the pipeline

* push image and reuse

* remove tag

* rename tag

* change to locally pushed image

* add sha to image name

* add sha to image name

* add debug and login

* change token name

* change token name

* remove login

* remove login

* remove login

* move token to high level

* copy from ci

* change to secrets

* test workflow

* test workflow

* rerun on CI login

* rerun on CI login

* logout

* revert logout

* revert login

* remove cleanup

* readd cleanup

* change docker pass

* add login to action

* move up

* change token

* one job

* one job

* remove action

* delete if condition

* rename image

* fix tar name

* fix tar name

* fix tar name

* remove status

* remove stdin

* add logs and hostname

* removed restart

* add chmod

* create entry sh

* create entry sh

* add status

* remove failure

* add user

* revert back all ci yamls

* revert back shell script

* fix tinkerpop

* change back to gremlin in the providers list

* change docs

* change docs

* add conn_name_attr

* move system test

* change to cap

* change to 1.0.0

* fix sh

* removed serializer and some changes

* fixing docs

* update provider

* change to 2.9.0 and add spell check

* ran breeze release management

* add doc strings

* add asterisk

* moved operators doc

* fixed docs

* fixed pyproject

* minor change to docs

* add conf.py

* fixed docs

* fix toml

* changed to 2.10

* remove fab from tinkerpop

* fix prov info

* add close method

* fix integration

* change gremlin host

* change gremlin host back

* remove async

* remove package

* fix pyproject

* fix test

---------

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>