Adding ArangoDB Provider #22548

Merged
merged 17 commits into from
Apr 3, 2022
Conversation

pateash
Contributor

@pateash pateash commented Mar 27, 2022

closes: #17778


Description

Adds an ArangoDB provider based on the Python SDK https://github.com/ArangoDB-Community/python-arango

Users can create their own custom operators either by using the ArangoDBHook directly
or by building on AQLOperator and supplying a result_processor callable:

operator = AQLOperator(
    task_id='aql_operator',
    sql="FOR doc IN students " \
        "RETURN doc",
    dag=dag,
    result_processor=lambda cursor: print([document["name"] for document in cursor])
)
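
As a rough illustration of a result_processor written as a named function instead of a lambda, the sketch below assumes only that the cursor is an iterable of dict-like documents with a "name" key, as in the example above. The name `collect_names` and the stand-in list are hypothetical and are not part of the provider.

```python
from typing import Any, Iterable, List, Mapping


def collect_names(cursor: Iterable[Mapping[str, Any]]) -> List[str]:
    """Return the "name" field of every document yielded by the cursor."""
    return [document["name"] for document in cursor]


# Stand-in for a real cursor (assumption: any iterable of dict-like documents).
fake_cursor = [{"name": "judy"}, {"name": "tom"}]
names = collect_names(fake_cursor)
```

A function like this could presumably be passed as result_processor=collect_names in place of the lambda.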

A sensor can be implemented with an AQL query passed as sql:

sensor = AQLSensor(
    task_id="aql_sensor",
    sql="FOR doc IN students " \
        "FILTER doc.name == 'judy' " \
        "RETURN doc",
    timeout=60,
    poke_interval=10,
    dag=dag,
)
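
How the sensor decides that it has succeeded is not spelled out in this description. The sketch below assumes that a poke succeeds as soon as the query returns at least one document, and that timeout and poke_interval bound the waiting. `poke`, `wait_for_documents`, and `run_query` are hypothetical names; the sketch does not call the real AQLSensor, and it returns False on timeout as a simplification.

```python
import time
from typing import Any, Callable, Iterable


def poke(run_query: Callable[[], Iterable[Any]]) -> bool:
    """Return True if the query yields at least one document."""
    for _ in run_query():
        return True
    return False


def wait_for_documents(
    run_query: Callable[[], Iterable[Any]],
    timeout: float = 60,
    poke_interval: float = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until a poke succeeds or the timeout budget is used up."""
    waited = 0.0
    while True:
        if poke(run_query):
            return True
        if waited + poke_interval > timeout:
            return False
        sleep(poke_interval)
        waited += poke_interval
```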

@pateash pateash changed the title from "Add Arango hook" to "WIP: Add Arango hook" Mar 27, 2022
Contributor

@eladkal eladkal left a comment

Overall LGTM

nice job @pateash

:param arangodb_conn_id: Reference to :ref:`ArangoDB connection id <howto/connection:arangodb>`.
"""

template_fields: Sequence[str] = ('sql',)
Contributor

Should we also add template_ext?

Contributor Author

Yeah, makes sense.
Added.

:param arangodb_db: Target ArangoDB name.
"""

template_fields: Sequence[str] = ('sql',)
Contributor

Should we also add template_ext?

Contributor Author

@pateash pateash Mar 29, 2022

added
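
For context on what template_ext changes: in Airflow's templating, a templated field whose string value ends with one of the extensions in template_ext is treated as a path to a template file rather than as the template text itself. The sketch below mimics only that suffix check. The `.sql` extension and the class `SketchAQLOperator` are assumptions for illustration, not a copy of what was added in this PR.

```python
from typing import Sequence


class SketchAQLOperator:
    """Hypothetical stand-in that carries only the templating attributes."""

    template_fields: Sequence[str] = ("sql",)
    template_ext: Sequence[str] = (".sql",)  # assumed extension

    def __init__(self, sql: str) -> None:
        self.sql = sql

    def is_template_file(self, field: str) -> bool:
        """True when the field value would be read from a file."""
        value = getattr(self, field)
        return isinstance(value, str) and value.endswith(tuple(self.template_ext))
```

With this check, a value such as "queries/students.sql" counts as a file reference, while an inline query string does not.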

airflow/providers/arangodb/hooks/arangodb.py (resolved)
@potiuk
Member

potiuk commented Mar 27, 2022

Can you fix static checks please.

@pateash
Contributor Author

pateash commented Mar 29, 2022

[Image attachment; content not preserved]

@pateash
Contributor Author

pateash commented Mar 29, 2022

[Image attachment; content not preserved]

@pateash pateash changed the title from "WIP: Add Arango hook" to "Adding ArangoDB Provider" Mar 29, 2022
@pateash pateash requested a review from eladkal March 29, 2022 17:30
@pateash pateash closed this Mar 29, 2022
@pateash pateash reopened this Mar 29, 2022
@eladkal
Contributor

eladkal commented Mar 30, 2022

@potiuk can you take a look at the test failure?
AssertionError: List of expected installed packages and image content mismatch. Check /home/runner/work/airflow/airflow/scripts/ci/installed_providers.txt file.

I don't recall that when adding a new provider we need to edit the CI script

@potiuk
Member

potiuk commented Mar 30, 2022

> @potiuk can you take a look at the test failure? AssertionError: List of expected installed packages and image content mismatch. Check /home/runner/work/airflow/airflow/scripts/ci/installed_providers.txt file.
>
> I don't recall that when adding a new provider we need to edit the CI script

Not everything in providers has to be me :) - this test was actually added by @mik-laj: 621d17b

It looks like, for some reason, the production image produced in this build contains many more providers than it should.
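
The failing check compares an expected provider list against what the image actually contains. A minimal sketch of that kind of comparison follows. The function `diff_providers` and the sample names are hypothetical and do not reproduce the real CI script.

```python
from typing import Iterable, Set, Tuple


def diff_providers(
    expected: Iterable[str], installed: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Return (missing, unexpected) provider names."""
    expected_set, installed_set = set(expected), set(installed)
    return expected_set - installed_set, installed_set - expected_set


# Sample data only: the image holds one provider that was not expected.
missing, unexpected = diff_providers(
    expected=["amazon", "celery"],
    installed=["amazon", "celery", "arangodb"],
)
```

A non-empty `unexpected` set corresponds to the situation described here, where the image contains more providers than it should.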

@potiuk
Member

potiuk commented Mar 30, 2022

Yeah, it seems that for some reason it contains all providers:

docker run -it ghcr.io/apache/airflow/main/prod/python3.7:23b7d64b40261dcdcf73187464c6f09b67afcc57  bash
Unable to find image 'ghcr.io/apache/airflow/main/prod/python3.7:23b7d64b40261dcdcf73187464c6f09b67afcc57' locally
23b7d64b40261dcdcf73187464c6f09b67afcc57: Pulling from apache/airflow/main/prod/python3.7
c229119241af: Pull complete 
5a3ae98ea812: Pull complete 
d6bab1fc351b: Pull complete 
f9cea33fb9b5: Pull complete 
23c22d6e5b5d: Pull complete 
b21b38d9bc75: Pull complete 
e52ad88eda59: Pull complete 
5938673019d8: Pull complete 
10aec20ab867: Pull complete 
bfa0b2f2703d: Pull complete 
abea59e2f689: Pull complete 
ffd9264d5a4a: Pull complete 
ea7c97498e3e: Pull complete 
4aed0971f3f7: Pull complete 
8f85ceb1d546: Pull complete 
b6132f0f6227: Pull complete 
83d18601cc4f: Pull complete 
88748a7a2d95: Pull complete 
4f4fb700ef54: Pull complete 
Digest: sha256:bf5da3a686feab47684c036de99a492c3d024fabd4e7a3b69ea9d63ce941b8c8
Status: Downloaded newer image for ghcr.io/apache/airflow/main/prod/python3.7:23b7d64b40261dcdcf73187464c6f09b67afcc57

airflow@54c94bf4e3b9:/opt/airflow$ airflow providers list
package_name                              | description                                                                                     | version
==========================================+=================================================================================================+========
apache-airflow-providers-airbyte          | Airbyte https://airbyte.io/                                                                     | 2.1.4  
apache-airflow-providers-alibaba          | Alibaba Cloud integration (including Alibaba Cloud https://www.alibabacloud.com//)              | 1.1.1  
apache-airflow-providers-amazon           | Amazon integration (including Amazon Web Services (AWS) https://aws.amazon.com/)                | 3.2.0  
apache-airflow-providers-apache-beam      | Apache Beam https://beam.apache.org/                                                            | 3.3.0  
apache-airflow-providers-apache-cassandra | Apache Cassandra http://cassandra.apache.org/                                                   | 2.1.3  
apache-airflow-providers-apache-drill     | Apache Drill https://drill.apache.org/                                                          | 1.0.4  
apache-airflow-providers-apache-druid     | Apache Druid https://druid.apache.org/                                                          | 2.3.3  
apache-airflow-providers-apache-hdfs      | Hadoop Distributed File System (HDFS) https://hadoop.apache.org/docs/r1.2.1/hdfsdesign.html     | 2.2.3  
                                          | and WebHDFS https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html |        
apache-airflow-providers-apache-hive      | Apache Hive https://hive.apache.org/                                                            | 2.3.2  
apache-airflow-providers-apache-kylin     | Apache Kylin https://kylin.apache.org/                                                          | 2.0.4  
apache-airflow-providers-apache-livy      | Apache Livy https://livy.apache.org/                                                            | 2.2.2  
apache-airflow-providers-apache-pig       | Apache Pig https://pig.apache.org/                                                              | 2.0.4  
apache-airflow-providers-apache-pinot     | Apache Pinot https://pinot.apache.org/                                                          | 2.0.4  
apache-airflow-providers-apache-spark     | Apache Spark https://spark.apache.org/                                                          | 2.1.3  
apache-airflow-providers-apache-sqoop     | Apache Sqoop https://sqoop.apache.org/                                                          | 2.1.3  
apache-airflow-providers-arangodb         | ArangoDB https://www.arangodb.com/                                                              | 1.0.0  
apache-airflow-providers-asana            | Asana https://app.asana.com/                                                                    | 1.1.3  
apache-airflow-providers-celery           | Celery http://www.celeryproject.org/                                                            | 2.1.3  
apache-airflow-providers-cloudant         | IBM Cloudant https://www.ibm.com/cloud/cloudant                                                 | 2.0.4  
apache-airflow-providers-cncf-kubernetes  | Kubernetes https://kubernetes.io/                                                               | 3.1.2  
apache-airflow-providers-databricks       | Databricks https://databricks.com/                                                              | 2.5.0  
apache-airflow-providers-datadog          | Datadog https://www.datadoghq.com/                                                              | 2.0.4  
apache-airflow-providers-dbt-cloud        | dbt Cloud https://www.getdbt.com/product/what-is-dbt/)                                          | 1.0.2  
apache-airflow-providers-dingding         | Dingding https://oapi.dingtalk.com/                                                             | 2.0.4  
apache-airflow-providers-discord          | Discord https://discordapp.com/                                                                 | 2.0.4  
apache-airflow-providers-docker           | Docker https://docs.docker.com/install/                                                         | 2.5.2  
apache-airflow-providers-elasticsearch    | Elasticsearch https://www.elastic.co/elasticsearch                                              | 3.0.2  
apache-airflow-providers-exasol           | Exasol https://docs.exasol.com/home.htm                                                         | 2.1.3  
apache-airflow-providers-facebook         | Facebook Ads http://business.facebook.com/                                                      | 2.2.3  
apache-airflow-providers-ftp              | File Transfer Protocol (FTP) https://tools.ietf.org/html/rfc114                                 | 2.1.2  
apache-airflow-providers-github           | Github https://www.github.com/                                                                  | 1.0.3  
apache-airflow-providers-google           | Google services including:                                                                      | 6.7.0  
                                          |                                                                                                 |        
                                          |   - Google Ads https://ads.google.com/                                                          |        
                                          |   - Google Cloud (GCP) https://cloud.google.com/                                                |        
                                          |   - Google Firebase https://firebase.google.com/                                                |        
                                          |   - Google LevelDB https://github.com/google/leveldb/                                           |        
                                          |   - Google Marketing Platform https://marketingplatform.google.com/                             |        
                                          |   - Google Workspace https://workspace.google.pl/ (formerly Google Suite)                       |        
apache-airflow-providers-grpc             | gRPC https://grpc.io/                                                                           | 2.0.4  
apache-airflow-providers-hashicorp        | Hashicorp including Hashicorp Vault https://www.vaultproject.io/                                | 2.1.4  
apache-airflow-providers-http             | Hypertext Transfer Protocol (HTTP) https://www.w3.org/Protocols/                                | 2.1.2  
apache-airflow-providers-imap             | Internet Message Access Protocol (IMAP) https://tools.ietf.org/html/rfc3501                     | 2.2.3  
apache-airflow-providers-influxdb         | InfluxDB https://www.influxdata.com/                                                            | 1.1.3  
apache-airflow-providers-jdbc             | Java Database Connectivity (JDBC) https://docs.oracle.com/javase/8/docs/technotes/guides/jdbc/  | 2.1.3  
apache-airflow-providers-jenkins          | Jenkins https://jenkins.io/                                                                     | 2.0.7  
apache-airflow-providers-jira             | Atlassian Jira https://www.atlassian.com/                                                       | 2.0.4  
apache-airflow-providers-microsoft-azure  | Microsoft Azure https://azure.microsoft.com/                                                    | 3.7.2  
apache-airflow-providers-microsoft-mssql  | Microsoft SQL Server (MSSQL) https://www.microsoft.com/en-us/sql-server/sql-server-downloads    | 2.1.3  
apache-airflow-providers-microsoft-psrp   | This package provides remote execution capabilities via the                                     | 1.1.3  
                                          | PowerShell Remoting Protocol (PSRP)                                                             |        
                                          | https://docs.microsoft.com/en-us/openspecs/windowsprotocols/ms-psrp/                            |        
apache-airflow-providers-microsoft-winrm  | Windows Remote Management (WinRM) https://docs.microsoft.com/en-us/windows/win32/winrm/portal   | 2.0.5  
apache-airflow-providers-mongo            | MongoDB https://www.mongodb.com/what-is-mongodb                                                 | 2.3.3  
apache-airflow-providers-mysql            | MySQL https://www.mysql.com/products/                                                           | 2.2.3  
apache-airflow-providers-neo4j            | Neo4j https://neo4j.com/                                                                        | 2.1.3  
apache-airflow-providers-odbc             | ODBC https://github.com/mkleehammer/pyodbc/wiki                                                 | 2.0.4  
apache-airflow-providers-openfaas         | OpenFaaS https://www.openfaas.com/                                                              | 2.0.3  
apache-airflow-providers-opsgenie         | Opsgenie https://www.opsgenie.com/                                                              | 3.0.3  
apache-airflow-providers-oracle           | Oracle https://www.oracle.com/en/database/                                                      | 2.2.3  
apache-airflow-providers-pagerduty        | Pagerduty https://www.pagerduty.com/                                                            | 2.1.3  
apache-airflow-providers-papermill        | Papermill https://github.com/nteract/papermill                                                  | 2.2.3  
apache-airflow-providers-plexus           | Plexus https://plexus.corescientific.com/                                                       | 2.0.4  
apache-airflow-providers-postgres         | PostgreSQL https://www.postgresql.org/                                                          | 4.1.0  
apache-airflow-providers-presto           | Presto https://prestodb.github.io/                                                              | 2.1.2  
apache-airflow-providers-qubole           | Qubole https://www.qubole.com/                                                                  | 2.1.3  
apache-airflow-providers-redis            | Redis https://redis.io/                                                                         | 2.0.4  
apache-airflow-providers-salesforce       | Salesforce https://www.salesforce.com/                                                          | 3.4.3  
apache-airflow-providers-samba            | Samba https://www.samba.org/                                                                    | 3.0.4  
apache-airflow-providers-segment          | Segment https://segment.com/                                                                    | 2.0.4  
apache-airflow-providers-sendgrid         | Sendgrid https://sendgrid.com/                                                                  | 2.0.4  
apache-airflow-providers-sftp             | SSH File Transfer Protocol (SFTP) https://tools.ietf.org/wg/secsh/draft-ietf-secsh-filexfer/    | 2.5.2  
apache-airflow-providers-singularity      | Singularity https://sylabs.io/guides/latest/user-guide/                                         | 2.0.4  
apache-airflow-providers-slack            | Slack https://slack.com/                                                                        | 4.2.3  
apache-airflow-providers-snowflake        | Snowflake https://www.snowflake.com/                                                            | 2.6.0  
apache-airflow-providers-sqlite           | SQLite https://www.sqlite.org/                                                                  | 2.1.3  
apache-airflow-providers-ssh              | Secure Shell (SSH) https://tools.ietf.org/html/rfc4251                                          | 2.4.3  
apache-airflow-providers-tableau          | Tableau https://www.tableau.com/                                                                | 2.1.7  
apache-airflow-providers-telegram         | Telegram https://telegram.org/                                                                  | 2.0.4  
apache-airflow-providers-trino            | Trino https://trino.io/                                                                         | 2.1.2  
apache-airflow-providers-vertica          | Vertica https://www.vertica.com/                                                                | 2.1.3  
apache-airflow-providers-yandex           | Yandex including Yandex.Cloud https://cloud.yandex.com/                                         | 2.2.3  
apache-airflow-providers-zendesk          | Zendesk https://www.zendesk.com/                                                                | 3.0.3  
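
To compare a listing like this against an expected list, the package names can be pulled out of the first column. The sketch assumes the pipe-separated layout shown above and uses an abridged sample; `parse_provider_names` is a hypothetical helper.

```python
from typing import List

PREFIX = "apache-airflow-providers-"


def parse_provider_names(table: str) -> List[str]:
    """Extract package names from the first column, skipping continuation rows."""
    names = []
    for line in table.splitlines():
        first_column = line.split("|", 1)[0].strip()
        if first_column.startswith(PREFIX):
            names.append(first_column)
    return names


# Abridged sample in the same layout as the listing above.
sample = """\
package_name                              | description | version
==========================================+=============+========
apache-airflow-providers-apache-hdfs      | HDFS        | 2.2.3
                                          | and WebHDFS |
apache-airflow-providers-arangodb         | ArangoDB    | 1.0.0
"""
names = parse_provider_names(sample)
```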

@potiuk
Member

potiuk commented Mar 30, 2022

This is VERY strange as it seems that when the image was built, it actually used only a small subset (as expected):

#64 1.486 Force re-installing airflow and providers from local files with eager upgrade
#64 1.486 
#64 2.925 Looking in links: file:///docker-context-files
#64 2.937 Processing /docker-context-files/apache_airflow_providers_amazon-3.2.0.dev0-py3-none-any.whl
#64 2.952 Processing /docker-context-files/apache_airflow_providers_celery-2.1.3.dev0-py3-none-any.whl
#64 2.959 Processing /docker-context-files/apache_airflow_providers_cncf_kubernetes-3.1.2.dev0-py3-none-any.whl
#64 2.966 Processing /docker-context-files/apache_airflow_providers_docker-2.5.2.dev0-py3-none-any.whl
#64 2.973 Processing /docker-context-files/apache_airflow_providers_elasticsearch-3.0.2.dev0-py3-none-any.whl
#64 2.980 Processing /docker-context-files/apache_airflow_providers_ftp-2.1.2.dev0-py3-none-any.whl
#64 2.988 Processing /docker-context-files/apache_airflow_providers_google-6.7.0.dev0-py3-none-any.whl
#64 2.997 Processing /docker-context-files/apache_airflow_providers_grpc-2.0.4.dev0-py3-none-any.whl
#64 3.004 Processing /docker-context-files/apache_airflow_providers_hashicorp-2.1.4.dev0-py3-none-any.whl
#64 3.011 Processing /docker-context-files/apache_airflow_providers_http-2.1.2.dev0-py3-none-any.whl
#64 3.018 Processing /docker-context-files/apache_airflow_providers_imap-2.2.3.dev0-py3-none-any.whl
#64 3.026 Processing /docker-context-files/apache_airflow_providers_microsoft_azure-3.7.2.dev0-py3-none-any.whl
#64 3.033 Processing /docker-context-files/apache_airflow_providers_mysql-2.2.3.dev0-py3-none-any.whl
#64 3.040 Processing /docker-context-files/apache_airflow_providers_odbc-2.0.4.dev0-py3-none-any.whl
#64 3.047 Processing /docker-context-files/apache_airflow_providers_postgres-4.1.0.dev0-py3-none-any.whl
#64 3.054 Processing /docker-context-files/apache_airflow_providers_redis-2.0.4.dev0-py3-none-any.whl
#64 3.062 Processing /docker-context-files/apache_airflow_providers_sendgrid-2.0.4.dev0-py3-none-any.whl
#64 3.069 Processing /docker-context-files/apache_airflow_providers_sftp-2.5.2.dev0-py3-none-any.whl
#64 3.076 Processing /docker-context-files/apache_airflow_providers_slack-4.2.3.dev0-py3-none-any.whl
#64 3.083 Processing /docker-context-files/apache_airflow_providers_sqlite-2.1.3.dev0-py3-none-any.whl
#64 3.090 Processing /docker-context-files/apache_airflow_providers_ssh-2.4.3.dev0-py3-none-any.whl
#64 3.223 Processing /docker-context-files/apache_airflow-2.3.0.dev0-py3-none-any.whl
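
The wheel filenames in this log can be mapped back to distribution names to see which providers the build step actually processed. The sketch relies on the standard wheel filename layout (name, version, then tags, with hyphens in the name replaced by underscores); `wheel_to_package` is a hypothetical helper.

```python
import posixpath


def wheel_to_package(path: str) -> str:
    """Map a wheel path to its distribution name (underscores become hyphens)."""
    filename = posixpath.basename(path)
    distribution = filename.split("-", 1)[0]
    return distribution.replace("_", "-")


log_line = (
    "#64 2.959 Processing /docker-context-files/"
    "apache_airflow_providers_cncf_kubernetes-3.1.2.dev0-py3-none-any.whl"
)
package = wheel_to_package(log_line.split("Processing ", 1)[1])
```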

@potiuk
Member

potiuk commented Mar 30, 2022

Let me rebase and see if it happens again :)

@eladkal
Contributor

eladkal commented Mar 30, 2022

I don't recall us having such an issue when the GitHub provider was added (and that was after 621d17b)

@potiuk
Member

potiuk commented Mar 30, 2022

> I don't recall us having such an issue when the GitHub provider was added (and that was after 621d17b)

Me neither. It basically SHOULD NOT happen :D. Yet it seems it did again.

@potiuk
Member

potiuk commented Mar 30, 2022

OK. I know what causes it, but I do not know why it happens yet. When the PROD image is built, we prepare the "airflow" package so that it can be installed there from the latest sources. But for SOME reason, that package contains "all" providers as well, not only airflow. I do not know where that comes from yet, but it proves the tests from @mik-laj are useful for catching it.

@potiuk
Member

potiuk commented Mar 30, 2022

I actually think it could come from the new setuptools release https://pypi.org/project/setuptools/61.2.0/

@potiuk
Member

potiuk commented Mar 30, 2022

Still puzzled :) but I am getting closer to solving it

@pateash
Contributor Author

pateash commented Mar 30, 2022

thanks @potiuk.

@potiuk
Member

potiuk commented Mar 31, 2022

Rebased it @pateash -> I have high hopes for #22649 to either fix it or make it easier to understand where it came from

@potiuk
Member

potiuk commented Apr 1, 2022

> Hi maintainer of python-arango here. I've removed the dependency. Please try again with release version 7.3.2. Thanks.

Cool. Thanks! @pateash -> can you add >=7.3.2 to our requirements please?
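
The request amounts to a lower bound such as python-arango>=7.3.2 in the provider's dependency list (the exact file is not shown here). As a small illustration of what that bound accepts, the sketch compares plain dotted versions as integer tuples. It ignores pre-release suffixes, and `parse_version` and `satisfies_minimum` are hypothetical helpers.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a plain dotted version such as "7.3.2" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def satisfies_minimum(installed: str, minimum: str = "7.3.2") -> bool:
    """True when the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)
```

Comparing tuples rather than strings matters here: as strings, "7.10.0" would sort before "7.3.2".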

@pateash
Contributor Author

pateash commented Apr 1, 2022

Voilà 🥳, it worked.
Thanks @joowani

Member

@potiuk potiuk left a comment

LGTM :) - @eladkal ?

@github-actions github-actions bot added the "full tests needed" label (We need to run full set of tests for this PR to merge) Apr 1, 2022
@github-actions

github-actions bot commented Apr 1, 2022

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@eladkal
Contributor

eladkal commented Apr 1, 2022

I'll take a look later today

Contributor

@eladkal eladkal left a comment

LGTM

@@ -256,6 +256,8 @@ Those are extras that add dependencies needed for integration with other softwar
+---------------------+-----------------------------------------------------+-------------------------------------------+
| trino | ``pip install 'apache-airflow[trino]'`` | All Trino related operators & hooks |
+---------------------+-----------------------------------------------------+-------------------------------------------+
| arangodb | ``pip install 'apache-airflow[arangodb]'`` | ArangoDB operators, sensors and hook |
Contributor

I think this list is sorted alphabetically?
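
If the table is meant to be alphabetical, a quick check like the one below would flag a row appended out of order. Only the two extras visible in the diff are used, and `first_out_of_order` is a hypothetical helper.

```python
from typing import List, Optional


def first_out_of_order(names: List[str]) -> Optional[str]:
    """Return the first name that sorts before its predecessor, or None."""
    for previous, current in zip(names, names[1:]):
        if current < previous:
            return current
    return None


# Row order as it appears in the diff: arangodb was added after trino.
offender = first_out_of_order(["trino", "arangodb"])
```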

@eladkal eladkal merged commit c758c76 into apache:main Apr 3, 2022
@potiuk
Member

potiuk commented Apr 3, 2022

🎉 🎉 🎉 🎉 🎉 🎉 🎉

@ephraimbuddy ephraimbuddy added the "changelog:skip" label (Changes that should be skipped from the changelog (CI, tests, etc..)) Apr 11, 2022
@pateash pateash deleted the airflow-17778 branch May 19, 2022 14:30
Labels
area:dev-tools, area:providers, changelog:skip, full tests needed, kind:documentation
Development

Successfully merging this pull request may close these issues.

Add Arango hook
5 participants