Athena to Amazon Neptune connector code #222

Merged on Oct 15, 2020 (36 commits)
- 6303507 fixed issue with span function for null values (abhishekpradeepmishra, Jul 1, 2020)
- b68dcc3 Merge pull request #1 from abhishekpradeepmishra/dev (abhishekpradeepmishra, Jul 1, 2020)
- 67fae55 Added code for Athena to Neptune connector (abhishekpradeepmishra, Jul 1, 2020)
- c351e44 added code to handle =, >=, <= (abhishekpradeepmishra, Jul 1, 2020)
- cfd5c1c Fixed unit test cases (abhishekpradeepmishra, Jul 3, 2020)
- e19fe66 added test base class, cleaned up NeptuneRecordHandler (abhishekpradeepmishra, Jul 3, 2020)
- 4efbc44 Inherited from Testbase (abhishekpradeepmishra, Jul 3, 2020)
- 8eaf751 refactored code to include TestBase and removed hardcoding of Neptune… (abhishekpradeepmishra, Jul 7, 2020)
- 9520d2d Corrected Readme and athena-neptune.yaml and deleted UDF classes (Jul 9, 2020)
- b6c3277 Refactored Athena Neptune Handler (abhishekpradeepmishra, Jul 9, 2020)
- b1c39a3 refactored code, mocked test cases for MetaHandler (abhishekpradeepmishra, Jul 28, 2020)
- 7b87f27 implemented code review comments (abhishekpradeepmishra, Aug 12, 2020)
- f8c184a removed unwanted files (abhishekpradeepmishra, Aug 12, 2020)
- a246be6 mocked record handler code, removed string transformation in Gremlin … (abhishekpradeepmishra, Aug 19, 2020)
- 7619577 removed string templates (abhishekpradeepmishra, Aug 19, 2020)
- 23fe7bf Added eclipse prefs to git ignore list (Sep 16, 2020)
- a810b84 implemented RowWriter pattern, code clean-up and refactoring pending (abhishekpradeepmishra, Sep 21, 2020)
- 9ef2e2a added more contraints to RecordHanderTest, refactored code, removed u… (abhishekpradeepmishra, Sep 22, 2020)
- e30ee4f refactored code (abhishekpradeepmishra, Sep 22, 2020)
- 60712ab 1Modified SAM yaml to separate DB permissions (Sep 23, 2020)
- f4a9cd0 1. Added vscode settings to ignore (Sep 23, 2020)
- 74b2cb6 Added packaged yaml to ignore list (Sep 24, 2020)
- 093c6ad Add S3 list all buckets and simplified neptune ARN (Sep 24, 2020)
- 0e408f5 Merge branch 'master' of https://github.com/awslabs/aws-athena-query-… (Sep 24, 2020)
- 9369a6f Modifications align with latest FederatedIdentity (Sep 24, 2020)
- e4d654f code review changes (abhishekpradeepmishra, Oct 6, 2020)
- ca4b7d7 Draft version of Readme file (Oct 12, 2020)
- d0cfa3e Merge branch 'athena-neptune' of https://github.com/abhishekpradeepmi… (Oct 12, 2020)
- 33be79b code review related changes (abhishekpradeepmishra, Oct 12, 2020)
- 73998cb Merge remote-tracking branch 'upstream/master' into athena-neptune (Oct 12, 2020)
- 56f4fb9 Updated data types in readme and updated gitignore (Oct 14, 2020)
- e98adb3 Merge remote-tracking branch 'upstream/master' into athena-neptune (Oct 14, 2020)
- 7fc923c added support for more data types (abhishekpradeepmishra, Oct 14, 2020)
- 0de38d3 Fixed import order formatting issue. (Oct 14, 2020)
- 8ff2c50 added test cases constraints for Boolean and BigInt, fixed issue wit… (abhishekpradeepmishra, Oct 14, 2020)
- 05cf0c7 committing chagnes for test cases (abhishekpradeepmishra, Oct 14, 2020)

9 changes: 9 additions & 0 deletions .gitignore
@@ -3,3 +3,12 @@
.idea/
/target/
*/*.iml
.classpath
.factorypath
.project
*/.settings/
*/.DS_Store
.settings/org.eclipse.m2e.core.prefs
.vscode/settings.json
*/packaged.yaml
.DS_Store

Contributor review comment on this .gitignore change: "We do not typically commit the .gitignore file to our project."
174 changes: 174 additions & 0 deletions athena-neptune/LICENSE.txt
@@ -0,0 +1,174 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

1. Definitions.

"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.

"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.

"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.

"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.

"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.

"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.

"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).

"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.

"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."

"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.

2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.

3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.

4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:

(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and

(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and

(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and

(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.

You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.

5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.

6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.

7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.

8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.

9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
70 changes: 70 additions & 0 deletions athena-neptune/README.md
@@ -0,0 +1,70 @@
# Amazon Athena Neptune Connector

This connector enables Amazon Athena to communicate with your Amazon Neptune graph database instance, making your Neptune graph data accessible via SQL.

**To enable this Preview feature, create an Athena workgroup named AmazonAthenaPreviewFunctionality and run from that workgroup any queries that federate to this connector, use a UDF, or use SageMaker inference.**

Unlike traditional relational data stores, Neptune graph DB nodes and edges do not have a fixed schema: each entry can have different fields and data types. While we are investigating the best way to support schema-on-read use cases for this connector, it presently retrieves metadata from the AWS Glue Data Catalog. You need to pre-create the Glue database and the corresponding Glue tables, with the required schemas, within that database. This allows the connector to populate the list of tables available to query within Athena.

> **NOTE**
>
> Create the Glue database and the corresponding tables in the same AWS Region as your Neptune cluster and the Lambda function that runs this connector.

Each graph node type is represented as a Glue table, and each node property is represented as a column of that table with the corresponding data type.

The following table lists the Glue data types you can use and the Apache Arrow type each one maps to:

|Glue DataType|Apache Arrow Type|
|-------------|-----------------|
|int|INT|
|bigint|BIGINT|
|double|FLOAT8|
|float|FLOAT4|
|boolean|BIT|
|binary|VARBINARY|
|string|VARCHAR|
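
As an illustration of how a node type maps onto a Glue table, the following Java sketch derives a column schema for one node type from a sample property map and looks up the Arrow type for each Glue type in the table above. The class `GlueSchemaSketch` and its methods are hypothetical and not part of the connector; in particular, inferring a Glue type from a Java value's class is an assumption made only for this example.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not part of the connector's code.
final class GlueSchemaSketch {
    // Glue data types from the README table, mapped to Apache Arrow type names.
    static final Map<String, String> GLUE_TO_ARROW = new LinkedHashMap<>();

    static {
        GLUE_TO_ARROW.put("int", "INT");
        GLUE_TO_ARROW.put("bigint", "BIGINT");
        GLUE_TO_ARROW.put("double", "FLOAT8");
        GLUE_TO_ARROW.put("float", "FLOAT4");
        GLUE_TO_ARROW.put("boolean", "BIT");
        GLUE_TO_ARROW.put("binary", "VARBINARY");
        GLUE_TO_ARROW.put("string", "VARCHAR");
    }

    // Returns the Arrow type name for a Glue column type; rejects unsupported types.
    static String arrowTypeFor(String glueType) {
        String arrow = GLUE_TO_ARROW.get(glueType.toLowerCase(Locale.ROOT));
        if (arrow == null) {
            throw new IllegalArgumentException("Unsupported Glue type: " + glueType);
        }
        return arrow;
    }

    // Assumed inference from a sample Java property value to a Glue type.
    static String glueTypeFor(Object value) {
        if (value instanceof Integer) return "int";
        if (value instanceof Long) return "bigint";
        if (value instanceof Double) return "double";
        if (value instanceof Float) return "float";
        if (value instanceof Boolean) return "boolean";
        if (value instanceof byte[]) return "binary";
        return "string"; // fall back to string for anything else
    }

    // One Glue table per node type: each property becomes a column (name -> Glue type).
    static Map<String, String> columnsFor(Map<String, Object> sampleProperties) {
        Map<String, String> columns = new LinkedHashMap<>();
        sampleProperties.forEach((name, value) -> columns.put(name, glueTypeFor(value)));
        return columns;
    }
}
```

For example, a node with properties `name = "Seattle"` and `population = 750000L` yields the columns `{name=string, population=bigint}`, which map to Arrow `VARCHAR` and `BIGINT`. In practice you define these columns yourself when pre-creating the Glue table; the sketch only shows the correspondence.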

<br/>

### Parameters

The Amazon Athena Neptune Connector exposes several configuration options via Lambda environment variables:

1. **neptune_endpoint** - The Neptune cluster endpoint.
2. **neptune_port** - (Optional) The port on the Neptune cluster endpoint to connect to. Defaults to 8182.
3. **neptune_cluster_res_id** - The Neptune cluster resource ID, which the IAM permissions use to restrict the Lambda function's access to a specific cluster. To find it in the Amazon Neptune AWS Management Console, choose the DB cluster that you want; the Resource ID is shown in the Configuration section.
4. **glue_database_name** - Name of the Glue database that you pre-created.
5. **spill_bucket** - When the data returned by your Lambda function exceeds Lambda’s limits, this is the bucket that the data will be written to for Athena to read the excess from. (e.g. my-bucket)
6. **spill_prefix** - (Optional) Defaults to 'athena-neptune-spill'. Used in conjunction with spill_bucket, this is the path within the above bucket that large responses are spilled to. *You should configure an S3 lifecycle rule on this location to delete old spills after X days/hours.*
7. **disable_spill_encryption** - (Optional) Defaults to False, so any data spilled to S3 is encrypted using AES-GCM, either with a randomly generated key or with keys generated by KMS. Setting this to true disables spill encryption. You may wish to disable encryption for improved performance, especially if your spill location in S3 uses S3 Server Side Encryption. (e.g. true or false)
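
To make the defaults above concrete, here is a minimal Java sketch that reads these environment variables from a map and applies the documented defaults. The class `NeptuneConnectorConfig`, its field names, and its validation behavior are hypothetical; the connector's real configuration handling may differ.

```java
import java.util.Map;

// Hypothetical sketch, not the connector's actual configuration class:
// reads the Lambda environment variables listed above and applies their defaults.
final class NeptuneConnectorConfig {
    final String endpoint;
    final int port;
    final String clusterResourceId;
    final String glueDatabaseName;
    final String spillBucket;
    final String spillPrefix;
    final boolean spillEncryptionEnabled;

    NeptuneConnectorConfig(Map<String, String> env) {
        this.endpoint = required(env, "neptune_endpoint");
        // neptune_port is optional and defaults to 8182.
        this.port = Integer.parseInt(env.getOrDefault("neptune_port", "8182"));
        this.clusterResourceId = required(env, "neptune_cluster_res_id");
        this.glueDatabaseName = required(env, "glue_database_name");
        this.spillBucket = required(env, "spill_bucket");
        // spill_prefix is optional and defaults to "athena-neptune-spill".
        this.spillPrefix = env.getOrDefault("spill_prefix", "athena-neptune-spill");
        // disable_spill_encryption defaults to false, so encryption stays on unless it is "true".
        this.spillEncryptionEnabled =
                !Boolean.parseBoolean(env.getOrDefault("disable_spill_encryption", "false"));
    }

    // S3 location that large responses are spilled to.
    String spillLocation() {
        return "s3://" + spillBucket + "/" + spillPrefix;
    }

    // Fails fast when a variable without a default is missing or empty.
    private static String required(Map<String, String> env, String key) {
        String value = env.get(key);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException("Missing required environment variable: " + key);
        }
        return value;
    }
}
```

Inside a Lambda function this would be constructed as `new NeptuneConnectorConfig(System.getenv())`; accepting a plain `Map` keeps the sketch testable without touching the real environment.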

<br/>

### Required Permissions

Review the "Policies" section of the athena-neptune.yaml file for full details on the IAM Policies required by this connector. A brief summary is below.

1. S3 Write Access - In order to successfully handle large queries, the connector requires write access to a location in S3.
2. Glue Database - Since Neptune does not have a metadata store, the connector requires read-only access to Glue databases and tables for table schema information.
3. VPC Access - To communicate with your Neptune cluster, the connector needs the ability to attach and detach a network interface in your VPC.
4. CloudWatch Logs - This is a somewhat implicit permission when deploying a Lambda function, but the connector needs access to CloudWatch Logs to store its logs.
5. Athena GetQueryExecution - The connector uses this access to fast-fail when the upstream Athena query has terminated.
6. Neptune DB - Allows access to the specific Neptune cluster identified by the provided cluster resource ID.
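
The Neptune DB permission is scoped with an IAM resource ARN built from the cluster resource ID. As a sketch of that ARN's shape, assuming the commercial `aws` partition and the `neptune-db` resource format used for Neptune IAM authentication, a hypothetical helper could look like the following. `NeptuneDbArn` is not part of the connector, and the "Policies" section of athena-neptune.yaml remains the authoritative definition of the policy.

```java
// Hypothetical helper, not part of the connector: builds the IAM resource ARN
// that scopes Neptune DB access to one cluster by its resource ID.
// Assumes the standard "aws" partition (China and GovCloud Regions use others).
final class NeptuneDbArn {
    static String forCluster(String region, String accountId, String clusterResourceId) {
        if (clusterResourceId == null || clusterResourceId.isEmpty()) {
            throw new IllegalArgumentException("neptune_cluster_res_id must be set");
        }
        // Format: arn:aws:neptune-db:<region>:<account-id>:<cluster-resource-id>/*
        return String.format("arn:aws:neptune-db:%s:%s:%s/*", region, accountId, clusterResourceId);
    }
}
```

The trailing `/*` covers all databases on that cluster, so the policy grants access to exactly one cluster rather than every Neptune cluster in the account.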

### Deploying The Connector

To use this connector in your queries, navigate to AWS Serverless Application Repository and deploy a pre-built version of this connector. Alternatively, to build and deploy this connector from source, follow the steps below or use the more detailed tutorial in the athena-example module:

1. From the athena-federation-sdk dir, run `mvn clean install` if you haven't already.
2. From the athena-neptune dir, run `mvn clean install`.
3. From the athena-neptune dir, run `../tools/publish.sh S3_BUCKET_NAME athena-neptune` to publish the connector to your private AWS Serverless Application Repository. S3_BUCKET_NAME in the command is the bucket where a copy of the connector's code will be stored for Serverless Application Repository to retrieve. This allows users with the required permissions to deploy instances of the connector via a 1-Click form. Then navigate to [Serverless Application Repository](https://aws.amazon.com/serverless/serverlessrepo).


### Current Limitations

Here are some of the current limitations of this connector:

1. The connector currently supports only the property graph model and does not yet support RDF graphs.
2. The connector does not support the full graph traversal capabilities that Apache TinkerPop Gremlin supports.

