Athena to Amazon Neptune connector code #222
Merged
avirtuos
merged 36 commits into
awslabs:master
from
abhishekpradeepmishra:athena-neptune
Oct 15, 2020
6303507
fixed issue with span function for null values
abhishekpradeepmishra b68dcc3
Merge pull request #1 from abhishekpradeepmishra/dev
abhishekpradeepmishra 67fae55
Added code for Athena to Neptune connector
abhishekpradeepmishra c351e44
added code to handle =, >=, <=
abhishekpradeepmishra cfd5c1c
Fixed unit test cases
abhishekpradeepmishra e19fe66
added test base class, cleaned up NeptuneRecordHandler
abhishekpradeepmishra 4efbc44
Inherited from Testbase
abhishekpradeepmishra 8eaf751
refactored code to include TestBase and removed hardcoding of Neptune…
abhishekpradeepmishra 9520d2d
Corrected Readme and athena-neptune.yaml and deleted UDF classes
b6c3277
Refactored Athena Neptune Handler
abhishekpradeepmishra b1c39a3
refactored code, mocked test cases for MetaHandler
abhishekpradeepmishra 7b87f27
implemented code review comments
abhishekpradeepmishra f8c184a
removed unwanted files
abhishekpradeepmishra a246be6
mocked record handler code, removed string transformation in Gremlin …
abhishekpradeepmishra 7619577
removed string templates
abhishekpradeepmishra 23fe7bf
Added eclipse prefs to git ignore list
a810b84
implemented RowWriter pattern, code clean-up and refactoring pending
abhishekpradeepmishra 9ef2e2a
added more contraints to RecordHanderTest, refactored code, removed u…
abhishekpradeepmishra e30ee4f
refactored code
abhishekpradeepmishra 60712ab
1Modified SAM yaml to separate DB permissions
f4a9cd0
1. Added vscode settings to ignore
74b2cb6
Added packaged yaml to ignore list
093c6ad
Add S3 list all buckets and simplified neptune ARN
0e408f5
Merge branch 'master' of https://github.com/awslabs/aws-athena-query-…
9369a6f
Modifications align with latest FederatedIdentity
e4d654f
code review changes
abhishekpradeepmishra ca4b7d7
Draft version of Readme file
d0cfa3e
Merge branch 'athena-neptune' of https://github.com/abhishekpradeepmi…
33be79b
code review related changes
abhishekpradeepmishra 73998cb
Merge remote-tracking branch 'upstream/master' into athena-neptune
56f4fb9
Updated data types in readme and updated gitignore
e98adb3
Merge remote-tracking branch 'upstream/master' into athena-neptune
7fc923c
added support for more data types
abhishekpradeepmishra 0de38d3
Fixed import order formatting issue.
8ff2c50
added test cases constraints for Boolean and BigInt, fixed issue wit…
abhishekpradeepmishra 05cf0c7
committing chagnes for test cases
abhishekpradeepmishra
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,174 @@ | ||
Apache License | ||
Version 2.0, January 2004 | ||
http://www.apache.org/licenses/ | ||
|
||
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION | ||
|
||
1. Definitions. | ||
|
||
"License" shall mean the terms and conditions for use, reproduction, | ||
and distribution as defined by Sections 1 through 9 of this document. | ||
|
||
"Licensor" shall mean the copyright owner or entity authorized by | ||
the copyright owner that is granting the License. | ||
|
||
"Legal Entity" shall mean the union of the acting entity and all | ||
other entities that control, are controlled by, or are under common | ||
control with that entity. For the purposes of this definition, | ||
"control" means (i) the power, direct or indirect, to cause the | ||
direction or management of such entity, whether by contract or | ||
otherwise, or (ii) ownership of fifty percent (50%) or more of the | ||
outstanding shares, or (iii) beneficial ownership of such entity. | ||
|
||
"You" (or "Your") shall mean an individual or Legal Entity | ||
exercising permissions granted by this License. | ||
|
||
"Source" form shall mean the preferred form for making modifications, | ||
including but not limited to software source code, documentation | ||
source, and configuration files. | ||
|
||
"Object" form shall mean any form resulting from mechanical | ||
transformation or translation of a Source form, including but | ||
not limited to compiled object code, generated documentation, | ||
and conversions to other media types. | ||
|
||
"Work" shall mean the work of authorship, whether in Source or | ||
Object form, made available under the License, as indicated by a | ||
copyright notice that is included in or attached to the work | ||
(an example is provided in the Appendix below). | ||
|
||
"Derivative Works" shall mean any work, whether in Source or Object | ||
form, that is based on (or derived from) the Work and for which the | ||
editorial revisions, annotations, elaborations, or other modifications | ||
represent, as a whole, an original work of authorship. For the purposes | ||
of this License, Derivative Works shall not include works that remain | ||
separable from, or merely link (or bind by name) to the interfaces of, | ||
the Work and Derivative Works thereof. | ||
|
||
"Contribution" shall mean any work of authorship, including | ||
the original version of the Work and any modifications or additions | ||
to that Work or Derivative Works thereof, that is intentionally | ||
submitted to Licensor for inclusion in the Work by the copyright owner | ||
or by an individual or Legal Entity authorized to submit on behalf of | ||
the copyright owner. For the purposes of this definition, "submitted" | ||
means any form of electronic, verbal, or written communication sent | ||
to the Licensor or its representatives, including but not limited to | ||
communication on electronic mailing lists, source code control systems, | ||
and issue tracking systems that are managed by, or on behalf of, the | ||
Licensor for the purpose of discussing and improving the Work, but | ||
excluding communication that is conspicuously marked or otherwise | ||
designated in writing by the copyright owner as "Not a Contribution." | ||
|
||
"Contributor" shall mean Licensor and any individual or Legal Entity | ||
on behalf of whom a Contribution has been received by Licensor and | ||
subsequently incorporated within the Work. | ||
|
||
2. Grant of Copyright License. Subject to the terms and conditions of | ||
this License, each Contributor hereby grants to You a perpetual, | ||
worldwide, non-exclusive, no-charge, royalty-free, irrevocable | ||
copyright license to reproduce, prepare Derivative Works of, | ||
publicly display, publicly perform, sublicense, and distribute the | ||
Work and such Derivative Works in Source or Object form. | ||
|
||
3. Grant of Patent License. Subject to the terms and conditions of | ||
this License, each Contributor hereby grants to You a perpetual, | ||
worldwide, non-exclusive, no-charge, royalty-free, irrevocable | ||
(except as stated in this section) patent license to make, have made, | ||
use, offer to sell, sell, import, and otherwise transfer the Work, | ||
where such license applies only to those patent claims licensable | ||
by such Contributor that are necessarily infringed by their | ||
Contribution(s) alone or by combination of their Contribution(s) | ||
with the Work to which such Contribution(s) was submitted. If You | ||
institute patent litigation against any entity (including a | ||
cross-claim or counterclaim in a lawsuit) alleging that the Work | ||
or a Contribution incorporated within the Work constitutes direct | ||
or contributory patent infringement, then any patent licenses | ||
granted to You under this License for that Work shall terminate | ||
as of the date such litigation is filed. | ||
|
||
4. Redistribution. You may reproduce and distribute copies of the | ||
Work or Derivative Works thereof in any medium, with or without | ||
modifications, and in Source or Object form, provided that You | ||
meet the following conditions: | ||
|
||
(a) You must give any other recipients of the Work or | ||
Derivative Works a copy of this License; and | ||
|
||
(b) You must cause any modified files to carry prominent notices | ||
stating that You changed the files; and | ||
|
||
(c) You must retain, in the Source form of any Derivative Works | ||
that You distribute, all copyright, patent, trademark, and | ||
attribution notices from the Source form of the Work, | ||
excluding those notices that do not pertain to any part of | ||
the Derivative Works; and | ||
|
||
(d) If the Work includes a "NOTICE" text file as part of its | ||
distribution, then any Derivative Works that You distribute must | ||
include a readable copy of the attribution notices contained | ||
within such NOTICE file, excluding those notices that do not | ||
pertain to any part of the Derivative Works, in at least one | ||
of the following places: within a NOTICE text file distributed | ||
as part of the Derivative Works; within the Source form or | ||
documentation, if provided along with the Derivative Works; or, | ||
within a display generated by the Derivative Works, if and | ||
wherever such third-party notices normally appear. The contents | ||
of the NOTICE file are for informational purposes only and | ||
do not modify the License. You may add Your own attribution | ||
notices within Derivative Works that You distribute, alongside | ||
or as an addendum to the NOTICE text from the Work, provided | ||
that such additional attribution notices cannot be construed | ||
as modifying the License. | ||
|
||
You may add Your own copyright statement to Your modifications and | ||
may provide additional or different license terms and conditions | ||
for use, reproduction, or distribution of Your modifications, or | ||
for any such Derivative Works as a whole, provided Your use, | ||
reproduction, and distribution of the Work otherwise complies with | ||
the conditions stated in this License. | ||
|
||
5. Submission of Contributions. Unless You explicitly state otherwise, | ||
any Contribution intentionally submitted for inclusion in the Work | ||
by You to the Licensor shall be under the terms and conditions of | ||
this License, without any additional terms or conditions. | ||
Notwithstanding the above, nothing herein shall supersede or modify | ||
the terms of any separate license agreement you may have executed | ||
with Licensor regarding such Contributions. | ||
|
||
6. Trademarks. This License does not grant permission to use the trade | ||
names, trademarks, service marks, or product names of the Licensor, | ||
except as required for reasonable and customary use in describing the | ||
origin of the Work and reproducing the content of the NOTICE file. | ||
|
||
7. Disclaimer of Warranty. Unless required by applicable law or | ||
agreed to in writing, Licensor provides the Work (and each | ||
Contributor provides its Contributions) on an "AS IS" BASIS, | ||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or | ||
implied, including, without limitation, any warranties or conditions | ||
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A | ||
PARTICULAR PURPOSE. You are solely responsible for determining the | ||
appropriateness of using or redistributing the Work and assume any | ||
risks associated with Your exercise of permissions under this License. | ||
|
||
8. Limitation of Liability. In no event and under no legal theory, | ||
whether in tort (including negligence), contract, or otherwise, | ||
unless required by applicable law (such as deliberate and grossly | ||
negligent acts) or agreed to in writing, shall any Contributor be | ||
liable to You for damages, including any direct, indirect, special, | ||
incidental, or consequential damages of any character arising as a | ||
result of this License or out of the use or inability to use the | ||
Work (including but not limited to damages for loss of goodwill, | ||
work stoppage, computer failure or malfunction, or any and all | ||
other commercial damages or losses), even if such Contributor | ||
has been advised of the possibility of such damages. | ||
|
||
9. Accepting Warranty or Additional Liability. While redistributing | ||
the Work or Derivative Works thereof, You may choose to offer, | ||
and charge a fee for, acceptance of support, warranty, indemnity, | ||
or other liability obligations and/or rights consistent with this | ||
License. However, in accepting such obligations, You may act only | ||
on Your own behalf and on Your sole responsibility, not on behalf | ||
of any other Contributor, and only if You agree to indemnify, | ||
defend, and hold each Contributor harmless for any liability | ||
incurred by, or claims asserted against, such Contributor by reason | ||
of your accepting any such warranty or additional liability. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,70 @@ | ||
# Amazon Athena Neptune Connector | ||
|
||
This connector enables Amazon Athena to communicate with your Neptune Graph Database instance, making your Neptune graph data accessible via SQL. | ||
|
||
**To enable this Preview feature, create an Athena workgroup named AmazonAthenaPreviewFunctionality and, from that workgroup, run any queries that federate to this connector, use a UDF, or use SageMaker inference.** | ||
|
||
Unlike traditional relational data stores, Neptune graph DB nodes and edges do not have a set schema. Each entry can have different fields and data types. While we are investigating the best way to support schema-on-read use cases for this connector, it presently supports retrieving metadata from the Glue Data Catalog. You need to pre-create the Glue database and the corresponding Glue tables with the required schemas within that database. This allows the connector to populate the list of tables available to query within Athena. | ||
|
||
> **NOTE** | ||
> | ||
> Create the Glue database and the corresponding tables in the same AWS Region as your Neptune cluster, which must also be the Region where you intend to run this connector's Lambda function. | ||
|
||
Each graph node type is represented as a Glue table, and node properties are represented as Glue table properties with the corresponding data types associated with them. | ||
|
||
Here's a reference of the Glue DataTypes that you can use: | ||
|
||
|Glue DataType|Apache Arrow Type| | ||
|-------------|-----------------| | ||
|int|INT| | ||
|bigint|BIGINT| | ||
|double|FLOAT8| | ||
|float|FLOAT4| | ||
|boolean|BIT| | ||
|binary|VARBINARY| | ||
|string|VARCHAR| | ||
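
As a minimal sketch of how the table above might be used, the snippet below encodes the mapping as a dictionary and flags columns whose Glue type is not listed. The helper `unsupported_columns`, the `(name, glue_type)` input shape, and the sample "airport" node type are hypothetical and not part of the connector. Treating unlisted types as unsupported is an assumption based only on this table.

```python
# Mapping copied from the table above: Glue data type -> Apache Arrow type.
GLUE_TO_ARROW = {
    "int": "INT",
    "bigint": "BIGINT",
    "double": "FLOAT8",
    "float": "FLOAT4",
    "boolean": "BIT",
    "binary": "VARBINARY",
    "string": "VARCHAR",
}


def unsupported_columns(columns):
    """Return the names of columns whose Glue type is not in the table above.

    `columns` is a list of (name, glue_type) pairs. This input shape is an
    assumption made for illustration only.
    """
    return [
        name
        for name, glue_type in columns
        if glue_type.lower() not in GLUE_TO_ARROW
    ]


# Hypothetical node type with one column type that the table does not list.
airport_columns = [("code", "string"), ("elev", "int"), ("opened", "timestamp")]
problem_columns = unsupported_columns(airport_columns)
```

Running a check like this before creating the Glue tables can catch a column type that the reference table does not cover.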
|
||
<br/> | ||
|
||
### Parameters | ||
|
||
The Amazon Athena Neptune Connector exposes several configuration options via Lambda environment variables. More detail on the available parameters can be found below. | ||
|
||
1. **neptune_endpoint** - The Neptune Cluster Endpoint | ||
2. **neptune_port** - (Optional) The Neptune Cluster Endpoint port to communicate with. Defaults to 8182 | ||
3. **neptune_cluster_res_id** - The Neptune Cluster Resource ID, which is required to restrict the Lambda function's access to a specific cluster within its IAM permissions. To find the Neptune cluster resource ID in the Amazon Neptune AWS Management Console, choose the DB cluster that you want. The Resource ID is shown in the Configuration section. | ||
4. **glue_database_name** - Name of the Glue database that you pre-created. | ||
5. **spill_bucket** - When the data returned by your Lambda function exceeds Lambda’s limits, this is the bucket that the data will be written to for Athena to read the excess from. (e.g. my_bucket) | ||
6. **spill_prefix** - (Optional) Defaults to 'athena-neptune-spill'. Used in conjunction with spill_bucket, this is the path within the above bucket that large responses are spilled to. *You should configure an S3 lifecycle on this location to delete old spills after X days/Hours.* | ||
7. **disable_spill_encryption** - (Optional) Defaults to False so that any data that is spilled to S3 is encrypted using AES-GCM, either with a randomly generated key or using KMS to generate keys. Setting this to true will disable spill encryption. You may wish to disable encryption for improved performance, especially if your spill location in S3 uses S3 Server Side Encryption. (e.g. true or false) | ||
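
The parameter list above can be summarized with a small sketch that reads the same variable names and applies the documented defaults. The function `load_connector_config` is hypothetical and is not the connector's own code (the connector is a Java Lambda function); it only illustrates which settings are required and which fall back to a default.

```python
import os


def load_connector_config(env=None):
    """Read the connector parameters from a mapping of environment variables.

    Required parameters raise KeyError when missing. Optional parameters use
    the defaults stated in the parameter list above.
    """
    env = os.environ if env is None else env
    return {
        "neptune_endpoint": env["neptune_endpoint"],
        # Optional: defaults to 8182.
        "neptune_port": int(env.get("neptune_port", "8182")),
        "neptune_cluster_res_id": env["neptune_cluster_res_id"],
        "glue_database_name": env["glue_database_name"],
        "spill_bucket": env["spill_bucket"],
        # Optional: defaults to 'athena-neptune-spill'.
        "spill_prefix": env.get("spill_prefix", "athena-neptune-spill"),
        # Optional: defaults to False, meaning spilled data stays encrypted.
        "disable_spill_encryption": env.get("disable_spill_encryption", "false")
        .strip()
        .lower()
        == "true",
    }


# Hypothetical values, for illustration only.
example_env = {
    "neptune_endpoint": "example-cluster.cluster-abc.us-east-1.neptune.amazonaws.com",
    "neptune_cluster_res_id": "cluster-EXAMPLE",
    "glue_database_name": "graph-database",
    "spill_bucket": "my_bucket",
}
config = load_connector_config(example_env)
```

With only the required variables set, the sketch falls back to port 8182, the `athena-neptune-spill` prefix, and encrypted spills.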
|
||
<br/> | ||
|
||
### Required Permissions | ||
|
||
Review the "Policies" section of the athena-neptune.yaml file for full details on the IAM Policies required by this connector. A brief summary is below. | ||
|
||
1. S3 Write Access - In order to successfully handle large queries, the connector requires write access to a location in S3. | ||
2. Glue Database - Since Neptune does not have a metadata store, the connector requires read-only access to Glue's database and tables for table schema information. | ||
3. VPC Access - In order to connect to your VPC for the purposes of communicating with your Neptune cluster, the connector needs the ability to attach/detach an interface to the VPC. | ||
4. CloudWatch Logs - This is a somewhat implicit permission when deploying a Lambda function, but the connector needs access to CloudWatch Logs for storing logs. | ||
5. Athena GetQueryExecution - The connector uses this access to fast-fail when the upstream Athena query has terminated. | ||
6. Neptune DB - This is to allow access to a specific Neptune cluster based on the provided cluster resource ID. | ||
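
To illustrate the Neptune DB permission, here is a hedged sketch of how a policy statement scoped to one cluster resource ID might be assembled. The authoritative policy is the one in athena-neptune.yaml. The helper name, the `neptune-db:*` action, and the sample Region, account ID, and resource ID are assumptions for illustration. The ARN layout follows the `neptune-db` resource format used for Neptune IAM database authentication.

```python
import json


def neptune_db_statement(region, account_id, cluster_res_id):
    """Build an IAM statement limited to one Neptune cluster resource ID.

    Hypothetical helper: check the action and resource against the
    "Policies" section of athena-neptune.yaml before relying on it.
    """
    return {
        "Effect": "Allow",
        "Action": ["neptune-db:*"],
        "Resource": f"arn:aws:neptune-db:{region}:{account_id}:{cluster_res_id}/*",
    }


# Placeholder values, for illustration only.
statement = neptune_db_statement("us-east-1", "123456789012", "cluster-EXAMPLE")
policy_json = json.dumps({"Version": "2012-10-17", "Statement": [statement]}, indent=2)
```

Scoping the resource to the cluster resource ID is what lets the `neptune_cluster_res_id` parameter restrict the Lambda function to a single cluster.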
|
||
### Deploying The Connector | ||
|
||
To use this connector in your queries, navigate to AWS Serverless Application Repository and deploy a pre-built version of this connector. Alternatively, you can build and deploy this connector from source by following the steps below, or use the more detailed tutorial in the athena-example module: | ||
|
||
1. From the athena-federation-sdk dir, run `mvn clean install` if you haven't already. | ||
2. From the athena-neptune dir, run `mvn clean install`. | ||
3. From the athena-neptune dir, run `../tools/publish.sh S3_BUCKET_NAME athena-neptune` to publish the connector to your private AWS Serverless Application Repository. The S3_BUCKET_NAME in the command is the bucket where a copy of the connector's code will be stored for Serverless Application Repository to retrieve it. This allows users with the appropriate permission to deploy instances of the connector via a 1-Click form. Then navigate to [Serverless Application Repository](https://aws.amazon.com/serverless/serverlessrepo). | ||
|
||
|
||
## Current Limitations | ||
|
||
Here are some of the current limitations of this connector: | ||
|
||
1. The connector currently supports only the Property Graph model and does not support RDF graphs yet. | ||
2. The connector does not support the full graph traversal capabilities that Apache TinkerPop Gremlin supports. | ||
|
||
|
We do not typically commit the .gitignore file to our project.