Merge branch 'main' into NP-47308-fix-role-name
# Conflicts:
#	template.yaml
Showing 108 changed files with 1,187 additions and 2,653 deletions.
@@ -1,78 +1,22 @@
# NVA Data Report API

- This repository contains the NVA data report API.
+ This repository contains functions for generating csv reports of data from NVA.
+ See [reportTypes](documentation/reportTypes.md) for a list of reports and data types.

- ## How to run a bulk upload
+ ## Architectural overview

- The steps below can be outlined briefly as:
+ ![Architecture](documentation/images/data_export_overview.png)

- - Pre-run
-   - Stop incoming live-update events
-   - Delete data from previous runs
-   - Delete all data in database
- - Bulk upload
-   - Generate batches of document keys for upload
-   - Transform the data to a format compatible with the bulk-upload action
-   - Initiate bulk upload
-   - Verify data integrity
- - Post-run
-   - Start incoming live-update events
+ ## Integration overview

- ### Pre-run steps
+ The s3 bucket `data-report-csv-export-{accountName}` (defined in template) is
+ set up as a data source in Databricks (in another AWS account), following the
+ Databricks guide [_Create a storage credential for connecting to AWS S3_](https://docs.databricks.com/en/connect/unity-catalog/storage-credentials.html#create-a-storage-credential-for-connecting-to-aws-s3).
+ This is how the data platform accesses files from
+ `data-report-csv-export-{accountName}`:

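In that guide, the storage credential is backed by an IAM role that Databricks assumes cross-account. A minimal sketch of such a role's trust policy, with placeholder values only (both the principal ARN and the external ID come from the Databricks guide and console, not from this repository):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "<unity-catalog-master-role-arn-from-databricks-guide>"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<databricks-account-id>"
        }
      }
    }
  ]
}
```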
- 1. Remove all objects from S3 bucket `loader-input-files-{accountName}`
- 2. Turn off S3 event notifications for bucket `persisted-resources-{accountName}`.
-    In the AWS console, go to
-    _S3_ -> _persisted-resources-{accountName}_ -> _Properties_ -> _Amazon EventBridge_ -> _Edit_ -> _Off_
- 3. Press `ResetDatabaseButton` (triggers `DatabaseResetHandler`). This might take around a
-    minute to complete.
- 4. Verify that the database is empty. You can use a SageMaker notebook to query the
-    database*. Example SPARQL queries:
-    ```
-    SELECT (COUNT(DISTINCT ?g) as ?gCount) WHERE {GRAPH ?g {?s ?p ?o}}
-    ```
-    or
-    ```
-    SELECT ?g ?s ?p ?o WHERE {GRAPH ?g {?s ?p ?o}} LIMIT 100
-    ```
+ ![Databricks integration](documentation/images/data_report_aws_databricks_storage_credential.png)

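The EventBridge toggle in the removed pre-run step 2 (and its post-run counterpart) maps to the bucket's notification configuration. Assuming the AWS CLI is acceptable in place of the console clicks, `aws s3api put-bucket-notification-configuration` with an empty `EventBridgeConfiguration` object turns EventBridge delivery on, and a configuration that omits the key turns it off:

```json
{
  "EventBridgeConfiguration": {}
}
```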
- ### Bulk upload steps
+ ## How-to guides

- 1. Generate key batches for both locations: `resources` and `nvi-candidates`. Manually trigger
-    `GenerateKeyBatchesHandler` with the following input:
-    ```json
-    {
-      "detail": {
-        "location": "resources|nvi-candidates"
-      }
-    }
-    ```
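In the removed step above, `resources|nvi-candidates` denotes one value per invocation, so covering both locations takes two manual triggers. The first invocation's event would then look like:

```json
{
  "detail": {
    "location": "resources"
  }
}
```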
- 2. Verify that `GenerateKeyBatchesHandler` is done processing (i.e. check logs) and that key
-    batches have been generated in S3 bucket `data-report-key-batches-{accountName}`
- 3. Trigger `BulkTransformerHandler`
- 4. Verify that `BulkTransformerHandler` is done processing (i.e. check logs) and that nquads
-    have been generated in S3 bucket `loader-input-files-{accountName}`
- 5. Trigger `BulkDataLoader`
- 6. To check progress of the bulk upload to Neptune, trigger `BulkDataLoader` with the following input:
-    ```json
-    {
-      "loadId": "{copy loadId UUID from test log}"
-    }
-    ```
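`BulkDataLoader` drives the Neptune bulk loader, so the progress check presumably surfaces Neptune's loader-status response. Its documented shape is roughly as sketched below; the values are illustrative, not captured from this system:

```json
{
  "status": "200 OK",
  "payload": {
    "overallStatus": {
      "fullUri": "s3://loader-input-files-{accountName}",
      "runNumber": 1,
      "status": "LOAD_COMPLETED",
      "totalRecords": 1000000,
      "parsingErrors": 0,
      "insertErrors": 0
    }
  }
}
```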
- 7. Verify that the expected count is in the database. Query for counting distinct named graphs:
-    ```
-    SELECT (COUNT(DISTINCT ?g) as ?gCount) WHERE {GRAPH ?g {?s ?p ?o}}
-    ```
-
- ### Post-run steps
-
- 1. Turn on S3 event notifications for bucket `persisted-resources-{accountName}`.
-    In the AWS console, go to
-    _S3_ -> _persisted-resources-{accountName}_ -> _Properties_ -> _Amazon EventBridge_ -> _Edit_ -> _On_
-
- *Note: You can use a SageMaker notebook to query the database. The notebook can be opened from
-  the AWS console through _SageMaker_ -> _Notebooks_ -> _Notebook instances_ -> _Open JupyterLab_
+ - [Run bulk export](documentation/bulkExport.md)
...ransformer/GenerateKeyBatchesHandler.java → ...api/export/GenerateKeyBatchesHandler.java (2 changes: 1 addition & 1 deletion)
...etl/transformer/KeyBatchRequestEvent.java → ...port/api/export/KeyBatchRequestEvent.java (2 changes: 1 addition & 1 deletion)