Note that this repo is now superseded. The production RAiD app can be found at: https://github.com/au-research/raido
Please create a discussion on that repo, or email contact@raid.org if you have any questions.
The Data LifeCycle Framework (DLCF) was initiated by five primary organisations: Australian Access Federation (AAF), Australia’s Academic and Research Network (AARNet), Australian National Data Service (ANDS), National eResearch Collaboration Tools and Resources (NeCTAR) and Research Data Services (RDS).
The DLCF is a nationwide effort to connect research resources and activities so that researchers can make the best use of existing national, state-based, local, and commercial eResearch tools. It aims to provide a simple path to reliable provenance, enable more effective collaboration across organisations, and help researchers position themselves to address the growing potential of increasingly open data.
The DLCF will connect critical elements and points in time of the data journey from grant approval through to project finalisation, results publication and archiving. It will leverage existing eResearch investment to provide a flexible and dynamic national framework supporting research.
The Resource and Activity Persistent identifier (RAiD) is the first of the enabling technologies required for the DLCF.
RAiD API is a 'proof of concept' Serverless implementation, designed to be hosted on Amazon Web Services (AWS), that will help create and manage RAiDs.
AWS serverless applications are able to conform to a multi-tier architecture, consisting of three defined tiers:
- Data - Stores all research activity information, along with the JWT tokens generated for research organisations and providers, in AWS DynamoDB (NoSQL). JWT tokens for AAF-authenticated users are not stored, as they are provided by RAPID AAF.
- Logic - RESTful API calls are mapped to endpoints in Amazon API Gateway. API Gateway processes HTTP requests by using micro-services (AWS Lambda using the Python runtime) behind a custom security policy (JWT token validation). HTTP status codes and responses are generated depending on the result of the AWS Lambda function.
- Presentation - Static assets (HTML, JavaScript, CSS, etc.) are stored in AWS S3 buckets with public HTTP GET access. AWS provides an HTTP endpoint for content hosting, but disallows server-side generated content. This is overcome by storing authenticated sessions as cookies and producing dynamic content with RESTful calls to API Gateway with CORS enabled.
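The Logic tier described above can be sketched as a small Lambda-style function that validates a JWT and maps the result to an HTTP status code. This is a minimal illustration, not the project's actual code: it assumes HS256-signed tokens, a `Bearer` token in the `Authorization` header, and the API Gateway proxy-integration response shape. The names `verify_jwt_hs256`, `sign_jwt_hs256` and `handler`, and the extra `secret` argument, are hypothetical. It is written for Python 3 and uses only the standard library.

```python
import base64
import hashlib
import hmac
import json
import time

# Assumed response headers; the wildcard origin is only for illustration.
CORS_HEADERS = {"Access-Control-Allow-Origin": "*", "Content-Type": "application/json"}


def _b64url_decode(segment):
    """Decode a base64url segment, restoring the padding that JWTs strip."""
    padded = segment + "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(padded.encode("ascii"))


def _b64url_encode(raw):
    """Encode bytes as unpadded base64url text."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_jwt_hs256(payload, secret):
    """Create a compact HS256 token (included only to exercise the verifier)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode("utf-8"))
    body = _b64url_encode(json.dumps(payload).encode("utf-8"))
    signing_input = (header + "." + body).encode("ascii")
    signature = hmac.new(secret.encode("utf-8"), signing_input, hashlib.sha256).digest()
    return header + "." + body + "." + _b64url_encode(signature)


def verify_jwt_hs256(token, secret, now=None):
    """Return the claims if the signature and expiry check out, else None."""
    try:
        header_b64, body_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64).decode("utf-8"))
        if header.get("alg") != "HS256":
            return None
        signing_input = (header_b64 + "." + body_b64).encode("ascii")
        expected = hmac.new(secret.encode("utf-8"), signing_input, hashlib.sha256).digest()
        # Constant-time comparison avoids leaking how many bytes matched.
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return None
        claims = json.loads(_b64url_decode(body_b64).decode("utf-8"))
    except (ValueError, TypeError, AttributeError):
        return None
    if not isinstance(claims, dict):
        return None
    exp = claims.get("exp")
    current = time.time() if now is None else now
    if isinstance(exp, (int, float)) and current >= exp:
        return None
    return claims


def handler(event, context, secret="YOUR_SECRET"):
    """Hypothetical Lambda entry point: 401 on a bad token, 200 otherwise."""
    auth = (event.get("headers") or {}).get("Authorization", "")
    token = auth[len("Bearer "):] if auth.startswith("Bearer ") else auth
    claims = verify_jwt_hs256(token, secret)
    if claims is None:
        body = json.dumps({"message": "Unauthorized"})
        return {"statusCode": 401, "headers": CORS_HEADERS, "body": body}
    body = json.dumps({"subject": claims.get("sub")})
    return {"statusCode": 200, "headers": CORS_HEADERS, "body": body}
```

In a real deployment the secret would come from the stack configuration rather than a default argument, and a maintained JWT library would normally replace the hand-rolled verification shown here.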
RAiD API is built and deployed using the AWS Serverless Application Model (AWS SAM), an extension of CloudFormation.
"The AWS Serverless Application Model (AWS SAM, previously known as Project Flourish) extends AWS CloudFormation to provide a simplified way of defining the Amazon API Gateway APIs, AWS Lambda functions, and Amazon DynamoDB tables needed by your serverless application" (AWS 2016).
Development and deployment of the framework will require the following:
RAiD uses the ANDS Handle Service to generate unique and citable 'handles'. This allows organisations and researchers to have a 'clickable' link in their datasets, collections and papers. Handles act as the primary key for a RAiD and are associated with a URL content path. The content path can be changed, but the handle remains the same. The following steps are required for RAiD API to interact with the minting service:
- Create a VPC and subnet in AWS so that all outbound traffic comes from a single static IPv4 address.
- Register with ANDS for the demo and live handle services, providing the static IP address from the previous step.
- Use the resulting 'appID' for the 'AndsAppId' parameter in the deployment steps mentioned later in this document.
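The relationship between a handle and its content path can be illustrated with a small in-memory stand-in. This is only a sketch of the idea that the handle is a stable key while the URL it points to may change; it does not call the ANDS Handle Service. The class `HandleRegistry`, its methods, the `example` prefix and the URLs are all hypothetical.

```python
import itertools


class HandleRegistry(object):
    """In-memory stand-in for a handle minting service (hypothetical)."""

    def __init__(self, prefix="example"):
        # "example" is a placeholder prefix, not a real handle prefix.
        self._prefix = prefix
        self._counter = itertools.count(1)
        self._content_paths = {}

    def mint(self, content_path):
        """Create a new handle that points at content_path."""
        handle = "{}/{}".format(self._prefix, next(self._counter))
        self._content_paths[handle] = content_path
        return handle

    def update(self, handle, content_path):
        """Point an existing handle at a new content path."""
        if handle not in self._content_paths:
            raise KeyError(handle)
        self._content_paths[handle] = content_path

    def resolve(self, handle):
        """Return the content path currently associated with the handle."""
        return self._content_paths[handle]


registry = HandleRegistry()
raid_handle = registry.mint("https://example.org/projects/reef-survey")
registry.update(raid_handle, "https://example.org/archive/reef-survey")
# The handle is unchanged; only the URL it resolves to has moved.
print(raid_handle, "->", registry.resolve(raid_handle))
```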
- AWS VPC, subnets and security groups to interact with ANDS.
- AWS S3 Bucket for SAM code packages.
- Amazon Elasticsearch Service endpoint for logging and monitoring.
- Python: The Python language runtime supported by AWS Lambda (2.7).
- PIP: Installs and manages Python modules.
- AWS Command Line Interface: Unified tool to manage your AWS services.
- Boto3: Amazon Web Services SDK for Python.
# Install PIP (get-pip.py must be downloaded into the current directory first)
python get-pip.py
# Install AWS CLI
pip install awscli
# Configure AWS CLI
aws configure
AWS Access Key ID [None]: <Access Key>
AWS Secret Access Key [None]: <Secret>
Default region name [None]: <Region>
Default output format [None]: <press Enter>
# Install Boto3
pip install boto3==1.4.4
# Install packages listed in requirements to a directory for package deployment
pip install -r src/requirements.txt -t src/
# Change path into SAM
cd sam
# Package the SAM code, replacing MY_S3_Bucket with your own bucket name
aws cloudformation package --template-file template.yaml --output-template-file template-out.yaml --s3-bucket MY_S3_Bucket
# Replace Swagger AWS account id and region placeholders with your own
sed -i "s/<<account>>/AWS_ACCOUNT_ID/g" 'swagger.yaml'
sed -i "s/<<region>>/AWS_REGION/g" 'swagger.yaml'
# Deploy the packaged SAM template as a CloudFormation stack
## Replace YOUR_SECRET, ANDS_APP_ID, ANDS_SECRET, SUBNET_ID, SECURITY_GROUP and ES_URL with your own values
aws cloudformation deploy --template-file template-out.yaml \
--stack-name RAiD --parameter-overrides \
JwtSecret=YOUR_SECRET \
AndsAppId=ANDS_APP_ID \
AndsSecret=ANDS_SECRET \
AndsSubnets=SUBNET_ID \
AndsSecurityGroups=SECURITY_GROUP \
ElasticsearchHost=ES_URL \
--capabilities CAPABILITY_IAM
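The placeholder substitution and deploy steps above could also be scripted in Python, which avoids differences in `sed -i` behaviour between platforms. The following is a rough sketch under stated assumptions: the helper names `fill_swagger_placeholders` and `build_deploy_command` are hypothetical, and the command only reuses the flags and parameter names already shown in the steps above.

```python
def fill_swagger_placeholders(text, account_id, region):
    """Replace the <<account>> and <<region>> placeholders in Swagger text."""
    return text.replace("<<account>>", account_id).replace("<<region>>", region)


def build_deploy_command(parameters, template="template-out.yaml", stack_name="RAiD"):
    """Build the 'aws cloudformation deploy' argument list from a dict."""
    command = [
        "aws", "cloudformation", "deploy",
        "--template-file", template,
        "--stack-name", stack_name,
        "--parameter-overrides",
    ]
    # Sorted so the generated command is stable between runs.
    command.extend("{}={}".format(key, value) for key, value in sorted(parameters.items()))
    command.extend(["--capabilities", "CAPABILITY_IAM"])
    return command


# Placeholder values, to be replaced with your own as in the steps above.
deploy_command = build_deploy_command({
    "JwtSecret": "YOUR_SECRET",
    "AndsAppId": "ANDS_APP_ID",
    "AndsSecret": "ANDS_SECRET",
    "AndsSubnets": "SUBNET_ID",
    "AndsSecurityGroups": "SECURITY_GROUP",
    "ElasticsearchHost": "ES_URL",
})
print(" ".join(deploy_command))
```

The resulting list can be passed to `subprocess.check_call` once the AWS CLI is configured. Building it as a list rather than a single string sidesteps shell quoting issues with secret values.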
MIT-style licence, akin to ORCiD's. See LICENCE.txt for details.