canary-lambda

This function has two modes: direct query mode and SQS-triggered mode. For simplicity, this README has been divided into two independent sections.

SQS-triggered mode is the newer mode that listens to an SQS queue for notifications about new files being uploaded. When a new SQS message appears in the queue, this function will be triggered and begin the validation routine. Following the validation, results are published to a separate SQS queue.

Direct query mode is the older, soon-to-be-deprecated mode that queries S3 directly for files uploaded today. All files are validated at once and results are pushed directly to Slack. This mode is severely limited in throughput and for that reason is not favored.

Mode 1: SQS-Triggered Mode

This canary function is an early warning system that reports corrupt data. It listens to an SQS queue for notifications about new files in Amazon Web Services (AWS) Simple Storage Service (S3), and then validates that each file meets certain field constraints.

It utilizes the ODE schema validation library to detect records with missing fields, blank fields, fields that do not match an expected range or value, as well as higher-level validations such as ensuring serial fields are sequential and incremented without gaps.
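To illustrate the "sequential and incremented without gaps" check described above, here is a minimal sketch in plain Python. It is not the ODE validation library's API; the function name `find_serial_gaps` is hypothetical and the input is assumed to be a list of integer serial numbers already extracted from the records.

```python
from typing import Iterable, List, Tuple


def find_serial_gaps(serials: Iterable[int]) -> List[Tuple[int, int]]:
    """Return (previous, current) pairs where the serial does not increase by exactly 1."""
    gaps = []
    previous = None
    for current in serials:
        # A well-formed sequence always steps from n to n + 1.
        if previous is not None and current != previous + 1:
            gaps.append((previous, current))
        previous = current
    return gaps


# 4 is skipped and 6 is repeated, so two problems are reported.
print(find_serial_gaps([1, 2, 3, 5, 6, 6]))  # [(3, 5), (6, 6)]
```

Returning the offending pairs rather than a single boolean makes it possible to report where a gap or repeat occurred.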

After the canary completes its validation, it publishes the results to an output SQS queue.
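The flow above starts with an SQS-triggered Lambda event. The sketch below shows one way to pull file locations out of such an event. It assumes each SQS message body is a standard S3 event notification; the actual message format used by this repository may differ. The function name `extract_s3_paths`, the bucket name, and the object key are hypothetical.

```python
import json
from typing import Dict, List
from urllib.parse import unquote_plus


def extract_s3_paths(event: Dict) -> List[Dict[str, str]]:
    """Collect bucket/key pairs from an SQS-triggered Lambda event."""
    paths = []
    for sqs_record in event.get("Records", []):
        # Each SQS record carries its message as a JSON string in "body".
        body = json.loads(sqs_record["body"])
        # Assumption: the body is a standard S3 event notification.
        for s3_record in body.get("Records", []):
            paths.append({
                "bucket": s3_record["s3"]["bucket"]["name"],
                # Object keys in S3 notifications are URL-encoded.
                "key": unquote_plus(s3_record["s3"]["object"]["key"]),
            })
    return paths


# Hypothetical event, shaped like what Lambda passes for an SQS trigger.
sample_event = {
    "Records": [{
        "body": json.dumps({
            "Records": [{
                "s3": {
                    "bucket": {"name": "example-bucket"},
                    "object": {"key": "wydot/BSM/2020/01/01/file+1.json"},
                }
            }]
        })
    }]
}
print(extract_s3_paths(sample_event))
```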

Requirements

Python 3.7
PIP3
AWS Lambda Access
S3 folder for storing large SQS messages
SQS queue for filepath notifications (source queue)
FIFO SQS queue for results publishing (sink queue)
S3 IAM role with lenient permissions
- s3:*
SQS IAM role with lenient permissions
- sqs:*

Deployment

This function is deployed manually by uploading a ZIP file to Lambda.

Part 1: Local packaging

Clone the code

git clone https://github.com/usdot-its-jpo-data-portal/canary-lambda.git

Install dependencies and package the code using the package.sh script:

./package.sh

A zip file named canary.zip will be created.

Part 2: Deployment to Lambda

Create a Lambda function

Select Python 3.7 as the runtime and main.lambda_handler as the handler.

Upload the canary.zip file
Set the Execution role to one that has the S3 and SQS permissions listed in the Requirements section above.
Recommended resource settings: Memory (MB) 512 MB and Timeout 5 min 0 sec.
Set the environment variables.
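The console steps above can also be scripted. The following is a rough sketch using boto3's Lambda client, not a deployment method provided by this repository. The function name `canary-lambda`, the role ARN, and the helper names are assumptions; the runtime, handler, memory, and timeout values come from the steps above.

```python
from typing import Dict


def build_create_function_args(zip_bytes: bytes, role_arn: str, env: Dict[str, str],
                               memory_mb: int = 512, timeout_sec: int = 300) -> Dict:
    """Assemble keyword arguments for the Lambda client's create_function call."""
    return {
        "FunctionName": "canary-lambda",  # hypothetical name
        # Runtime as stated in this README; AWS may no longer accept it for new functions.
        "Runtime": "python3.7",
        "Handler": "main.lambda_handler",
        "Role": role_arn,
        "Code": {"ZipFile": zip_bytes},
        "MemorySize": memory_mb,
        "Timeout": timeout_sec,
        "Environment": {"Variables": env},
    }


def deploy(zip_path: str, role_arn: str, env: Dict[str, str]) -> Dict:
    """Upload canary.zip as a new Lambda function (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the helper above works without boto3 installed

    with open(zip_path, "rb") as zip_file:
        args = build_create_function_args(zip_file.read(), role_arn, env)
    return boto3.client("lambda").create_function(**args)


# Inspect the request without contacting AWS.
args = build_create_function_args(b"", "arn:aws:iam::123456789012:role/example", {})
print(args["Handler"], args["MemorySize"], args["Timeout"])
```

For Direct Query Mode, the same helper could be called with `memory_mb=896` and `timeout_sec=900` to match that mode's recommended settings.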

Configuration

These configuration properties are sourced from environment variables. To set them locally, run export PROPERTY_NAME=Value, or change them in the Environment variables section in the AWS Lambda console.

Property	Type	Default Value	Description
VERBOSE_OUTPUT	Boolean	FALSE	Increases logging verbosity. Useful for debugging.
SQS_PUBLISHER_MODE	Boolean	TRUE (must be TRUE for this mode)	Activates SQS publisher mode
SQS_RESULT_QUEUE	Array of strings	n/a	Name of SQS queue to which validation results are sent (not the URL, not the ARN, just the name)
SQS_STORAGE_S3_BUCKET	String	n/a	Name of the S3 storage bucket used in sending large SQS messages
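
As a sketch of how these properties might be read at startup, the snippet below parses them from environment variables with the defaults listed in the table. The helper names `parse_bool` and `load_sqs_mode_config` are hypothetical and this is not necessarily how main.py reads its configuration.

```python
import os
from typing import Dict, Mapping, Optional


def parse_bool(value: str) -> bool:
    """Treat 'TRUE' (any casing) as True and everything else as False."""
    return value.strip().upper() == "TRUE"


def load_sqs_mode_config(env: Optional[Mapping[str, str]] = None) -> Dict:
    """Read the SQS-triggered mode properties, applying the documented defaults."""
    if env is None:
        env = os.environ
    config = {
        "VERBOSE_OUTPUT": parse_bool(env.get("VERBOSE_OUTPUT", "FALSE")),
        "SQS_PUBLISHER_MODE": parse_bool(env.get("SQS_PUBLISHER_MODE", "TRUE")),
        # Kept as the raw string; how multiple names would be delimited is not specified.
        "SQS_RESULT_QUEUE": env.get("SQS_RESULT_QUEUE"),
        "SQS_STORAGE_S3_BUCKET": env.get("SQS_STORAGE_S3_BUCKET"),
    }
    # The two properties without defaults must be supplied.
    missing = [name for name in ("SQS_RESULT_QUEUE", "SQS_STORAGE_S3_BUCKET")
               if not config[name]]
    if missing:
        raise ValueError("Missing required properties: " + ", ".join(missing))
    return config


# Hypothetical values for illustration.
print(load_sqs_mode_config({
    "SQS_RESULT_QUEUE": "example-results.fifo",
    "SQS_STORAGE_S3_BUCKET": "example-bucket",
}))
```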

Usage

Run the function by setting up an SQS event trigger from your source queue.
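
The trigger can be created in the Lambda console or programmatically. A minimal sketch with boto3's `create_event_source_mapping` follows; the queue ARN, function name, batch size, and helper names are assumptions chosen for illustration.

```python
from typing import Dict


def build_event_source_mapping_args(queue_arn: str, function_name: str,
                                    batch_size: int = 1) -> Dict:
    """Assemble keyword arguments that connect a source SQS queue to the function."""
    return {
        "EventSourceArn": queue_arn,
        "FunctionName": function_name,
        "BatchSize": batch_size,  # messages delivered per invocation (assumed value)
        "Enabled": True,
    }


def attach_trigger(queue_arn: str, function_name: str) -> Dict:
    """Create the SQS trigger (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the helper above works without boto3 installed

    args = build_event_source_mapping_args(queue_arn, function_name)
    return boto3.client("lambda").create_event_source_mapping(**args)


# Hypothetical ARN and function name.
print(build_event_source_mapping_args(
    "arn:aws:sqs:us-east-1:123456789012:example-source-queue", "canary-lambda"))
```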

Mode 2: Direct Query Mode

Summary

This canary function is an early warning system that reports corrupt data. It is meant to run once a day on a schedule, sample Amazon Web Services (AWS) Simple Storage Service (S3) data uploaded that day, and then validate that it meets certain field constraints.

It utilizes the ODE schema validation library to detect records with missing fields, blank fields, fields that do not match an expected range or value, as well as higher-level validations such as ensuring serial fields are sequential and incremented without gaps.
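
The field-level checks (missing, blank, out of range, unexpected value) can be pictured with a small sketch. This is an illustration only and does not use the ODE validation library; the field names, the constraint format, and `validate_record` are all hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical constraints; the real schemas are defined by the validation library.
CONSTRAINTS = {
    "latitude": {"min": -90.0, "max": 90.0},
    "longitude": {"min": -180.0, "max": 180.0},
    "messageType": {"allowed": {"BSM", "TIM"}},
}


def validate_record(record: Dict[str, Any], constraints: Dict[str, Dict]) -> List[str]:
    """Return a list of human-readable errors for one record."""
    errors = []
    for field, rule in constraints.items():
        if field not in record:
            errors.append(f"{field}: missing")
            continue
        value = record[field]
        if value is None or value == "":
            errors.append(f"{field}: blank")
            continue
        if "allowed" in rule and value not in rule["allowed"]:
            errors.append(f"{field}: unexpected value {value!r}")
        if "min" in rule or "max" in rule:
            try:
                number = float(value)
            except (TypeError, ValueError):
                errors.append(f"{field}: not a number")
                continue
            if "min" in rule and number < rule["min"]:
                errors.append(f"{field}: {number} below minimum {rule['min']}")
            if "max" in rule and number > rule["max"]:
                errors.append(f"{field}: {number} above maximum {rule['max']}")
    return errors


print(validate_record({"latitude": 91.0, "longitude": ""}, CONSTRAINTS))
```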

After the canary completes its execution, it sends a report to Slack.
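
For reference, posting to a Slack incoming webhook amounts to an HTTP POST with a JSON body containing a `text` field. The sketch below uses only the standard library; the report text and the helper names are hypothetical, and the canary's actual report format is not shown here.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build the POST request for a Slack incoming webhook."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_report(webhook_url: str, text: str) -> int:
    """Send the report and return the HTTP status code (performs a network call)."""
    request = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Build (but do not send) a request with a placeholder URL and a made-up summary.
request = build_slack_request("https://hooks.slack.com/services/EXAMPLE",
                              "Canary report: 0 errors found")
print(request.get_method(), request.data)
```

The webhook URL is a secret, so it is better read from the SLACK_WEBHOOK environment variable than hard-coded.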

Requirements

Python 3.7
PIP
AWS Lambda Access
S3 Permissions within AWS
- s3:Get*
- s3:List*
(Optional) Slack and Slack webhook

Deployment

This function is deployed manually by uploading a ZIP file to Lambda.

Part 1: Local packaging

Clone the code

git clone https://github.com/usdot-its-jpo-data-portal/canary-lambda.git

Install dependencies and package the code using the package.sh script:

./package.sh

A zip file named canary.zip will be created.

Part 2: Deployment to Lambda

Create a Lambda function

Select Python 3.7 as the runtime and main.lambda_handler as the handler.

Upload the canary.zip file
Set the Execution role to one that has the S3 permissions listed in the Requirements section above.
Recommended resource settings: Memory (MB) 896 MB and Timeout 15 min 0 sec.
Set the environment variables.

Configuration

These configuration properties are sourced from environment variables. To set them locally, run export PROPERTY_NAME=Value, or change them in the Environment variables section in the AWS Lambda console.

Property	Type	Default Value	Description
VERBOSE_OUTPUT	Boolean	FALSE	Increases logging verbosity. Useful for debugging.
USE_STATIC_PREFIXES	Boolean	FALSE	Overrides the default behavior which is to query for files uploaded today.
STATIC_PREFIXES	Array of strings	n/a	Used with USE_STATIC_PREFIXES to override which files are analyzed.
S3_BUCKET	String	n/a	Name of the S3 bucket containing data to be validated.
DATA_PROVIDERS	Array of strings	wydot	Name(s) of the data providers, used to change which file uploader's data is to be analyzed.
MESSAGE_TYPES	Array of strings	BSM,TIM	Message type(s) to be analyzed.
SEND_SLACK_MESSAGE	Boolean	TRUE	Upon completion, the function sends an execution report to Slack.
SLACK_WEBHOOK	String	n/a	WARNING - SECRET! Slack app integration webhook url to which reports are sent.
DAY_OFFSET	Integer	-1	Number of days to offset the query date relative to today; negative values look back. Defaults to -1 (yesterday). Useful when working with CRON triggers.
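
Two of these properties benefit from a short illustration: the "Array of strings" values and DAY_OFFSET. The sketch below assumes arrays are written as comma-separated strings (as the default `BSM,TIM` suggests) and that "today" is evaluated in UTC; both are assumptions, and the helper names are hypothetical.

```python
from datetime import date, datetime, timedelta, timezone
from typing import List, Optional


def parse_list(value: str) -> List[str]:
    """Split a comma-separated property such as MESSAGE_TYPES into a list."""
    return [item.strip() for item in value.split(",") if item.strip()]


def target_date(day_offset: int, today: Optional[date] = None) -> date:
    """Apply DAY_OFFSET to today's date; -1 selects yesterday."""
    if today is None:
        today = datetime.now(timezone.utc).date()  # assumption: UTC calendar day
    return today + timedelta(days=day_offset)


print(parse_list("BSM,TIM"))                    # ['BSM', 'TIM']
print(target_date(-1, today=date(2020, 3, 1)))  # 2020-02-29
```

A default of -1 pairs naturally with a trigger that fires just after midnight, since the previous day's uploads are complete by then.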

Usage

Run the function on a schedule by setting up a CRON-triggered CloudWatch event. (Note that the CloudFormation template includes a once-a-day CloudWatch trigger event at 12:01 AM UTC.)
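
If the schedule is set up by hand rather than through the template, a daily 12:01 AM UTC rule could be created roughly as follows. This is a sketch using boto3's CloudWatch Events client, not the repository's template; the rule name, target ID, and helper names are hypothetical.

```python
from typing import Dict


def build_daily_rule_args(rule_name: str = "canary-daily") -> Dict:
    """Keyword arguments for put_rule: fire every day at 00:01 UTC."""
    return {
        "Name": rule_name,
        # AWS cron fields: minutes hours day-of-month month day-of-week year
        "ScheduleExpression": "cron(1 0 * * ? *)",
        "State": "ENABLED",
    }


def schedule(lambda_arn: str, rule_name: str = "canary-daily") -> None:
    """Create the rule and point it at the function (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the helper above works without boto3 installed

    events = boto3.client("events")
    events.put_rule(**build_daily_rule_args(rule_name))
    events.put_targets(Rule=rule_name,
                       Targets=[{"Id": "canary-lambda", "Arn": lambda_arn}])
    # The function also needs a resource policy allowing events.amazonaws.com
    # to invoke it (Lambda add_permission); that step is omitted here.


print(build_daily_rule_args()["ScheduleExpression"])  # cron(1 0 * * ? *)
```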

Testing

Run a local test by executing the function as a standard Python 3 script: python main.py.

Release Notes

Version 0.0.5

Send result to FIFO SQS queue
Expand canary validation to work with different data schemas depending on pilot name and message type

Version 0.0.4

Added SQS publisher mode

Version 0.0.3

Added error list to output
Added serialID to output
Removed file list from output

Version 0.0.2

Added DAY_OFFSET configuration property to allow the canary to automatically analyze data from days other than the current one
Updated canary with support for latest odevalidator v0.0.4 library changes (including support for analyzing TIMs)
Bugfixes and cleanup

Version 0.0.1

Initial functional version
