
OpenAQ-ETL

Perform ETL into the OpenAQ Low Cost Sensor Database

Deploy

  • yarn cdk deploy: deploy this stack to your default AWS account/region
  • yarn cdk diff: compare the deployed stack with the current state
  • yarn cdk synth: emit the synthesized CloudFormation template

Development

JavaScript documentation can be generated by running the following:

yarn doc

Tests can be run using the following:

yarn test

Env Variables

Configuration for the ingestion is provided via environment variables.

  • BUCKET: The bucket to which the ingested data should be written. Required
  • SOURCE: The data source to ingest. Required
  • LCS_API: The API used when fetching supported measurands. Default: 'https://api.openaq.org'
  • STACK: The stack to which the ingested data should be associated. This is mainly used to apply a prefix to data uploaded to S3 in order to separate it from production data. Default: 'local'
  • SECRET_STACK: The stack with which the secrets used are associated. At times a developer may want to use credentials from a different stack (e.g. when testing the script, uploading output data to the local stack while using the production stack's secrets). Default: the value of the STACK env variable
  • VERBOSE: Enable verbose logging. Default: disabled
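
For illustration, the following is a minimal sketch of how a script might resolve these variables; the config object shape and validation are hypothetical, not the repo's actual code:

// Hypothetical sketch of resolving the environment variables above
const config = {
  bucket: process.env.BUCKET, // required
  source: process.env.SOURCE, // required
  lcsApi: process.env.LCS_API || "https://api.openaq.org",
  stack: process.env.STACK || "local",
  // SECRET_STACK falls back to the value of STACK
  secretStack: process.env.SECRET_STACK || process.env.STACK || "local",
  verbose: Boolean(process.env.VERBOSE),
};

if (!config.bucket || !config.source) {
  throw new Error("BUCKET and SOURCE are required");
}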

Running locally

To run the ingestion script locally (useful for testing without deploying), see the following example:

LCS_API=https://api.openaq.org \
STACK=my-dev-stack \
SECRET_STACK=my-prod-stack \
BUCKET=openaq-fetches \
VERBOSE=1 \
SOURCE=habitatmap \
node fetcher/index.js

Data Sources

Data Sources can be configured by adding a config file & corresponding provider script. The two sections below outline what is necessary to create a new source.

Source Config

The first step for a new source is to add a JSON config file to the fetcher/sources directory.

{
  "schema": "v1",
  "provider": "example",
  "frequency": "hour",
  "meta": {}
}
Attribute   Note
provider    Unique provider name
frequency   day, hour, or minute

The config file can contain any properties that should be configurable via the provider script; the table above, however, lists the attributes that are required.
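
For example, a hypothetical config could carry a provider-specific endpoint in meta; the apiUrl key below is illustrative, not a required attribute:

{
  "schema": "v1",
  "provider": "example",
  "frequency": "hour",
  "meta": {
    "apiUrl": "https://example.com/api/v2/measurements"
  }
}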

Provider Script

The second step is to add a new provider script to the fetcher/providers directory.

The script here should expose a function named processor. This function should pass SensorSystem & Measures objects to the Providers class.

The script below is a basic example of a new source:

const Providers = require("../providers");
const { Sensor, SensorNode, SensorSystem } = require("../station");
const { Measures, FixedMeasure, MobileMeasure } = require("../measure");

async function processor(source_name, source) {
  // Get locations/sensor systems via HTTP, S3, etc.
  // (get_locations is a placeholder for provider-specific retrieval logic)
  const locs = await get_locations();

  // Map each location into a SensorNode (mapping logic elided in this example)
  const station = new SensorNode();

  await Providers.put_stations(source_name, [station]);

  // Use FixedMeasure for stationary sensors or MobileMeasure for mobile ones
  const fixed_measures = new Measures(FixedMeasure);
  // or
  const mobile_measures = new Measures(MobileMeasure);

  fixed_measures.push(
    new FixedMeasure({
      sensor_id: "PurpleAir-123",
      measure: 123,
      timestamp: Math.floor(Date.now() / 1000), // UNIX timestamp in seconds
    })
  );

  await Providers.put_measures(source_name, fixed_measures);
}

module.exports = { processor };
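
For a quick local check, the processor can presumably be invoked directly with its source config; the snippet below is a hypothetical harness, while the real entry point is fetcher/index.js, which selects the provider based on SOURCE:

const { processor } = require("./fetcher/providers/example");
const source = require("./fetcher/sources/example.json");

processor(source.provider, source).catch(console.error);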

Provider Secrets

For data providers that require credentials, the credentials should be stored in AWS Secrets Manager with an ID composed of the stack name and provider name, e.g. :stackName/:providerName.
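
As a sketch of how such a secret might be retrieved (assuming the AWS SDK for JavaScript v2; the repo's actual lookup code may differ):

const AWS = require("aws-sdk");

async function getProviderSecret(stackName, providerName) {
  const sm = new AWS.SecretsManager();
  // Secret ID follows the :stackName/:providerName convention
  const res = await sm
    .getSecretValue({ SecretId: `${stackName}/${providerName}` })
    .promise();
  return JSON.parse(res.SecretString);
}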

Google Keys

Some providers (e.g. CMU, Clarity) require us to read data from Google services (e.g. Drive, Sheets). To do this, the organization hosting the data should do the following:

  1. create a project & enable access to the required APIs
  2. create a service account
  3. generate service account keys

The generated key file should look something like the following and should be stored in its entirety in AWS Secrets Manager.

{
  "type": "service_account",
  "project_id": "project-id",
  "private_key_id": "key-id",
  "private_key": "-----BEGIN PRIVATE KEY-----\nprivate-key\n-----END PRIVATE KEY-----\n",
  "client_email": "service-account-email",
  "client_id": "client-id",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://accounts.google.com/o/oauth2/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/service-account-email"
}
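
As an example of putting the key to use, a provider script might authenticate with the googleapis npm package roughly as follows; the spreadsheet ID and range are placeholders, and the actual providers may read Drive or Sheets differently:

const { google } = require("googleapis");

async function readSheet(serviceAccountKey) {
  // Authenticate with the service-account key stored in Secrets Manager
  const auth = new google.auth.GoogleAuth({
    credentials: serviceAccountKey,
    scopes: ["https://www.googleapis.com/auth/spreadsheets.readonly"],
  });

  const sheets = google.sheets({ version: "v4", auth });
  const res = await sheets.spreadsheets.values.get({
    spreadsheetId: "your-spreadsheet-id", // placeholder
    range: "Sheet1!A1:D100", // placeholder
  });
  return res.data.values;
}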
