
Port the Facebook validation pipeline to be generic and automatable #59

@capnrefsmmat

Description


Currently, covid-19/facebook/prepare-extracts/covidalert-io-funs.R contains a validation pipeline for Facebook. As I understand it, it performs the following checks when we prepare a new day of data for upload.

  1. Ensure that the historical data already in the API matches the newly regenerated historical data; that is, if it's currently June 1, make sure we didn't unexpectedly change the data for May 25.
  2. Run sanity checks on the new data: the geography types are reasonable, the geo_ids have the right format, the values and standard errors fall in the correct range, sample sizes are present, dates aren't missing, etc.
  3. Verify the number of geographical regions reporting hasn't suddenly changed.
  4. Verify the average variable values haven't suddenly changed.
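For illustration, a row-level sanity check like #2 could look roughly like the sketch below. Everything here is an assumption, not the actual pipeline's code: the row schema (`geo_id`, `val`, `se`, `sample_size`), the geo_id formats, and the `[0, 100]` value bounds are hypothetical placeholders.

```python
import math
import re

# Hypothetical geo_id formats; the real pipeline's conventions may differ.
GEO_ID_FORMATS = {
    "county": re.compile(r"^\d{5}$"),    # 5-digit FIPS codes
    "state": re.compile(r"^[a-z]{2}$"),  # lowercase two-letter abbreviations
}

def sanity_check_rows(rows, geo_type, min_val=0, max_val=100):
    """Return a list of error messages for rows failing basic sanity checks."""
    errors = []
    pattern = GEO_ID_FORMATS.get(geo_type)
    if pattern is None:
        return [f"unknown geo_type: {geo_type}"]
    for i, row in enumerate(rows):
        if not pattern.match(row.get("geo_id", "")):
            errors.append(f"row {i}: malformed geo_id {row.get('geo_id')!r}")
        val = row.get("val")
        if val is None or math.isnan(val) or not (min_val <= val <= max_val):
            errors.append(f"row {i}: value {val!r} outside [{min_val}, {max_val}]")
        se = row.get("se")
        if se is not None and se < 0:
            errors.append(f"row {i}: negative standard error {se!r}")
        if row.get("sample_size") is None:
            errors.append(f"row {i}: missing sample size")
    return errors
```

Checks #3 and #4 would follow the same shape: compute a summary (region count, mean value) for the new day, compare against the previous day, and emit an error message when the change exceeds a threshold.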

Many of these checks can be made generic to multiple data sources and applied to our new pipeline. This would require

  • adapt the script to work with Taylor's directory structure, where data files are placed
  • provide configuration files specifying which checks apply to each data source
  • adapt the script to report all errors, rather than dying on the first one
  • make it easy to run automatically for each data source as part of its automation job
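The last two bullets could be combined in a config-driven check runner that accumulates every failure instead of stopping at the first. A minimal sketch, assuming a hypothetical config schema and check names (none of these are the actual pipeline's):

```python
# Registry mapping config-file check names to check functions (hypothetical).
CHECKS = {}

def check(name):
    """Register a check function under a name usable in config files."""
    def register(fn):
        CHECKS[name] = fn
        return fn
    return register

@check("value_range")
def value_range(data, params):
    lo, hi = params["min"], params["max"]
    return [f"value {v} outside [{lo}, {hi}]"
            for v in data if v is not None and not lo <= v <= hi]

@check("no_missing")
def no_missing(data, params):
    return [f"missing value at index {i}" for i, v in enumerate(data) if v is None]

def run_all_checks(data, config):
    """Run every check named in config; return (check_name, message) for all failures."""
    failures = []
    for name, params in config.items():
        for msg in CHECKS[name](data, params):
            failures.append((name, msg))
    return failures
```

Each data source's configuration file would then just name its checks and parameters, and the automation job would fail (with a full report) whenever `run_all_checks` returns a non-empty list.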


Labels

CTIS (Improvements and reporting for CTIS)
