How to clean‐up and reload test data from a CCW pipeline load

WARNING: this procedure causes data loss and should only be run in an ephemeral environment

It may be useful to test pipeline updates in an ephemeral environment to ensure everything is functioning properly.

To test a full pipeline load, you'll need to find a dataset that includes beneficiary updates; most datasets in the test environment don't include them.

  • Check the database to see when a beneficiary load was last run: `select max(last_updated) from ccw.beneficiaries`
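For example, a minimal sketch of running this check from Python with psycopg2 (the connection settings below are placeholders for your ephemeral environment's database, not real values):

```python
import psycopg2

# Placeholder connection settings; point these at your ephemeral
# environment's database.
conn = psycopg2.connect(
    host="your-ephemeral-db-host",
    dbname="your-db-name",
    user="your-db-user",
    password="your-db-password",
)

with conn, conn.cursor() as cur:
    # Same query as above: when was a beneficiary load last run?
    cur.execute("select max(last_updated) from ccw.beneficiaries")
    print(cur.fetchone()[0])

conn.close()
```

The same check can be re-run after the cleanup step below to confirm the data was actually removed.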
  • Check the bfd-test-etl-577373831711 bucket for a dataset whose prefix matches that date (the date on the object prefix may differ slightly, but it should be close); see the listing sketch below. Note that the data must be no more than 60 days old. If it is older, you'll need to change the object prefix to use a more recent date and update the timestamp field on the manifest to match.
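As a rough illustration, a boto3 sketch for listing the dataset prefixes under Synthetic/Done/ (bucket name and prefix layout taken from this page; adjust if your bucket is organized differently):

```python
import boto3

s3 = boto3.client("s3")

# List the dataset "folders" under Synthetic/Done/ in the test ETL bucket
# (first page only; paginate if there are many datasets).
resp = s3.list_objects_v2(
    Bucket="bfd-test-etl-577373831711",
    Prefix="Synthetic/Done/",
    Delimiter="/",
)

for prefix in resp.get("CommonPrefixes", []):
    print(prefix["Prefix"])  # e.g. Synthetic/Done/2024-01-01T00:00:00Z/
```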

  • Remove the data in the database that was created from the load by running this SQL script, which removes any beneficiaries and associated claims that were added or updated during the load. You will need to supply the S3 bucket prefix that you located earlier.

  • Copy the files from their current location into the Incoming folder associated with your ephemeral environment; see the copy sketch below. For example, if your environment is test-1000 and the data load you want to use has the prefix 2024-01-01T00:00:00Z, you would copy the files from bfd-test-etl-577373831711/Synthetic/Done/2024-01-01T00:00:00Z to bfd-1000-test-etl{timestamp}/Synthetic/Incoming/2024-01-01T00:00:00Z.
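As a sketch, the copy could be done with boto3 along these lines (the bucket names, {timestamp} suffix, and dataset prefix are placeholders to fill in for your environment):

```python
import boto3

s3 = boto3.client("s3")

# Placeholders: your ephemeral environment's ETL bucket (with its real
# timestamp suffix) and the dataset prefix you chose above.
SOURCE_BUCKET = "bfd-test-etl-577373831711"
DEST_BUCKET = "bfd-1000-test-etl{timestamp}"  # replace {timestamp}
DATASET = "2024-01-01T00:00:00Z"

SRC_PREFIX = f"Synthetic/Done/{DATASET}/"
DEST_PREFIX = f"Synthetic/Incoming/{DATASET}/"

# Copy every object in the dataset from Done/ into the ephemeral Incoming/ folder.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=SOURCE_BUCKET, Prefix=SRC_PREFIX):
    for obj in page.get("Contents", []):
        dest_key = DEST_PREFIX + obj["Key"][len(SRC_PREFIX):]
        s3.copy_object(
            Bucket=DEST_BUCKET,
            Key=dest_key,
            CopySource={"Bucket": SOURCE_BUCKET, "Key": obj["Key"]},
        )
        print(f"copied {obj['Key']} -> s3://{DEST_BUCKET}/{dest_key}")
```

The same copy can also be done from the AWS console or with `aws s3 cp --recursive`.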

  • The pipeline should pick up and ingest the files. You can repeat this as many times as you'd like by re-running the SQL script to clear out the data and then either re-copying the files or restarting the pipeline.
