
DAGs main

mrmiguez edited this page Mar 28, 2023 · 3 revisions

SSDN DAGs

ssdn_dags contains the information Airflow needs to instantiate and control the data harvest and transform tasks.

Testing DAGs

Two DAGs are used to test new partners and new maps:

  • ssdn_single_harvest
  • ssdn_single_transform

Both need to be started with the "Trigger DAG with config" launch option.

[Screenshot: the "Trigger DAG with config" launch option in the Airflow UI]
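The same "trigger with config" action can also be scripted. The minimal sketch below assumes an Airflow 2 deployment with the stable REST API enabled, and only builds (does not send) a request to its POST /api/v1/dags/{dag_id}/dagRuns endpoint. The base URL and the "partner" config key are hypothetical placeholders: the keys the testing DAGs actually read are defined in ssdn_dags, and authentication is not shown.

```python
import json
import urllib.request


def build_trigger_request(base_url: str, dag_id: str, conf: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to Airflow 2's stable REST API that
    starts a run of dag_id with the given config."""
    url = f"{base_url.rstrip('/')}/api/v1/dags/{dag_id}/dagRuns"
    body = json.dumps({"conf": conf}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical URL and config key; use the keys ssdn_single_harvest actually reads.
req = build_trigger_request(
    "http://localhost:8080",
    "ssdn_single_harvest",
    {"partner": "example_partner"},
)
# urllib.request.urlopen(req) would send it (authentication not shown).
```

Building the request separately from sending it makes it easy to inspect the exact payload before starting a run.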

Production DAG

ssdn_dynamic_harvest is the production harvest DAG. It dynamically iterates through the manatus configs to harvest, map, and enhance partner data.

All configuration is managed through the ssdn_manatus_configs repository. See the page on updating configs for instructions on managing and changing partner harvests.

The final task of the production DAG is count_records. It reads the completed JSONL file (in red) and prints record counts per partner to the log (in blue):

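[Screenshot: count_records task log, with the completed JSONL file highlighted in red and partner record counts in blue]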
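For checking a JSONL file outside Airflow, the counting step can be approximated with the minimal sketch below. It assumes one JSON object per line and that each record stores the partner name as a plain string in a single field. This page does not say which field count_records uses, so the field name is a parameter, and the function names are hypothetical, not the DAG's actual code.

```python
import json
from collections import Counter
from typing import Iterable


def count_partner_records(lines: Iterable[str], partner_key: str) -> Counter:
    """Count records per partner from JSONL lines (one JSON object per line).

    partner_key is an assumption: pass the field that holds the partner name.
    Records without that field are counted under "<missing>".
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline
        record = json.loads(line)
        counts[record.get(partner_key, "<missing>")] += 1
    return counts


def count_file(path: str, partner_key: str) -> Counter:
    """Stream a JSONL file line by line and count records per partner."""
    with open(path, encoding="utf-8") as fh:
        return count_partner_records(fh, partner_key)


def format_counts(counts: Counter) -> list[str]:
    """Render one 'partner: count' line per partner, largest first, plus a total."""
    rows = [f"{partner}: {n}" for partner, n in counts.most_common()]
    rows.append(f"Total: {sum(counts.values())}")
    return rows


# Example (hypothetical file name and field name):
# for row in format_counts(count_file("ssdn_records.jsonl", "dataProvider")):
#     print(row)
```

Reading the file line by line keeps memory use flat even for large harvests, since only one record is parsed at a time.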

Data submission DAG

submit_to_dpla is the final data harvest step. It deposits the completed data file in SSDN's S3 bucket for DPLA. The DAG must be triggered with a config that includes the file for submission. This config can be copied and pasted from the log of the final count_records task in the ssdn_dynamic_harvest DAG:

[Screenshot: copying the submission config from the count_records task log]
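As an alternative to the UI, the submission can be started from a shell with Airflow's "airflow dags trigger" command, whose --conf option takes the config as a JSON string. The minimal sketch below only builds the command. The conf key name ("file") and the path are hypothetical placeholders, so copy the actual config from the count_records log.

```python
import json
import shlex


def build_trigger_command(dag_id: str, conf: dict) -> list[str]:
    """Build the argv for `airflow dags trigger` with a JSON-encoded config."""
    return ["airflow", "dags", "trigger", dag_id, "--conf", json.dumps(conf)]


# Hypothetical conf key and path: replace with the config printed by count_records.
cmd = build_trigger_command(
    "submit_to_dpla",
    {"file": "/path/to/ssdn_records.jsonl"},
)
print(shlex.join(cmd))  # shell-quoted form that can be pasted into a terminal
# subprocess.run(cmd, check=True) would run it on a host with the Airflow CLI.
```

Keeping the command as an argument list, rather than one string, avoids quoting problems with the JSON config when it is run through subprocess.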