DAGs
ssdn_dags contains the information Airflow needs to instantiate and control the data harvest and transform tasks.
Two DAGs are for testing new partners and new maps:
ssdn_single_harvest
ssdn_single_transform
Both need to be started with the "Trigger DAG with config" launch option.
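As a sketch, a trigger config is just a JSON object. The exact keys these DAGs read from `dag_run.conf` are not documented here, so the `"config"` key below is a placeholder; the same JSON can be pasted into the "Trigger DAG with config" box in the UI or passed to the Airflow CLI:

```python
import json
import shlex

# Hypothetical conf payload -- the actual keys depend on how
# ssdn_single_harvest / ssdn_single_transform read dag_run.conf.
conf = {"config": "example_partner"}

# Equivalent CLI invocation (requires a running Airflow environment):
command = (
    "airflow dags trigger ssdn_single_harvest --conf "
    + shlex.quote(json.dumps(conf))
)
print(command)
```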
ssdn_dynamic_harvest
is the production harvest DAG. It dynamically iterates through
manatus configs to harvest, map, and enhance partner data.
All configuration is managed through the
ssdn_manatus_configs
repository. See the page on updating configs for instructions on
managing and changing partner harvests.
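The dynamic pattern can be sketched in plain Python: discover one config file per partner and build a harvest, map, enhance chain for each. The directory layout and `.cfg` extension are illustrative assumptions, and in the real DAG these steps are Airflow tasks, not list entries:

```python
import tempfile
from pathlib import Path

# Simulate a checkout of ssdn_manatus_configs with two partner configs.
with tempfile.TemporaryDirectory() as d:
    config_dir = Path(d)
    for name in ("partner_a.cfg", "partner_b.cfg"):
        (config_dir / name).touch()

    pipeline = []
    for cfg in sorted(config_dir.glob("*.cfg")):
        partner = cfg.stem
        # One harvest -> map -> enhance chain per discovered config.
        pipeline.append([f"harvest:{partner}", f"map:{partner}", f"enhance:{partner}"])

print(len(pipeline))  # -> 2
```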
The final task of the production run DAG is the count_records
task. It reads the completed JSONL file and prints per-partner
record counts into the log.
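A minimal sketch of what a count_records-style task does, assuming each JSONL record identifies its partner under a `"provider"` key; the real field name in the harvested data may differ:

```python
import json
from collections import Counter
from io import StringIO

# Stand-in for the completed harvest file.
sample = StringIO(
    '{"provider": "Partner A", "id": 1}\n'
    '{"provider": "Partner A", "id": 2}\n'
    '{"provider": "Partner B", "id": 3}\n'
)

# Tally records per partner and print the counts, log-style.
counts = Counter(json.loads(line)["provider"] for line in sample if line.strip())
for partner, total in sorted(counts.items()):
    print(f"{partner}: {total}")
# -> Partner A: 2
#    Partner B: 3
```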
submit_to_dpla
is the final data harvest step. It deposits the completed data file in
SSDN's s3 bucket for DPLA. The DAG must be triggered with a config that includes the file
for submission. The config can be copied and pasted from the final
count_records task in the ssdn_dynamic_harvest DAG.
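For illustration, the trigger config is again a small JSON object naming the completed file. The `"file"` key and the file name below are assumptions, not the DAG's documented schema; the real file name is the one shown in the count_records log:

```python
import json

# Hypothetical submit_to_dpla trigger conf -- key name and file name
# are placeholders for illustration only.
conf = {"file": "completed_harvest.jsonl"}
print(json.dumps(conf))
```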