This repo contains the following examples of using Cloud Composer, Google Cloud Platform's managed Apache Airflow service:
-
a. Simple Load DAG: provides a common pattern to automatically trigger, via Google Cloud Function, a Dataflow job when a file arrives in Google Cloud Storage, process the data and load it into BigQuery.
-
a. Ephemeral Dataproc Spark DAG: provides an example of triggering a DAG via HTTP POST to the Airflow API to create a Dataproc cluster, submit a Spark job, and import the newly enhanced GCS files into BigQuery.
-
Composer Dependency Management
a. Composer Dependency Management: provides a common pattern to automatically trigger and implement the composer dependency management. The primary challenge addressed is the need to handle complex dependencies between DAGs with different frequencies. The solution leverages Airflow's dependency management capabilities to create a hierarchical relationship between the parent and child DAGs.
Run this script to automate spin up / tear down of a lightweight airflow environment to run your tests.
./run_tests.sh