feat(credible set qc dag): added dag and docs (#59)
1 parent 0ef3545, commit 1a4a537
Showing 6 changed files with 166 additions and 0 deletions.
@@ -0,0 +1,17 @@
## Credible set QC DAG

Credible set QC is a set of operations performed on the `StudyLocus` datasets originally fine-mapped by Open Targets, in order to:

- Ensure the pValue of each locus meets the pre-defined threshold.
- Repartition the credible sets, because the batch job writes separate files per locus, which results in slow queries.
- Ensure no duplicated loci exist in the clean credible sets.

![credible_set_qc](credible_set_qc.svg)

The DAG contains the following steps:

- QC of credible sets coming from the `gwas_catalog_sumstats_susie` bucket
- QC of credible sets coming from the `ukb_ppp_eur_data` bucket

> [!NOTE]
> The outputs of the steps are written to the target bucket under the prefix _credible_set_clean_.
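
The three QC operations listed in the README can be illustrated with a small plain-Python sketch. This is not the gentropy implementation, which runs on Spark `StudyLocus` datasets and is not part of this diff. The `Locus` record, its field names, the "lower or equal" comparison, and the round-robin partitioning are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    """Hypothetical, simplified stand-in for one StudyLocus row."""

    study_locus_id: str
    p_value: float


def credible_set_qc(loci, p_value_threshold, n_partitions):
    """Filter by p-value, drop duplicated loci, and regroup into n_partitions."""
    # 1. Keep only loci whose p-value meets the threshold
    #    (assumed here to mean "lower than or equal to").
    significant = [locus for locus in loci if locus.p_value <= p_value_threshold]

    # 2. Drop duplicates, keeping the first occurrence of each locus id.
    seen = set()
    unique = []
    for locus in significant:
        if locus.study_locus_id not in seen:
            seen.add(locus.study_locus_id)
            unique.append(locus)

    # 3. Repartition: spread the loci round-robin over a fixed number of
    #    groups instead of keeping one group (file) per locus.
    return [unique[i::n_partitions] for i in range(n_partitions)]


example = [
    Locus("a", 1e-8),
    Locus("b", 1e-3),  # fails the threshold
    Locus("a", 1e-8),  # duplicate of the first locus
    Locus("c", 1e-6),
]
partitions = credible_set_qc(example, p_value_threshold=1e-5, n_partitions=2)
```

With the example input, only loci `a` and `c` survive, and they end up in two separate partitions.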
@@ -0,0 +1,34 @@
dataproc:
  python_main_module: gs://genetics_etl_python_playground/initialisation/gentropy/dev/cli.py
  cluster_metadata:
    PACKAGE: gs://genetics_etl_python_playground/initialisation/gentropy/dev/gentropy-0.0.0-py3-none-any.whl
  cluster_init_script: gs://genetics_etl_python_playground/initialisation/gentropy/dev/install_dependencies_on_cluster.sh
  cluster_name: otg-credible-set-qc
  autoscaling_policy: otg-etl

nodes:
  - id: gwas_catalog_sumstats_susie_credible_set_qc
    kind: Task
    prerequisites: []
    params:
      step: credible_set_qc
      step.credible_sets_path: gs://gwas_catalog_sumstats_susie/credible_set_datasets
      step.output_path: gs://gwas_catalog_sumstats_susie/credible_set_clean
      step.p_value_threshold: 1.0e-5
      step.purity_min_r2: 0.01
      step.n_partitions: 200
      step.session.write_mode: overwrite
      step.session.start_hail: true

  - id: ukb_ppp_eur_data_credible_set_qc
    kind: Task
    prerequisites: []
    params:
      step: credible_set_qc
      step.credible_sets_path: gs://ukb_ppp_eur_data/credible_set_datasets/susie
      step.output_path: gs://ukb_ppp_eur_data/credible_set_clean
      step.p_value_threshold: 1.0e-5
      step.purity_min_r2: 0.01
      step.n_partitions: 50
      step.session.write_mode: overwrite
      step.session.start_hail: true
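
Each node's `params` mapping uses dotted keys such as `step.p_value_threshold`. The sketch below shows one way such a mapping could be flattened into `key=value` command-line arguments for the main module. The helper name `params_to_cli_args` is hypothetical, and whether `submit_gentropy_step` formats its arguments this way is an assumption; the diff does not show it.

```python
def params_to_cli_args(params):
    """Turn a flat mapping of dotted keys into key=value argument strings."""
    args = []
    for key, value in params.items():
        # YAML booleans load as Python bools; render them in lowercase
        # so they read as true/false rather than True/False.
        if isinstance(value, bool):
            value = str(value).lower()
        args.append(f"{key}={value}")
    return args


# A subset of the parameters from the first node above.
example_params = {
    "step": "credible_set_qc",
    "step.n_partitions": 200,
    "step.session.write_mode": "overwrite",
    "step.session.start_hail": True,
}
cli_args = params_to_cli_args(example_params)
```

Keeping the keys flat and dotted in the YAML means the config needs no nested traversal: every entry maps to exactly one argument.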
@@ -0,0 +1,41 @@
"""Airflow DAG for the credible set qc."""

from __future__ import annotations

from pathlib import Path

from airflow.models.dag import DAG

from ot_orchestration.utils import chain_dependencies, read_yaml_config
from ot_orchestration.utils.common import shared_dag_args, shared_dag_kwargs
from ot_orchestration.utils.dataproc import (
    generate_dataproc_task_chain,
    submit_gentropy_step,
)

CONFIG_FILE_PATH = Path(__file__).parent / "config" / "credible_set_qc.yaml"
config = read_yaml_config(CONFIG_FILE_PATH)

with DAG(
    dag_id=Path(__file__).stem,
    description="Open Targets Genetics - CredibleSet QC",
    default_args=shared_dag_args,
    **shared_dag_kwargs,
):
    tasks = {}
    for step in config["nodes"]:
        task = submit_gentropy_step(
            cluster_name=config["dataproc"]["cluster_name"],
            step_name=step["id"],
            python_main_module=config["dataproc"]["python_main_module"],
            params=step["params"],
        )
        tasks[step["id"]] = task

    chain_dependencies(nodes=config["nodes"], tasks_or_task_groups=tasks)
    dag = generate_dataproc_task_chain(
        cluster_name=config["dataproc"]["cluster_name"],
        cluster_init_script=config["dataproc"]["cluster_init_script"],
        cluster_metadata=config["dataproc"]["cluster_metadata"],
        tasks=list(tasks.values()),
    )
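
The DAG passes each node's `prerequisites` list to `chain_dependencies`. The sketch below shows how such lists determine a valid execution order, using the standard library's `graphlib`. It is not the `ot_orchestration` implementation, which wires Airflow tasks together rather than returning a list; the function name `execution_order` and the third example node are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_order(nodes):
    """Return node ids in an order that respects every `prerequisites` list."""
    # Map each node id to the set of ids that must finish before it.
    graph = {node["id"]: set(node["prerequisites"]) for node in nodes}
    return list(TopologicalSorter(graph).static_order())


example_nodes = [
    {"id": "gwas_catalog_sumstats_susie_credible_set_qc", "prerequisites": []},
    {"id": "ukb_ppp_eur_data_credible_set_qc", "prerequisites": []},
    # Hypothetical downstream node, not part of this commit's config.
    {
        "id": "hypothetical_merge",
        "prerequisites": [
            "gwas_catalog_sumstats_susie_credible_set_qc",
            "ukb_ppp_eur_data_credible_set_qc",
        ],
    },
]
order = execution_order(example_nodes)
```

In the committed config both nodes have empty `prerequisites`, so they do not depend on each other and no ordering between them is imposed.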