feat: application configuration #386

Merged
merged 18 commits into from
Jan 12, 2024
4 changes: 0 additions & 4 deletions config/config.yaml

This file was deleted.

File renamed without changes.
5 changes: 5 additions & 0 deletions config/ot_config.yaml
@@ -0,0 +1,5 @@
defaults:
- config
- datasets: ot_gcp
- _self_
- override step/session: dataproc
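
Hydra merges the entries of a defaults list in order, with later entries taking precedence, so placing `_self_` after `config` lets this file's own values override the base configuration. The snippet below is a minimal standard-library sketch of that ordering using a plain recursive merge. It is not Hydra's implementation, and the dictionaries and their values (`base_config`, `self_config`, the `spark_uri` and `write_mode` settings) are illustrative assumptions only.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which values from `override` win over `base`."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def compose(layers: list[dict]) -> dict:
    """Merge config layers left to right, so later layers take precedence."""
    result: dict = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result


# Hypothetical contents standing in for `config` and for this file (`_self_`).
base_config = {"step": {"session": {"spark_uri": "local[*]", "write_mode": "overwrite"}}}
self_config = {"step": {"session": {"spark_uri": "yarn"}}}

composed = compose([base_config, self_config])
print(composed["step"]["session"])
```

Swapping the order of the two layers would let the base values win instead, which is the effect of moving `_self_` to the top of a defaults list.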
3 changes: 0 additions & 3 deletions config/step/finngen_sumstat_preprocess.yaml

This file was deleted.

3 changes: 0 additions & 3 deletions config/step/gwas_catalog_sumstat_preprocess.yaml

This file was deleted.

6 changes: 0 additions & 6 deletions config/step/ld_index.yaml

This file was deleted.

@@ -1,4 +1,4 @@
_target_: otg.colocalisation.ColocalisationStep
-credible_set_path: ${datasets.credible_set}
+credible_set_path: ${datasets.study_locus}
study_index_path: ${datasets.study_index}
coloc_path: ${datasets.colocalisation}
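
Values such as `${datasets.study_locus}` are interpolations: when the value is read, the reference is replaced by whatever sits at that dotted path in the composed config. As a rough illustration, here is a simplified standard-library resolver. It is a sketch, not OmegaConf's resolver (it does not handle nested or relative references), and the bucket paths are made-up placeholders.

```python
import re

# Matches ${some.dotted.key}; nested interpolations are deliberately not supported.
_INTERPOLATION = re.compile(r"\$\{([^${}]+)\}")


def lookup(config: dict, dotted_key: str):
    """Walk `config` along a dotted path such as 'datasets.study_locus'."""
    node = config
    for part in dotted_key.split("."):
        node = node[part]
    return node


def resolve(value: str, config: dict) -> str:
    """Replace each ${a.b} reference in `value` with the value found at that path."""
    return _INTERPOLATION.sub(lambda m: str(lookup(config, m.group(1))), value)


# Hypothetical dataset paths, for illustration only.
config = {
    "datasets": {
        "outputs": "gs://example-bucket/outputs",
        "study_locus": "gs://example-bucket/study_locus",
    }
}

print(resolve("${datasets.study_locus}", config))
print(resolve("${datasets.outputs}/credible_set", config))
```

A reference to a key that does not exist raises `KeyError` in this sketch, which is a reasonable stand-in for the error one would expect when a dataset entry is renamed but a step config still points at the old name.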
@@ -1,4 +1,6 @@
_target_: otg.eqtl_catalogue.EqtlCatalogueStep
defaults:
- eqtl_catalogue

eqtl_catalogue_paths_imported: ${datasets.eqtl_catalogue_paths_imported}
eqtl_catalogue_study_index_out: ${datasets.eqtl_catalogue_study_index_out}
eqtl_catalogue_summary_stats_out: ${datasets.eqtl_catalogue_summary_stats_out}
@@ -1,2 +1,4 @@
_target_: otg.finngen_studies.FinnGenStudiesStep
defaults:
- finngen_studies

finngen_study_index_out: ${datasets.finngen_study_index}
5 changes: 5 additions & 0 deletions config/step/ot_finngen_sumstat_preprocess.yaml
@@ -0,0 +1,5 @@
defaults:
- finngen_sumstat_preprocess

raw_sumstats_path: ???
out_sumstats_path: ???
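
In OmegaConf, which Hydra builds on, `???` marks a mandatory value that must be supplied (for example on the command line) before it is read. A pre-flight check for such placeholders might look like the sketch below. `find_missing` is a hypothetical helper written for this note, not part of Hydra, OmegaConf or `otg`, and the dictionary stands in for a composed config.

```python
MISSING = "???"  # OmegaConf's marker for a mandatory value that has not been set


def find_missing(config: dict, prefix: str = "") -> list[str]:
    """Return the dotted paths of all values still set to the missing marker."""
    missing = []
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            missing.extend(find_missing(value, path))
        elif value == MISSING:
            missing.append(path)
    return missing


# Stand-in for the composed step config shown above.
step_config = {"step": {"raw_sumstats_path": "???", "out_sumstats_path": "???"}}

for path in find_missing(step_config):
    print(f"missing mandatory value: {path}")
```

Each reported path can then be supplied as an override of the form `step.raw_sumstats_path=...`.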
@@ -1,3 +1,5 @@
_target_: otg.gene_index.GeneIndexStep
defaults:
- gene_index

target_path: ${datasets.target_index}
gene_index_path: ${datasets.gene_index}
@@ -1,4 +1,6 @@
_target_: otg.gwas_catalog_ingestion.GWASCatalogIngestionStep
defaults:
- gwas_catalog_ingestion

catalog_study_files: ${datasets.catalog_studies}
catalog_ancestry_files: ${datasets.catalog_ancestries}
catalog_associations_file: ${datasets.catalog_associations}
@@ -1,4 +1,6 @@
_target_: otg.gwas_catalog_study_curation.GWASCatalogStudyCurationStep
defaults:
- gwas_catalog_study_curation

catalog_study_files: ${datasets.catalog_studies}
catalog_ancestry_files: ${datasets.catalog_ancestries}
catalog_sumstats_lut: ${datasets.catalog_sumstats_lut}
@@ -1,4 +1,6 @@
_target_: otg.gwas_catalog_study_inclusion.GWASCatalogInclusionGenerator
defaults:
- gwas_catalog_study_inclusion

catalog_study_files: ${datasets.catalog_studies}
catalog_ancestry_files: ${datasets.catalog_ancestries}
catalog_associations_file: ${datasets.catalog_associations}
5 changes: 5 additions & 0 deletions config/step/ot_gwas_catalog_sumstat_preprocess.yaml
@@ -0,0 +1,5 @@
defaults:
- gwas_catalog_sumstat_preprocess

raw_sumstats_path: ???
out_sumstats_path: ???
@@ -1,4 +1,6 @@
_target_: otg.ld_based_clumping.LdBasedClumpingStep
defaults:
- ld_based_clumping

study_locus_input_path: ???
ld_index_path: ???
study_index_path: ???
4 changes: 4 additions & 0 deletions config/step/ot_ld_index.yaml
@@ -0,0 +1,4 @@
defaults:
- ld_index

ld_index_out: ${datasets.ld_index}
@@ -1,8 +1,6 @@
_target_: otg.l2g.LocusToGeneStep
defaults:
- locus_to_gene

session:
extended_spark_conf:
spark.dynamicAllocation.enabled: false
run_mode: train
wandb_run_name: null
perform_cross_validation: false
4 changes: 3 additions & 1 deletion config/step/pics.yaml → config/step/ot_pics.yaml
@@ -1,3 +1,5 @@
_target_: otg.pics.PICSStep
defaults:
- pics

study_locus_ld_annotated_in: ???
picsed_study_locus_out: ???
@@ -1,4 +1,6 @@
_target_: otg.overlaps.OverlapsIndexStep
defaults:
- overlaps

study_locus_path: ${datasets.outputs}/credible_set
study_index_path: ${datasets.outputs}/study_index
overlaps_index_out: ${datasets.outputs}/study_locus_overlap
3 changes: 2 additions & 1 deletion config/step/ukbiobank.yaml → config/step/ot_ukbiobank.yaml
@@ -1,3 +1,4 @@
_target_: otg.ukbiobank.UKBiobankStep
defaults:
- ukbiobank
ukbiobank_manifest: ${datasets.ukbiobank_manifest}
ukbiobank_study_index_out: ${datasets.ukbiobank_study_index}
6 changes: 4 additions & 2 deletions config/step/v2g.yaml → config/step/ot_v2g.yaml
@@ -1,10 +1,12 @@
_target_: otg.v2g.V2GStep
defaults:
- variant_to_gene

variant_index_path: ${datasets.variant_index}
variant_annotation_path: ${datasets.variant_annotation}
gene_index_path: ${datasets.gene_index}
vep_consequences_path: ${datasets.vep_consequences}
liftover_chain_file_path: ${datasets.chain_37_38}
intervals:
interval_sources:
andersson: ${datasets.anderson}
javierre: ${datasets.javierre}
jung: ${datasets.jung}
4 changes: 4 additions & 0 deletions config/step/ot_variant_annotation.yaml
@@ -0,0 +1,4 @@
defaults:
- variant_annotation

variant_annotation_path: ${datasets.variant_annotation}
@@ -1,4 +1,6 @@
_target_: otg.variant_index.VariantIndexStep
defaults:
- variant_index

variant_annotation_path: ${datasets.variant_annotation}
credible_set_path: ${datasets.study_locus}
variant_index_path: ${datasets.variant_index}
@@ -1,5 +1,6 @@
_target_: otg.window_based_clumping.WindowBasedClumpingStep
defaults:
- window_based_clumping

summary_statistics_input_path: ???
study_locus_output_path: ???
inclusion_list_path: ???
locus_collect_distance: null
5 changes: 3 additions & 2 deletions config/step/session/dataproc.yaml
@@ -1,4 +1,5 @@
_target_: otg.common.session.Session
defaults:
- base_session

spark_uri: yarn
hail_home: /opt/conda/miniconda3/lib/python3.10/site-packages/hail
write_mode: errorifexists
4 changes: 0 additions & 4 deletions config/step/session/local.yaml

This file was deleted.

6 changes: 0 additions & 6 deletions config/step/variant_annotation.yaml

This file was deleted.

8 changes: 4 additions & 4 deletions docs/development/airflow.md
@@ -7,12 +7,12 @@ This section describes how to set up a local Airflow server which will orchestra
- [Docker](https://docs.docker.com/get-docker/)
- [Google Cloud SDK](https://cloud.google.com/sdk/docs/install)

-!!!warning macOS Docker memory allocation
+!!! warning macOS Docker memory allocation
On macOS, the default amount of memory available for Docker might not be enough to get Airflow up and running. Allocate at least 4GB of memory for the Docker Engine (ideally 8GB). [More info](https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#)

## Configure Airflow access to Google Cloud Platform

-!!!warning Specifying Google Cloud parameters
+!!! warning Specifying Google Cloud parameters
Run the next two commands with the appropriate Google Cloud project ID and service account name to ensure the correct Google default application credentials are set up.

Authenticate to Google Cloud:
@@ -37,7 +37,7 @@ cd src/airflow

### Build Docker image

-!!!note Custom Docker image for Airflow
+!!! note Custom Docker image for Airflow
The custom Dockerfile built by the command below extends the official [Airflow Docker Compose YAML](https://airflow.apache.org/docs/apache-airflow/stable/docker-compose.yaml). We add support for Google Cloud SDK, Google Dataproc operators, and access to GCP credentials.

```bash
@@ -46,7 +46,7 @@ docker build . --tag extending_airflow:latest

### Set Airflow user ID

-!!!note Setting Airflow user ID
+!!! note Setting Airflow user ID
These commands allow Airflow running inside Docker to access the credentials file which was generated earlier.

```bash
1 change: 1 addition & 0 deletions docs/howto/.pages
@@ -0,0 +1 @@
title: How-to
5 changes: 5 additions & 0 deletions docs/howto/_howto.md
@@ -0,0 +1,5 @@
# How-to

This page contains a collection of how-to guides for the project.

For additional information please visit [https://community.opentargets.org/](https://community.opentargets.org/)
44 changes: 44 additions & 0 deletions docs/howto/run_step_in_cli.md
@@ -0,0 +1,44 @@
---
Title: Run step in CLI
---

# Run step in CLI

To run a step from the command line interface (CLI), you need to know the step's name. To list the steps available in your current environment, run `otg` with no arguments:

```
You must specify 'step', e.g, step=<OPTION>
Available options:
clump
colocalisation
eqtl_catalogue
finngen_studies
finngen_sumstat_preprocess
gene_index
gwas_catalog_ingestion
gwas_catalog_sumstat_preprocess
ld_index
locus_to_gene
overlaps
pics
ukbiobank
variant_annotation
variant_index
variant_to_gene

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
```

As indicated, you can run a step by specifying the step's name with the `step` argument. For example, to run the `gene_index` step, you can run:

```bash
otg step=gene_index
```

In most cases, some mandatory values are required to run the step. For example, the `gene_index` step requires the `step.target_path` and `step.gene_index_path` arguments to be specified. You can supply the necessary arguments by adding them to the command line:

```bash
otg step=gene_index step.target_path=/path/to/target step.gene_index_path=/path/to/gene_index
```
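
When the same step is launched from a script rather than typed by hand, the command can be assembled as an argument list. The following is a minimal sketch: `build_otg_command` is a hypothetical helper (not part of the `otg` package) that only reproduces the `step=` and `step.<parameter>=` syntax shown above.

```python
import shlex


def build_otg_command(step: str, overrides: dict[str, str]) -> list[str]:
    """Build the argument list for running one step with `step.<key>=<value>` overrides."""
    args = ["otg", f"step={step}"]
    args += [f"step.{key}={value}" for key, value in overrides.items()]
    return args


cmd = build_otg_command(
    "gene_index",
    {"target_path": "/path/to/target", "gene_index_path": "/path/to/gene_index"},
)

# Show the command as it would be typed in a shell.
print(shlex.join(cmd))

# To actually launch it (requires `otg` to be installed and on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems if a path contains spaces.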

You can find more about the available steps in the [documentation](../python_api/step/_step.md).
49 changes: 49 additions & 0 deletions docs/howto/run_step_using_config.md
@@ -0,0 +1,49 @@
---
Title: Run step using config
---

# Run step using YAML config

It's possible to parametrise the functionality of a step using a YAML configuration file. This is useful when you want to run a step multiple times with different parameters or simply to avoid having to specify the same parameters every time you run a step.

!!! info Configuration files using Hydra
The package uses [Hydra](https://hydra.cc) to handle configuration files. For more information, please visit the [Hydra documentation](https://hydra.cc/docs/intro/).

To run a step using a configuration file, create the configuration files in YAML format, laid out as follows:

```{ .sh .no-copy }
config/
├─ step/
│ └─ my_gene_index.yaml
└─ my_config.yaml
```

The configuration file should contain the parameters you want to use to run the step. For example, to run the `gene_index` step, you need to specify the `step.target_path` and `step.gene_index_path` parameters. The configuration file should look like this:

=== "my_config.yaml"

``` yaml
defaults:
- config
- _self_
```

This file inherits the default configuration (`config`); anything defined in the file itself (`_self_`) overrides those defaults.

=== "step/my_gene_index.yaml"

``` yaml
defaults:
- gene_index

target_path: /path/to/target
gene_index_path: /path/to/gene_index
```

This config file will inherit the default configuration for the `gene_index` step and overwrite the `target_path` and `gene_index_path` parameters.

Once you have created the configuration files, you can run your new `my_gene_index` step:

```bash
otg step=my_gene_index --config-dir=config --config-name=my_config
```
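
The directory layout described above can also be generated from a script, which is convenient when the same step is configured for several runs. The sketch below writes the two files into a temporary directory using only the standard library. The `scaffold` helper is hypothetical, the paths are the placeholders used in this guide, and the files are written with the `.yaml` extension on the assumption that this is what Hydra looks up.

```python
import tempfile
from pathlib import Path

MY_CONFIG = """\
defaults:
  - config
  - _self_
"""

MY_GENE_INDEX = """\
defaults:
  - gene_index

target_path: /path/to/target
gene_index_path: /path/to/gene_index
"""


def scaffold(config_dir: Path) -> list[Path]:
    """Write the top-level config and the custom step config under `config_dir`."""
    files = {
        config_dir / "my_config.yaml": MY_CONFIG,
        config_dir / "step" / "my_gene_index.yaml": MY_GENE_INDEX,
    }
    for path, content in files.items():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content)
    return sorted(files)


# Demonstrate in a throwaway directory; point `scaffold` at a real path to keep the files.
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold(Path(tmp) / "config")
    print([p.relative_to(tmp).as_posix() for p in created])
```

With the files written to a real `config/` directory, the `otg` command shown above can be run against it unchanged.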
2 changes: 1 addition & 1 deletion docs/index.md
@@ -29,4 +29,4 @@ Ingestion and analysis of genetic and functional genomic data for the identifica

This project is still in experimental phase. Please refer to the [roadmap section](roadmap.md) for more information.

-For all development information, including running the code, troubleshooting, or contributing, see the [development section](./development/).
+For all development information, including running the code, troubleshooting, or contributing, see the [development section](development/_development.md).
2 changes: 1 addition & 1 deletion docs/python_api/step/gwas_catalog_inclusion.md
@@ -2,4 +2,4 @@
title: Generate inclusion and exclusions lists for GWAS Catalog study ingestion.
---

-::: otg.gwas_catalog_study_inclusion.GWASCatalogInclusionGenerator
+::: otg.gwas_catalog_study_inclusion.GWASCatalogStudyInclusionGenerator
10 changes: 0 additions & 10 deletions docs/usage.md

This file was deleted.

14 changes: 10 additions & 4 deletions mkdocs.yml
@@ -1,19 +1,22 @@
site_name: Open Targets Genetics

nav:
- installation.md
- usage.md
- Home: index.md
- Installation: installation.md
- ... | howto/**
- Roadmap: roadmap.md
- ... | development/**
- ... | python_api/**

plugins:
- search
- awesome-pages
- awesome-pages:
collapse_single_pages: true
- mkdocstrings:
handlers:
python:
options:
filters: ["!^_", "!__new__"]
filters: ["!^_", "!__new__", "__init__"]
show_signature_annotations: true
show_root_heading: true
- section-index
@@ -41,6 +44,9 @@ markdown_extensions:
- pymdownx.superfences
- toc:
permalink: true
- pymdownx.tabbed:
alternate_style: true
combine_header_slug: true

hooks:
- src/utils/schemadocs.py
2 changes: 1 addition & 1 deletion src/airflow/dags/common_airflow.py
@@ -40,7 +40,7 @@

# CLI configuration.
CLUSTER_CONFIG_DIR = "/config"
-CONFIG_NAME = "config"
+CONFIG_NAME = "ot_config"
PYTHON_CLI = "cli.py"

# Shared DAG construction parameters.