Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Finngen FM results ingestion #394

Closed
wants to merge 24 commits into from

Conversation

xyg123
Copy link
Contributor

@xyg123 xyg123 commented Jan 8, 2024

Added "region", "credibleSetIndex" and "credibleSetlog10BF" columns to the studyLocus schema, this accomodates SuSie finemapping outputs.

@codecov-commenter
Copy link

codecov-commenter commented Jan 8, 2024

Codecov Report

Attention: 95 lines in your changes are missing coverage. Please review.

Comparison is base (42b366c) 85.67% compared to head (8e8c913) 85.88%.
Report is 49 commits behind head on dev.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##              dev     #394      +/-   ##
==========================================
+ Coverage   85.67%   85.88%   +0.21%     
==========================================
  Files          89       98       +9     
  Lines        2101     2629     +528     
==========================================
+ Hits         1800     2258     +458     
- Misses        301      371      +70     
Files Coverage Δ
src/airflow/dags/common_airflow.py 90.38% <100.00%> (ø)
src/airflow/dags/finngen_preprocess.py 100.00% <100.00%> (ø)
src/airflow/dags/gwas_catalog_harmonisation.py 43.47% <ø> (ø)
src/airflow/dags/gwas_curation_update.py 100.00% <100.00%> (ø)
src/otg/common/session.py 87.50% <100.00%> (+0.32%) ⬆️
src/otg/dataset/dataset.py 91.80% <100.00%> (ø)
src/otg/dataset/l2g_feature_matrix.py 82.92% <ø> (+7.31%) ⬆️
src/otg/dataset/study_locus.py 96.20% <100.00%> (+0.04%) ⬆️
src/otg/datasource/finngen/finngen_finemapping.py 100.00% <100.00%> (ø)
src/otg/datasource/finngen/study_index.py 100.00% <100.00%> (ø)
... and 26 more

@addramir addramir marked this pull request as draft January 8, 2024 18:04
@addramir addramir changed the title feat: fix in StudyLocus for SuSie fm feat: Finngen FM results ingestion Jan 8, 2024
@xyg123 xyg123 marked this pull request as ready for review January 10, 2024 09:52
@d0choa
Copy link
Collaborator

d0choa commented Jan 11, 2024

I merged #401 which touches the finngen preprocess DAG. Ask @DSuveges if you have any questions on how to resolve the conflicts.

Copy link
Contributor

@ireneisdoomed ireneisdoomed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really great! I've added some code suggestions and general questions.
It'd be good to see one example of a studyLocus extracted from susie's results.
Thanks!

@@ -15,3 +15,5 @@ title: FinnGen
[FinnGen](https://www.finngen.fi/en) is a research project in genomics and personalized medicine, representing a large public-private partnership. The project has collected and analyzed genome and health data from 500,000 Finnish biobank donors to understand the genetic basis of diseases. FinnGen is now expanding its focus to comprehend the progression and biological mechanisms of diseases. This initiative provides a world-class resource for further breakthroughs in disease prevention, diagnosis, and treatment, offering insights into our genetic makeup.

For a comprehensive understanding of the dataset and methods, refer to [Kurki et al., 2023](https://www.nature.com/articles/s41586-022-05473-8).

We ingested full GWAS sumamry statstics and SuSiE-based fine-mapping results.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
We ingested full GWAS sumamry statstics and SuSiE-based fine-mapping results.
We ingested full GWAS summary statistics and SuSiE-based fine-mapping results.

trigger_rule=TriggerRule.ALL_DONE,
)
# with TaskGroup(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be uncommented. I understand that you're defining 2 processing streams that generate credible sets: ingestion of fine mapping results directly; processing of summary stats

>> ld_clumping
>> pics
>> common.delete_cluster(CLUSTER_NAME)
# >> [finngen_summary_stats_preprocess, finngen_finemapping]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to be uncommented as well.

if TYPE_CHECKING:
pass

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The session object has a logger. Do you need this for something in specific?

Returns:
StudyLocus: Processed SuSIE finemapping output in StudyLocus format.
"""
processed_finngen_finemapping_df = (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's usually a better practice to avoid reading data in the functions that contain business logic, so it's easier to test and debug. You read data in the step, and then call your function injecting the dependencies. Ideally you shouldn't need spark here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@@ -82,6 +88,18 @@
"nullable": true,
"type": "string"
},
{
"metadata": {},
"name": "credibleSetIndex",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the interpretation of this index? I feel that we've talked about this already, can you remind me if credible set 1 means anything over credible set 2?

@@ -25,6 +25,9 @@ ukbiobank_manifest: gs://genetics-portal-input/ukb_phenotypes/neale2_saige_study
l2g_gold_standard_curation: ${datasets.inputs}/l2g/gold_standard/curation.json
gene_interactions: ${datasets.inputs}/l2g/interaction # 23.09 data
eqtl_catalogue_paths_imported: ${datasets.inputs}/preprocess/eqtl_catalogue/tabix_ftp_paths_imported.tsv
finngen_finemapping_results_url: gs://genetics-portal-dev-analysis/xg1/Finngen_finemapping_r10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to put this in a more general bucket.

@@ -25,6 +25,9 @@ ukbiobank_manifest: gs://genetics-portal-input/ukb_phenotypes/neale2_saige_study
l2g_gold_standard_curation: ${datasets.inputs}/l2g/gold_standard/curation.json
gene_interactions: ${datasets.inputs}/l2g/interaction # 23.09 data
eqtl_catalogue_paths_imported: ${datasets.inputs}/preprocess/eqtl_catalogue/tabix_ftp_paths_imported.tsv
finngen_finemapping_results_url: gs://genetics-portal-dev-analysis/xg1/Finngen_finemapping_r10
finngen_finemapping_summaries_url: gs://genetics-portal-dev-analysis/xg1/Finngen_susie_credset_summary_r10.tsv
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

@@ -25,6 +25,9 @@ ukbiobank_manifest: gs://genetics-portal-input/ukb_phenotypes/neale2_saige_study
l2g_gold_standard_curation: ${datasets.inputs}/l2g/gold_standard/curation.json
gene_interactions: ${datasets.inputs}/l2g/interaction # 23.09 data
eqtl_catalogue_paths_imported: ${datasets.inputs}/preprocess/eqtl_catalogue/tabix_ftp_paths_imported.tsv
finngen_finemapping_results_url: gs://genetics-portal-dev-analysis/xg1/Finngen_finemapping_r10
finngen_finemapping_summaries_url: gs://genetics-portal-dev-analysis/xg1/Finngen_susie_credset_summary_r10.tsv
finngen_release_prefix: "finngen_R10_"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we process Finngen's manifest, studyIds are in upper case

@@ -43,6 +46,7 @@ catalog_study_locus: ${datasets.study_locus}/catalog_study_locus
gwas_catalog_study_curation: ${datasets.inputs}/v2d/GWAS_Catalog_study_curation.tsv
finngen_study_index: ${datasets.study_index}/finngen
finngen_summary_stats: ${datasets.summary_statistics}/finngen
finngen_finemapping_out: gs://genetics-portal-dev-analysis/xg1/Finngen_finemapping_r10_processed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you are outputting credible sets, I think this should be written to ${datasets.credible_set}/finngen

@xyg123 xyg123 marked this pull request as draft January 18, 2024 11:02
@xyg123 xyg123 closed this Jan 18, 2024
@xyg123
Copy link
Contributor Author

xyg123 commented Jan 18, 2024

Close this PR because I couldn't figure out how to merge changes from dev to this branch after the repo name was changed. Comments and other changes will be implemented in the new finngen_fm_ingestion branch

xyg123 added a commit that referenced this pull request Jan 24, 2024
xyg123 added a commit that referenced this pull request Feb 8, 2024
* feat: ingest finngen r10 finemapping w/ airflow

* fix: addressed comments from PR #394

* fix: address comments from PR

* Update config/step/ot_finngen_finemapping_ingestion.yaml

Co-authored-by: Irene López <45119610+ireneisdoomed@users.noreply.github.com>

* Update config/step/ot_finngen_finemapping_ingestion.yaml

Co-authored-by: Irene López <45119610+ireneisdoomed@users.noreply.github.com>

* chore: remove unnecessary config

---------

Co-authored-by: Yakov <yt4@sanger.ac.uk>
Co-authored-by: David Ochoa <ochoa@ebi.ac.uk>
Co-authored-by: Irene López <45119610+ireneisdoomed@users.noreply.github.com>
Co-authored-by: David Ochoa <dogcaesar@gmail.com>
xyg123 added a commit that referenced this pull request Feb 12, 2024
* feat(finemapping): ingest finngen r10 finemapping w/ airflow (#435)

* feat: ingest finngen r10 finemapping w/ airflow

* fix: addressed comments from PR #394

* fix: address comments from PR

* Update config/step/ot_finngen_finemapping_ingestion.yaml

Co-authored-by: Irene López <45119610+ireneisdoomed@users.noreply.github.com>

* Update config/step/ot_finngen_finemapping_ingestion.yaml

Co-authored-by: Irene López <45119610+ireneisdoomed@users.noreply.github.com>

* chore: remove unnecessary config

---------

Co-authored-by: Yakov <yt4@sanger.ac.uk>
Co-authored-by: David Ochoa <ochoa@ebi.ac.uk>
Co-authored-by: Irene López <45119610+ireneisdoomed@users.noreply.github.com>
Co-authored-by: David Ochoa <dogcaesar@gmail.com>

* docs: susie inf method reloacated with the rest of the methods

* ci(release): add action to open pr that triggers release weekly (#474)

* chore: add action to open pr that triggers release weekly

* fix: make yamllint interpret on keyword as string

* fix: adapt time to gmt

* chore: update to run at 4pm

* fix: update github token variable (#476)

* ci: exclude changelog.md from precommit (#479)

---------

Co-authored-by: Yakov <yt4@sanger.ac.uk>
Co-authored-by: David Ochoa <ochoa@ebi.ac.uk>
Co-authored-by: Irene López <45119610+ireneisdoomed@users.noreply.github.com>
Co-authored-by: David Ochoa <dogcaesar@gmail.com>
@xyg123 xyg123 deleted the xg1-finngen-finemapping-ingestion branch February 14, 2024 10:12
xyg123 added a commit that referenced this pull request Feb 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants