feat: extract credible sets and studies from eQTL Catalogue finemapping results #514

ireneisdoomed · 2024-03-01T15:07:08Z

This PR includes:

A new DAG that generates credible_set and study_index datasets from the eQTL Catalogue fine-mapping results for all eQTLs that significantly influence gene expression.
The process takes around 40 minutes and introduces a new approach to preprocessing the input files: instead of reading the compressed TSVs directly with Spark, we use Dataflow to decompress them into a temporary bucket, so that Spark can parallelise the ingestion.
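Why decompressing first helps: gzip streams cannot be split, so Spark reads each compressed file in a single task regardless of its size, while plain TSVs can be split across many tasks. The sketch below is a minimal local stand-in for the Dataflow step, assuming a list of local `.gz` paths and a local staging directory in place of the temporary bucket; `decompress_to_staging` is a hypothetical name and this is not the pipeline's actual Dataflow job.

```python
import gzip
import shutil
from pathlib import Path


def decompress_to_staging(src_paths, staging_dir):
    """Decompress each .gz file into staging_dir and return the new paths.

    Hypothetical local stand-in for the Dataflow decompression step.
    """
    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)
    outputs = []
    for src in map(Path, src_paths):
        # Drop only the trailing ".gz", e.g. "example.tsv.gz" -> "example.tsv"
        dest = staging / src.name.removesuffix(".gz")
        # Stream the decompressed bytes so large files never sit fully in memory
        with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        outputs.append(dest)
    return outputs
```

Spark can then be pointed at the staging directory and will split the uncompressed files into as many partitions as their size allows.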

Metrics

Nr of studies: 317 911
Nr of credible sets: 385 100
Summary statistics of credible set size (credSetSize):

+-------+------------------+
|summary|       credSetSize|
+-------+------------------+
|  count|            385100|
|   mean|31.648021293170604|
| stddev|  99.3423119691853|
|    min|                 1|
|    25%|                 3|
|    50%|                11|
|    75%|                32|
|    max|              4090|
+-------+------------------+
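The table has the row layout of PySpark's default `DataFrame.summary()` output (count, mean, stddev, min, 25%, 50%, 75%, max). A plain-Python sketch of the same statistics, useful for sanity-checking a small sample by hand; it assumes credSetSize counts the variants in each credible set, and `summarise_sizes` is a hypothetical helper. It uses nearest-rank percentiles, whereas Spark computes approximate percentiles, so values on real data may differ slightly.

```python
import math
import statistics


def summarise_sizes(sizes):
    """Return the statistics reported by Spark's default summary(); needs >= 2 sizes."""
    ordered = sorted(sizes)
    n = len(ordered)

    def nearest_rank(p):
        # Smallest value with at least a fraction p of the observations <= it
        return ordered[max(math.ceil(p * n), 1) - 1]

    return {
        "count": n,
        "mean": statistics.mean(ordered),
        "stddev": statistics.stdev(ordered),  # sample standard deviation
        "min": ordered[0],
        "25%": nearest_rank(0.25),
        "50%": nearest_rank(0.50),
        "75%": nearest_rank(0.75),
        "max": ordered[-1],
    }
```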

Distribution of the number of credible sets per study (count is the number of studies):

+----------------+------+
|nCredSetPerStudy| count|
+----------------+------+
|              10|     2|
|               9|     3|
|               8|     6|
|               7|    25|
|               6|    71|
|               5|   313|
|               4|  1232|
|               3|  7555|
|               2| 46542|
|               1|262162|
+----------------+------+
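The two tables agree with the headline metrics: the count column sums to the number of studies (317 911), and weighting each row by nCredSetPerStudy gives the number of credible sets (385 100). A minimal sketch of how such a histogram can be built from (studyId, credibleSetId) pairs and reconciled with the totals; the function names are hypothetical and the actual step runs on Spark DataFrames.

```python
from collections import Counter


def cred_sets_per_study_histogram(pairs):
    """Map nCredSetPerStudy -> number of studies, from (studyId, credibleSetId) pairs."""
    # set() drops duplicate pairs so each credible set is counted once per study
    per_study = Counter(study_id for study_id, _ in set(pairs))
    return Counter(per_study.values())


def totals(histogram):
    """Return (number of studies, number of credible sets) implied by a histogram."""
    n_studies = sum(histogram.values())
    n_cred_sets = sum(k * v for k, v in histogram.items())
    return n_studies, n_cred_sets


# Table reported in this PR: nCredSetPerStudy -> number of studies
reported = {10: 2, 9: 3, 8: 6, 7: 25, 6: 71, 5: 313, 4: 1232, 3: 7555, 2: 46542, 1: 262162}
print(totals(reported))  # (317911, 385100)
```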

Note
This PR only processes results where the quantification method is `ge` (gene expression).
Following a discussion with @DSuveges and @d0choa, we will bring in credible sets from all quantification methods. I suggest making those changes in a subsequent PR, so that the work here is checkpointed in case we need to revert.
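Restricting the input to `ge`, and later widening it to all methods, amounts to filtering the dataset metadata with the allowed methods as a parameter (the commit list includes "feat: parametrise methods to include"). A minimal sketch, assuming each metadata row carries a `quant_method` field; `select_datasets` is hypothetical and not the step's actual API.

```python
def select_datasets(metadata_rows, methods=("ge",)):
    """Keep metadata rows whose quantification method is in `methods`.

    Assumes each row is a dict with a "quant_method" key (hypothetical layout).
    """
    allowed = set(methods)
    return [row for row in metadata_rows if row["quant_method"] in allowed]
```

Keeping `ge` as the default preserves the behaviour of this PR, while a follow-up can pass a wider tuple without changing the function.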

Co-authored with @d0choa.

d0choa

Wonderful!

d0choa · 2024-03-01T17:11:02Z

src/gentropy/datasource/eqtl_catalogue/study_index.py

+            StructField("sample_group", StringType(), True),
+            StructField("tissue_id", StringType(), True),
+            StructField("tissue_label", StringType(), True),
+            StructField("condition_label", StringType(), True),


These fields should be added to the study index.

Yakov mentioned that carrying them over in the study index should be enough.
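In PySpark, the third argument of `StructField` is `nullable`, so each of these four metadata columns may be missing for a given dataset. A plain-Python sketch of the same contract when reading a metadata TSV; the column list comes from the schema snippet, while `read_metadata` and the empty-cell-to-None convention are assumptions for illustration.

```python
import csv
import io

# Columns from the schema snippet; all declared as nullable strings
NULLABLE_STRING_COLUMNS = ["sample_group", "tissue_id", "tissue_label", "condition_label"]


def read_metadata(tsv_text):
    """Parse tab-separated metadata, mapping empty or absent cells to None."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for row in reader:
        # row.get() is None for an absent column and "" for an empty cell
        rows.append({col: (row.get(col) or None) for col in NULLABLE_STRING_COLUMNS})
    return rows
```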

d0choa and others added 22 commits February 21, 2024 11:30

feat: dataflow decompress prototype (#501)

d39e931

chore: commit susie results gist

ce1c38c

feat(study_index): add tissueFromSourceId to schema and make `traitFromSource` nullable

71f4b8a

Merge branch 'il-eqtl-susie' of https://github.com/opentargets/gentropy into il-eqtl-susie

26c6610

fix: bug and linting fixes in new eqtl ingestion step

2e838ee

perf: config bugfixes and performance improvements

3d78779

Merge branch 'dev' of https://github.com/opentargets/gentropy into il-eqtl-susie

b03d113

perf: remove data persistence to avoid executor failure

9666a4a

perf: load susie results for studies of interest only

d438dd2

perf: collect locus for leads only and optimise partitioning cols

70ff79b

feat: parametrise methods to include

8e294a1

feat: run full dag

01fa5d4

test: add tests

1e2dd2d

Merge branch 'dev' into il-eqtl-susie

de52a3f

fix: reorder test inputs

d137502

docs: update eqtl catalogue docs

419b41e

Merge branch 'il-eqtl-susie' of https://github.com/opentargets/gentropy into il-eqtl-susie

9fb63ed

fix: correct typos in tests docstrings

ab7d01b

fix: correct typos in tests docstrings

42bf5a4

test: fix

10afb92

revert: revert unwanted change in studyId definition

9330ed5

test: final fix

153f369

ireneisdoomed requested a review from d0choa March 1, 2024 16:44

d0choa approved these changes Mar 1, 2024

Merge branch 'dev' into il-eqtl-susie

18be1f5

ireneisdoomed merged commit ec9d2c7 into dev Mar 4, 2024
3 checks passed

ireneisdoomed deleted the il-eqtl-susie branch July 15, 2024 14:46
