feat(studyLocus validation): adding validation logic to studyLocus dataset #686

Merged: 11 commits, Jul 12, 2024


DSuveges (Contributor)

🛠 What does this PR implement

The StudyLocus dataset can now be validated on the p-value of the lead variant or on the quality-control flags of the study in which the locus was discovered. As a related effort, the p-value quality-control logic was moved out of the GWAS Catalog ingestion module into the study-locus dataset and wrapped as a stand-alone validation step.
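The p-value check can be sketched in plain Python as follows. This is a minimal illustration, not the gentropy implementation: it assumes the lead p-value is stored as a mantissa/exponent pair (as in the `pValueMantissa` and `pValueExponent` columns of the study-locus dataset) and that a locus is flagged only when its p-value is strictly greater than the threshold. The helper name `is_subsignificant` is hypothetical.

```python
from decimal import Decimal


def is_subsignificant(mantissa: float, exponent: int, threshold: float) -> bool:
    """Hypothetical helper: True if mantissa * 10**exponent > threshold.

    Decimal keeps the comparison exact, so a p-value equal to the
    threshold (e.g. 1.0e-4 against 1e-4) is not flagged.
    """
    p_value = Decimal(str(mantissa)) * Decimal(10) ** exponent
    return p_value > Decimal(str(threshold))


# Lead p-values from the sample study-locus table as (mantissa, exponent).
loci = {"s1": (1.0, -8), "s2": (1.0, -3), "s3": (1.0, -5), "s4": (1.0, -4)}
for study_id, (mantissa, exponent) in loci.items():
    print(study_id, is_subsignificant(mantissa, exponent, 1e-4))
```

Comparing in `Decimal` rather than multiplying floats avoids rounding surprises right at the threshold, which matters if the cut-off is meant to be inclusive.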

🙈 Missing

Important: This is not a step! This is pure business logic. Writing the orchestration is a separate effort.

🚦 Before submitting

  • Do these changes cover one single feature (one change at a time)?
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes?
  • Did you make sure there is no commented out code in this PR?
  • Did you follow conventional commits standards in PR title and commit messages?
  • Did you make sure the branch is up-to-date with the dev branch?
  • Did you write any necessary new tests?
  • Did you make sure the changes pass local tests (make test)?
  • Did you make sure the changes pass pre-commit rules (e.g. poetry run pre-commit run --all-files)?

@DSuveges (Contributor, Author)

Usage:

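```python
# Inspect the inputs.
study_index.df.show(truncate=False)
study_locus.df.show(truncate=False)

# Flag loci whose lead p-value is above 1e-4, then flag loci whose study
# failed quality controls or is missing from the study index.
(
    study_locus
    .validate_lead_pvalue(1e-4)
    .validate_study(study_index)
    .df
    .show(truncate=False)
)
```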

Output:

Study index:

```
+-------+---------+---------+---------------+
|studyId|projectId|studyType|qualityControls|
+-------+---------+---------+---------------+
|s1     |p1       |gwas     |[]             |
|s2     |p1       |gwas     |[some_flag]    |
|s3     |p1       |gwas     |[some_flag]    |
+-------+---------+---------+---------------+
```

Study locus before validation:

```
+------------+---------+-------+--------------+--------------+---------------+
|studyLocusId|variantId|studyId|pValueMantissa|pValueExponent|qualityControls|
+------------+---------+-------+--------------+--------------+---------------+
|1           |v1       |s1     |1.0           |-8            |[]             |
|3           |v2       |s2     |1.0           |-3            |[]             |
|2           |v2       |s3     |1.0           |-5            |[]             |
|2           |v2       |s4     |1.0           |-4            |[]             |
+------------+---------+-------+--------------+--------------+---------------+
```

Study locus after validation:

```
+------------+---------+-------+--------------+--------------+-----------------------------------------------------------+
|studyLocusId|variantId|studyId|pValueMantissa|pValueExponent|qualityControls                                            |
+------------+---------+-------+--------------+--------------+-----------------------------------------------------------+
|1           |v1       |s1     |1.0           |-8            |[]                                                         |
|3           |v2       |s2     |1.0           |-3            |[Subsignificant p-value, Study has failed quality controls]|
|2           |v2       |s3     |1.0           |-5            |[Study has failed quality controls]                        |
|2           |v2       |s4     |1.0           |-4            |[Study not found in the study index]                       |
+------------+---------+-------+--------------+--------------+-----------------------------------------------------------+
```
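The study-level flags in the validated table can be reproduced with a plain-Python sketch of the lookup: a locus whose study carries any quality-control flag is marked as failed, and a locus whose study is absent from the index is marked as not found. This only mimics the behaviour visible in the sample output; the dictionary-based study index and the helper name `study_flags` are illustrative assumptions, not gentropy's Spark implementation.

```python
from typing import Dict, List, Optional

FAILED_QC = "Study has failed quality controls"
NOT_FOUND = "Study not found in the study index"


def study_flags(study_id: str, study_index: Dict[str, List[str]]) -> List[str]:
    """Hypothetical helper: study-level flags for a single locus."""
    study_qc: Optional[List[str]] = study_index.get(study_id)
    if study_qc is None:
        # The locus points to a study that is missing from the index.
        return [NOT_FOUND]
    if study_qc:
        # Any non-empty qualityControls list on the study fails the locus.
        return [FAILED_QC]
    return []


# Study index from the sample data: studyId -> qualityControls.
study_index = {"s1": [], "s2": ["some_flag"], "s3": ["some_flag"]}

# Flags already on each locus after the p-value check (s2 is subsignificant).
loci = [("s1", []), ("s2", ["Subsignificant p-value"]), ("s3", []), ("s4", [])]

for study_id, flags in loci:
    print(study_id, flags + study_flags(study_id, study_index))
```

Appending the study flags after any existing ones mirrors the order in the output, where `validate_lead_pvalue` runs before `validate_study`.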

@d0choa (Collaborator) left a comment:

Once we start putting these functions into use, we might need minor tweaks. There is no point in trying to anticipate any of that.

Review comment on src/gentropy/dataset/study_locus.py (outdated, resolved).
@DSuveges merged commit 16f3d71 into dev on Jul 12, 2024 (4 checks passed).
@DSuveges deleted the ds_study_locus_validation branch on July 12, 2024, 13:33.