feat: adding GWAS Catalog study curation #17

Merged
merged 5 commits into from
Jan 8, 2024

@DSuveges DSuveges commented Dec 8, 2023

As described in issue #3132, the manually curated GWAS Catalog studies will be stored in the curation repo. As it stands, the code responsible for annotating GWAS Catalog studies based on curation can consume this table as is, even when it is given the URL of the file in the repo.
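
As a rough illustration of consuming the table from either a local path or a URL, here is a minimal Python sketch. It assumes the curation file is tab-separated with a header row; `parse_curation`, `load_curation`, and the sample rows are hypothetical and not taken from the pipeline code.

```python
import csv
import io
from urllib.request import urlopen


def parse_curation(text: str) -> list[dict[str, str]]:
    """Parse curation table text (assumed tab-separated with a header row)."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def load_curation(source: str) -> list[dict[str, str]]:
    """Read the table from a local path or an http(s) URL (hypothetical helper)."""
    if source.startswith(("http://", "https://")):
        with urlopen(source) as response:
            text = response.read().decode("utf-8")
    else:
        with open(source, encoding="utf-8") as handle:
            text = handle.read()
    return parse_curation(text)


# Made-up sample rows, for illustration only.
sample = "studyId\tanalysisFlag\nGCST000001\tGxE\nGCST000002\t\n"
rows = parse_curation(sample)
```

Keeping the parsing step separate from the fetching step means the same code path handles both sources, and the parser can be exercised without network access.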

@DSuveges DSuveges linked an issue Dec 8, 2023 that may be closed by this pull request
@DSuveges DSuveges marked this pull request as ready for review December 15, 2023 11:30

@ireneisdoomed ireneisdoomed left a comment

Thank you for adding and documenting the file! I envision having L2G's curation here as well.
I have comments specifically about the column names, let me know your thoughts.

docs/genetics.md Outdated
### Schema

- **studyId** - GCST study accession to identify study
- **analysisFlag** - comment on the applied statistical method authors used that might have downstream implication in our pipelines.
Contributor

The column name in the data is upateAnalysisFlags.
Looking at the possible categories ("Case-case study", "GxG", "Multivariate analysis", "GxE"), I'd rename this field to studySubType

Contributor Author

Although these labels might fall into a "subtype" category (however, I 100% agree), there is no requirement that this field contain only such labels.

docs/genetics.md Outdated
### Schema

- **studyId** - GCST study accession to identify study
- **analysisFlag** - comment on the applied statistical method authors used that might have downstream implication in our pipelines.
Contributor

Suggested change
- **analysisFlag** - comment on the applied statistical method authors used that might have downstream implication in our pipelines.
- **studySubType** - description of the specific statistical methodology employed in the GWAS.

docs/genetics.md Outdated

- **studyId** - GCST study accession to identify study
- **analysisFlag** - comment on the applied statistical method authors used that might have downstream implication in our pipelines.
- **updateStudyType** - if a study is not really a GWAS, but a qtl. This string will be picked up and replace the `type` value in the study index.
Contributor

I'd leave the explanation of how we use this field to the pipeline documentation. Therefore, I suggest naming it studyType.

Suggested change
- **updateStudyType** - if a study is not really a GWAS, but a qtl. This string will be picked up and replace the `type` value in the study index.
- **studyType** - categorises the study as either GWAS or molQTL.

docs/genetics.md Outdated
- **studyId** - GCST study accession to identify study
- **analysisFlag** - comment on the applied statistical method authors used that might have downstream implication in our pipelines.
- **updateStudyType** - if a study is not really a GWAS, but a qtl. This string will be picked up and replace the `type` value in the study index.
- **qualityControls** - `|` separated list of identified issues that prevent the study from ingestion.
Contributor

The column name in the data is upateQualityControls.
For the same reason as above, I'd rename it to qualityControls.

Contributor Author

In the code that processes these columns, there's a join that updates similarly named fields. Renaming the column would make that code more complex. However, we can come back to this at some point.
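
To illustrate the trade-off, here is a minimal Python sketch of such a join; it is not the pipeline's actual code. Because the curated columns carry an `update` prefix, they cannot collide with the study index's own fields, so the overwrite step stays simple. The function name, the study index fields other than `type`, and the sample rows are assumptions made for illustration.

```python
def apply_curation(study_index, curation):
    """Left-join curation onto the study index by studyId and apply updates."""
    curated = {row["studyId"]: row for row in curation}
    updated = []
    for study in study_index:
        merged = dict(study)
        update = curated.get(study["studyId"], {})
        # A non-empty curated study type replaces the existing `type` value.
        if update.get("updateStudyType"):
            merged["type"] = update["updateStudyType"]
        # Quality control issues arrive as a `|`-separated string.
        if update.get("updateQualityControls"):
            merged["qualityControls"] = update["updateQualityControls"].split("|")
        updated.append(merged)
    return updated


# Made-up rows, for illustration only.
study_index = [
    {"studyId": "GCST000001", "type": "gwas"},
    {"studyId": "GCST000002", "type": "gwas"},
]
curation = [
    {
        "studyId": "GCST000002",
        "updateStudyType": "molQTL",
        "updateQualityControls": "issue A|issue B",
    },
]
result = apply_curation(study_index, curation)
```

If both tables used the same column names, the join would first have to disambiguate the two sides before deciding which value wins, which is the added complexity referred to above.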

DSuveges commented Jan 2, 2024

> Thank you for adding and documenting the file! I envision having L2G's curation here as well.

Yes, that is certainly a desired path; we should also collate the UKBB and FINNGEN trait curation here. I'm not sure whether we should unify disease/trait mappings via ontoma, though. There are arguments in both directions.

@DSuveges DSuveges requested a review from ireneisdoomed January 2, 2024 14:06

@ireneisdoomed ireneisdoomed left a comment

Having "updated" in the studyType and QualityControls headers makes it look like these columns are relative to the columns in another file. I think this curation can be used as a standalone file, not just in conjunction with the GWAS Catalog study index. So I think it's cleaner to move the logic into the genetics ETL when we use this file in a specific context.

This is my view, not a blocker if you think this solution is better. The only thing that I do think we need to fix is the typos: updated instead of upate.
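
One way to picture this suggestion: the file keeps context-free column names, and the consumer adds whatever prefix it needs at load time. The sketch below is a hypothetical illustration in Python; the `prefix_columns` helper and the `update` naming scheme are assumptions, not the genetics ETL's implementation.

```python
def prefix_columns(row, prefix, keep=("studyId",)):
    """Return a copy of row with prefix prepended to every column not in keep."""
    return {
        (key if key in keep else f"{prefix}{key[0].upper()}{key[1:]}"): value
        for key, value in row.items()
    }


# Made-up row using the context-free column names.
row = {"studyId": "GCST000002", "studyType": "molQTL", "qualityControls": ""}
renamed = prefix_columns(row, "update")
```

With this split, the curation file can be read on its own, while the context-specific naming lives next to the join that needs it.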

DSuveges commented Jan 3, 2024

I have removed the update prefix from the column names.

@DSuveges DSuveges requested a review from ireneisdoomed January 3, 2024 13:27

@ireneisdoomed ireneisdoomed left a comment

Thanks a lot for the changes :)

@DSuveges DSuveges merged commit 23dc493 into master Jan 8, 2024
Development

Successfully merging this pull request may close these issues.

Managing GWAS Catalog study QC/flags
2 participants