feat: notebook to run qc metrics each release #541

Merged
merged 6 commits into from
Mar 20, 2024


xyg123
Contributor

@xyg123 xyg123 commented Mar 14, 2024

✨ Context

We need top-level QC metrics for each release.

🛠 What does this PR implement

Adds a Jupyter notebook that computes simple QC metrics for each release. It is meant to be run locally or on a VM with the full release dataset.

🙈 Missing

QC metrics that compare results across releases would be interesting, but they are not implemented yet.

🚦 Before submitting

  • Do these changes cover one single feature (one change at a time)?
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes?
  • Did you make sure there is no commented out code in this PR?
  • Did you follow conventional commits standards in PR title and commit messages?
  • Did you make sure the branch is up-to-date with the dev branch?
  • Did you write any new necessary tests?
  • Did you make sure the changes pass local tests (make test)?
  • Did you make sure the changes pass pre-commit rules (e.g. poetry run pre-commit run --all-files)?

@github-actions github-actions bot added size-XL and removed size-L labels Mar 20, 2024
@addramir
Contributor

addramir commented Mar 20, 2024

Thank you, looks good!

  1. The size of variant index is slightly different from what was reported by Daniel (5090991 vs 5468737). Any ideas why?
  2. There is None in "Average number of overlaps per CS: None".
  3. PICS vs Susie Finngen has a lot of SNPs with pics_pip=1. No actions needed, just to note and to discuss later.
  4. Could you please add the correlation from PICS vs Susie Finngen pips?
  5. Please add your comment "There are definitely duplicated studylocusIDs in the l2g predictions, and still around 20% of study loci contain more than 1 gene with score>0.5. It is not possible to separate out these predictions based on whether they came from pics or susie, as the l2g outputs only contain the studylocusID (duplicated between pics and susie). If 20% is too high then it implies finngen pics and susie l2g are pointing (confidently) at different genes for the same studylocus." after you calculated it.

@xyg123
Contributor Author

xyg123 commented Mar 20, 2024

  1. The size of variant index is slightly different from what was reported by Daniel (5090991 vs 5468737). Any ideas why?

There must be some sort of filter on the variant index, because we see the same number of unique variants in the v2g table (5090991). It's not an rsId- or allele-type-based filter, so maybe a MAF filter?
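One way to narrow down such a mismatch is to compare the two sets of variant identifiers directly and inspect the ones present on only one side. The sketch below is an illustration on toy rows in plain Python, not the notebook's code; the `variantId` column name and the `compare_id_sets` helper are assumptions made for the example.

```python
# Toy stand-ins for two release tables. The real tables are far larger and
# would normally be read with a dataframe library; `variantId` is an assumed
# column name.
variant_index = [
    {"variantId": "1_100_A_G"},
    {"variantId": "1_200_C_T"},
    {"variantId": "2_150_T_C"},
    {"variantId": "3_300_G_A"},
]
v2g = [
    {"variantId": "1_100_A_G"},
    {"variantId": "1_100_A_G"},  # repeated: one row per variant-gene pair
    {"variantId": "1_200_C_T"},
    {"variantId": "2_150_T_C"},
]


def unique_ids(rows, key="variantId"):
    """Return the set of distinct identifiers found under `key`."""
    return {row[key] for row in rows}


def compare_id_sets(left_rows, right_rows, key="variantId"):
    """Count distinct ids on each side and list ids seen on one side only."""
    left = unique_ids(left_rows, key)
    right = unique_ids(right_rows, key)
    return {
        "left_unique": len(left),
        "right_unique": len(right),
        "only_left": sorted(left - right),
        "only_right": sorted(right - left),
    }


result = compare_id_sets(variant_index, v2g)
print(result)
```

Looking at the properties of the identifiers that appear on one side only (for example their allele frequencies) is what would confirm or rule out a MAF filter.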

  2. There is None in "Average number of overlaps per CS: None".

Fixed

  3. PICS vs Susie Finngen has a lot of SNPs with pics_pip=1. No actions needed, just to note and to discuss later.

This could be due to differences between clumping and not clumping.

  4. Could you please add the correlation from PICS vs Susie Finngen pips?

Added (0.624).
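For reference, a correlation of this kind can be computed by matching PIPs from the two methods on a shared key and taking the Pearson coefficient over the matched pairs. The following is a minimal sketch under assumptions: the thread does not say which correlation the notebook reports, the `(studyId, variantId)` key and the toy values are invented for illustration, and `pearson` and `pip_correlation` are hypothetical helpers.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def pip_correlation(pics_pips, susie_pips):
    """Correlate PIPs over the keys present in both mappings."""
    shared = sorted(set(pics_pips) & set(susie_pips))
    return pearson(
        [pics_pips[key] for key in shared],
        [susie_pips[key] for key in shared],
    )


# Toy PIPs keyed by an assumed (studyId, variantId) pair.
pics = {("s1", "v1"): 1.0, ("s1", "v2"): 0.2, ("s2", "v3"): 0.5, ("s2", "v4"): 0.9}
susie = {("s1", "v1"): 0.8, ("s1", "v2"): 0.1, ("s2", "v3"): 0.6, ("s3", "v9"): 0.3}

r = pip_correlation(pics, susie)
print(f"Pearson r over matched variants: {r:.3f}")
```

Only variants present in both credible-set collections contribute, so the choice of matching key affects the result.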

  5. Please add your comment
    "There are definitely duplicated studylocusIDs in the l2g predictions, and still around 20% of study loci contain more than 1 gene with score>0.5. It is not possible to separate out these predictions based on whether they came from pics or susie, as the l2g outputs only contain the studylocusID (duplicated between pics and susie). If 20% is too high then it implies finngen pics and susie l2g are pointing (confidently) at different genes for the same studylocus." after you calculated it.

Added.
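The share of study loci with more than one confidently predicted gene can be sketched as below. This is an illustration on toy rows, not the notebook's implementation: the column names `studyLocusId`, `geneId` and `score`, the helper name, and the choice of all study loci as the denominator are assumptions.

```python
from collections import defaultdict


def fraction_multi_gene(predictions, threshold=0.5):
    """Fraction of study loci with more than one distinct gene above threshold.

    The denominator is every study locus seen in `predictions` (an assumption;
    restricting it to loci with at least one hit is another valid choice).
    """
    all_loci = set()
    genes_by_locus = defaultdict(set)
    for row in predictions:
        all_loci.add(row["studyLocusId"])
        if row["score"] > threshold:
            genes_by_locus[row["studyLocusId"]].add(row["geneId"])
    if not all_loci:
        return None  # nothing to summarise
    multi = sum(1 for genes in genes_by_locus.values() if len(genes) > 1)
    return multi / len(all_loci)


# Toy l2g-style predictions. Locus "a" appears twice with different top genes,
# as could happen when the same id comes from two fine-mapping methods.
predictions = [
    {"studyLocusId": "a", "geneId": "G1", "score": 0.9},
    {"studyLocusId": "a", "geneId": "G2", "score": 0.7},
    {"studyLocusId": "b", "geneId": "G3", "score": 0.8},
    {"studyLocusId": "b", "geneId": "G4", "score": 0.2},
    {"studyLocusId": "c", "geneId": "G5", "score": 0.1},
]

fraction = fraction_multi_gene(predictions)
print(fraction)
```

Using a set of gene ids per locus means that duplicated rows for the same locus and gene are counted once.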

@xyg123 xyg123 marked this pull request as ready for review March 20, 2024 13:33
@xyg123 xyg123 requested a review from addramir March 20, 2024 13:49
@addramir addramir self-assigned this Mar 20, 2024
@xyg123 xyg123 merged commit 650bb2e into dev Mar 20, 2024
4 checks passed
@xyg123 xyg123 deleted the xg-qc-metrics branch March 20, 2024 13:50
2 participants