Beneficial to merge all processed vcfs into one prior to annotating? #2

Open
thomasyu888 opened this issue Apr 27, 2020 · 3 comments

Comments

@thomasyu888

Typical distribution of variant counts across the processed VCFs (lines per file):

wc -l vcf/processed/*
2 GENIE-
2 GENIE-
5 GENIE-
1 GENIE-
2 GENIE-
3 GENIE-
2 GENIE-
...
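
Since `wc -l` counts every line, including any header lines, the numbers above are only a rough proxy for variants per file. A minimal sketch of a more careful tally follows. It assumes header lines start with `#`, and the helper names `count_records` and `records_per_file_histogram` are made up for illustration, not part of any existing tool.

```python
from collections import Counter
from pathlib import Path
from typing import Dict, Iterable


def count_records(path: Path) -> int:
    """Count non-empty lines that are not '#' header lines (assumed convention)."""
    with path.open() as handle:
        return sum(
            1 for line in handle if line.strip() and not line.startswith("#")
        )


def records_per_file_histogram(paths: Iterable[Path]) -> Dict[int, int]:
    """Map 'records in a file' to 'number of files with that many records'."""
    return dict(Counter(count_records(p) for p in paths))
```

Pointing `records_per_file_histogram` at the files under `vcf/processed/` (for example `Path("vcf/processed").glob("*")`) would show how many files hold one record, two records, and so on.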
@thomasyu888
Author

thomasyu888 commented Apr 27, 2020

Angelica informed me that there isn't much difference in speed, and that the first annotation run will take longer because the results are cached in MongoDB.

I saw around 400 VCFs an hour the first time running through the site. (Granted, the process probably sped up wherever there were duplicated variants.)
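
To illustrate why a first run is slower and why duplicated variants speed things up, here is a rough sketch of a lookup cache in front of a slow annotation call. Everything in it is an assumption for illustration: the in-memory dict only stands in for the MongoDB cache, and `CachingAnnotator`, `slow_annotate`, and the `Variant` key are hypothetical names, not the annotator's real interface.

```python
from typing import Callable, Dict, Tuple

# Hypothetical cache key: (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


class CachingAnnotator:
    """Wrap a slow annotation call with a lookup cache.

    The dict is a stand-in for the MongoDB cache; the real storage differs.
    """

    def __init__(self, slow_annotate: Callable[[Variant], str]) -> None:
        self._slow_annotate = slow_annotate
        self._cache: Dict[Variant, str] = {}
        self.hits = 0
        self.misses = 0

    def annotate(self, variant: Variant) -> str:
        # A variant seen before is served from the cache without recomputation.
        if variant in self._cache:
            self.hits += 1
            return self._cache[variant]
        # First sighting: pay the full annotation cost, then remember the result.
        self.misses += 1
        result = self._slow_annotate(variant)
        self._cache[variant] = result
        return result
```

Under this model the hit count depends on how many variants repeat, not on how they are grouped into files, which would be consistent with the observation that merging makes little difference in speed.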

@sheridancbio
Contributor

I think it would be a good idea to do some real performance comparisons: measure how long a run takes for a center using hundreds of small VCF files, and compare that to an annotation run for the same center using a single merged MAF (perhaps from the output of the first run) as the input.

@ao508
Contributor

ao508 commented Apr 28, 2020

Based on a previous conversation with @thomasyu888, the original decision to annotate the MAFs individually before generating a "per-center" MAF came down to whether the suggested approach would be scalable. I believe it's been decided to run some performance tests first on the currently available GENIE data. The tests will measure:

  • total time to standardize and annotate individual MAFs from a center
  • total time to standardize and merge individual MAFs into a "per-center" MAF and then run that through the annotator
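
The two measurements above could be taken with a small timing harness along these lines. This is only a sketch under stated assumptions: `annotate` is a hypothetical callable standing in for the real annotator, each file is modeled as a list of record strings, and the standardization step is not modeled at all.

```python
import time
from typing import Callable, List, Sequence, Tuple

Record = str  # one variant record; hypothetical representation
Annotate = Callable[[List[Record]], List[Record]]  # hypothetical annotator


def time_per_file(files: Sequence[List[Record]],
                  annotate: Annotate) -> Tuple[float, int]:
    """Annotate each file separately; return (elapsed seconds, records annotated)."""
    start = time.perf_counter()
    total = 0
    for records in files:
        total += len(annotate(records))
    return time.perf_counter() - start, total


def time_merged(files: Sequence[List[Record]],
                annotate: Annotate) -> Tuple[float, int]:
    """Merge all files into one list, annotate once; the merge is timed too."""
    start = time.perf_counter()
    merged = [record for records in files for record in records]
    total = len(annotate(merged))
    return time.perf_counter() - start, total
```

Because the first pass warms the cache, a fair comparison would probably need to run both strategies against the same cache state, for instance each from a cold cache.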
