Improve tumor vs normal sample recognition in a vcf file #47
Comments
@ao508 @inodb @sheridancbio I would love to hear your thoughts before I start implementing the solution. Would the change be beneficial upstream?
@ruslan-forostianov hmm i haven’t seen that before, but seems like it could be a nice enhancement as an option to the cli e.g.
@inodb I like the idea 👍
I’ve realised that the script takes an option specifying an input directory of VCFs, not a single VCF. Passing tumor/normal sample names as options does not make sense at this level.
After some additional research and discussion with @inodb, I decided to proceed with the original idea of using the normal_sample and tumor_sample header metadata. These variables are not part of the VCF specification but are widely used by GATK. See https://github.com/broadinstitute/gatk/search?q=%23%23normal_sample%3D Here is the PR with an implementation:
Currently, standardize_mutation_data.py, when reading a VCF file with 2 sample columns, interprets the first as the tumor sample and the second as the normal sample. See https://github.com/genome-nexus/annotation-tools/blob/master/standardize_mutation_data.py#L854
We work with VCF files that don't have a fixed order of the sample columns.
However, our vcf header contains metadata like the following:
Although this metadata does not seem to be part of the VCF specification (https://samtools.github.io/hts-specs/VCFv4.1.pdf), it appears to be used in practice.
My proposal is to make the get_vcf_sample_and_normal_ids(filename) function look for normal_sample and tumor_sample in the header first. It would fall back to the existing logic if no such metadata is found.
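To make the proposal concrete, here is a minimal sketch of the header-first lookup with a positional fallback. It is not the code from the repository or the PR: the helper name `sketch_get_tumor_and_normal_ids`, its return shape, and the choice to take an iterable of lines instead of a filename are all assumptions made for illustration. The only things taken from the discussion above are the `##tumor_sample=` / `##normal_sample=` header keys and the existing "first column is tumor, second is normal" behaviour.

```python
import re

# Header keys as written by GATK tools; they are not part of the VCF spec.
_SAMPLE_HEADER = re.compile(r"^##(tumor_sample|normal_sample)=(.+)$")


def sketch_get_tumor_and_normal_ids(lines):
    """Return (tumor_id, normal_id) from an iterable of VCF lines.

    Hypothetical helper: prefers the ##tumor_sample / ##normal_sample
    header metadata and falls back to column order (first sample is
    tumor, second is normal) when the metadata is absent or does not
    match the sample columns.
    """
    declared = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        match = _SAMPLE_HEADER.match(line)
        if match:
            declared[match.group(1)] = match.group(2).strip()
        elif line.startswith("#CHROM"):
            # Sample columns follow the 9 fixed VCF columns.
            samples = line.split("\t")[9:]
            break
    else:
        samples = []  # no #CHROM line was seen

    tumor = declared.get("tumor_sample")
    normal = declared.get("normal_sample")

    # Trust the header only if every declared name is a real sample column.
    if tumor in samples and (normal is None or normal in samples):
        return tumor, normal

    # Fallback: the existing positional interpretation.
    positional_tumor = samples[0] if samples else None
    positional_normal = samples[1] if len(samples) > 1 else None
    return positional_tumor, positional_normal
```

Checking the declared names against the actual sample columns is a design choice of this sketch, not something stated in the issue: it keeps a stale or mistyped header entry from silently overriding the positional behaviour.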