NA12877 chrX calls #7

Open
ksw9 opened this issue Aug 3, 2018 · 8 comments

Comments

@ksw9

ksw9 commented Aug 3, 2018

Hi,

We are using the PlatinumGenomes NA12877 resource and are wondering why calls on chrX begin at position 2781986. The corresponding ConfidentRegions.bed.gz begins at: chrX 251053 251087.

Thank you for your help!

@blmoore
Member

blmoore commented Aug 6, 2018

Confident regions need not contain a truth variant; they can also just be regions we're calling homozygous reference. Does that answer your question?
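To check this on your own files, here is a minimal sketch that lists the confident intervals containing no truth variant. It assumes the usual conventions (BED intervals are 0-based and half-open, VCF positions are 1-based); the function name and the second interval are hypothetical, chosen only for illustration.

```python
def confident_intervals_without_variants(bed_intervals, variant_positions):
    """Return the BED intervals that contain no variant position.

    bed_intervals: list of (start, end) tuples, 0-based half-open (BED convention).
    variant_positions: iterable of 1-based positions (VCF convention).
    """
    positions = list(variant_positions)
    empty = []
    for start, end in bed_intervals:
        # A 1-based position p lies in a 0-based half-open interval
        # [start, end) exactly when start < p <= end.
        if not any(start < p <= end for p in positions):
            empty.append((start, end))
    return empty


# The first interval is the one quoted in the question; the second is
# made up so that it contains the first reported variant position.
intervals = [(251053, 251087), (2781900, 2782000)]
variants = [2781986]
print(confident_intervals_without_variants(intervals, variants))
```

Intervals returned by this function are the ones asserted to be homozygous reference throughout.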

@ksw9
Author

ksw9 commented Aug 6, 2018 via email

@eberle

eberle commented Aug 6, 2018

I think what you are describing is the PAR (pseudoautosomal region). For variants, we "validate" a call by checking that the genotypes agree with the inheritance. In males, the genotypes in this region end up being a combination of chrX and chrY, so most variants will likely show up as heterozygous, which automatically fails them in our consistency check. I'm guessing that this is what you are seeing. I should point out that this can also happen in other parts of the genome where there is a CNV: we can identify positions that are definitely reference, but the variants may disagree with the pedigree check, so we fail most of them. There is a discussion of this in the manuscript. What you should be seeing is that the confident region is not a single 2.78 Mb block but a series of smaller blocks, many of which are broken up where there are SNVs and indels. Does this agree with what you are seeing?
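As a rough illustration of why a heterozygous call in a male fails, here is a simplified sketch of a consistency rule that treats a son as carrying a single X copy. This is an assumption made for illustration, not the actual Platinum Genomes validation code; the function name and the allele encoding are hypothetical.

```python
def male_x_call_consistent(son_alleles, mother_alleles):
    """Check a son's chrX call against his mother's, treating the son as haploid.

    Both arguments are tuples of allele codes, e.g. (0, 1) for a
    heterozygous call or (1, 1) for a homozygous alternate call.
    """
    # With a single X copy, all alleles reported for the son must agree.
    if len(set(son_alleles)) != 1:
        return False
    # That single allele must be one the mother carries.
    return son_alleles[0] in mother_alleles


print(male_x_call_consistent((0, 1), (0, 1)))  # heterozygous son
print(male_x_call_consistent((1, 1), (0, 1)))  # son matches one maternal allele
```

Under this rule the heterozygous call is rejected regardless of the mother's genotype, which mirrors the failure mode described above for positions where chrX and chrY reads are combined.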

@erika8

erika8 commented Oct 24, 2018

Hi,

We are also using the Platinum Genomes and noticed a big difference in the size of the chrX confident regions on hg38 (2,477,045 bp) compared to hg19 (137,716,288 bp).
We also checked the Genome in a Bottle (GIAB) data, where the sizes are in the same range for both builds: hg19: 137,156,694 bp; hg38: 109,267,367 bp.
Do you know why there is such a difference between the two builds for the Platinum data? Chromosome X is quite important, and the advantage of Platinum Genomes over GIAB is that we have two cell lines. Is there a way to solve this?
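For anyone who wants to reproduce these totals, the following is a minimal sketch of how the confident bases per chromosome can be summed from BED lines. It assumes the intervals do not overlap and uses a hypothetical function name; the sample lines are illustrative, not taken from the real files.

```python
from collections import defaultdict


def confident_bases_per_chrom(bed_lines):
    """Sum interval lengths (end - start) per chromosome from BED lines."""
    totals = defaultdict(int)
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        # BED intervals are 0-based half-open, so the length is end - start.
        totals[chrom] += int(end) - int(start)
    return dict(totals)


sample = [
    "chrX\t251053\t251087",
    "chrX\t251100\t251200",
    "chr1\t0\t50",
]
print(confident_bases_per_chrom(sample))
```

For a compressed file, the same function can be fed a handle opened with `gzip.open(path, "rt")` from the standard library.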
Thanks!

@eberle

eberle commented Oct 24, 2018

Hi @erika8,

Thanks for using this resource. I think that I know what has happened. One of our requirements is that a "confident" region must be called across the pedigree, and males are more likely to lack a "PASS" genotype due to lower depth on chrX. I think that our callers for hg19 were sex-aware when calling the homozygous reference positions, hence the higher numbers. We are looking into this now to confirm what is happening and will work to fix it.

Cheers,

-Mike

@erika8

erika8 commented Oct 25, 2018

Hi Mike,

Thanks for your feedback, I'm looking forward to the fix!

Cheers

Erika

@ksw9
Author

ksw9 commented Nov 12, 2018

Hi,
Thanks for all your work on this, and apologies for the late response to your help above. Yes, we are also very interested in the differences between the chrX calls made against the two builds. Any ideas? Is there a loss of quality when using the hg19 calls, which have longer confident regions?

To clarify your explanations above: do you mean that the PAR regions can't contain high-quality variants, but can contain high-quality reference calls? If so, the confident regions there would consist of long stretches of reference allele, interrupted where variants are called.

Thanks again for your help!
Best,

@ksw9
Author

ksw9 commented Nov 19, 2018

Hi,
I just wanted to follow up - are the chrX hg19 truth VCF calls reliable?
Thank you!
