Some rsIDs in original VCF not in imputed VCF #51
We can only impute at sites shared with the reference panel. The ~2k rsIDs you refer to must not be in the reference panel you are using; you can easily check that by comparing the two lists of sites. Indeed, I believe pbwt reports how many sites are being used.
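A minimal sketch of that site comparison in Python, assuming both files are plain or gzipped VCFs and that sites are matched on (CHROM, POS) only, ignoring alleles. The helper names (`norm_chrom`, `open_vcf`, `read_sites`, `sites_missing_from_panel`) are hypothetical and not part of pbwt.

```python
import gzip
from typing import Iterable, Iterator, Set, Tuple

# A site is identified here by (chromosome, position) only; alleles are ignored.
Site = Tuple[str, int]


def norm_chrom(chrom: str) -> str:
    """Strip a leading 'chr' so that '3' and 'chr3' compare equal."""
    return chrom[3:] if chrom.startswith("chr") else chrom


def open_vcf(path: str) -> Iterator[str]:
    """Yield text lines from a plain or gzip-compressed VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from handle


def read_sites(lines: Iterable[str]) -> Set[Site]:
    """Collect (CHROM, POS) pairs from VCF lines, skipping header lines."""
    sites: Set[Site] = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos = line.split("\t", 2)[:2]
        sites.add((norm_chrom(chrom), int(pos)))
    return sites


def sites_missing_from_panel(target: Set[Site], panel: Set[Site]) -> Set[Site]:
    """Sites present in the input data but absent from the reference panel."""
    return target - panel
```

Usage would be along the lines of `sites_missing_from_panel(read_sites(open_vcf("input_chr3.vcf.gz")), read_sites(open_vcf("panel_chr3.vcf.gz")))`, where both file names are placeholders. Matching on position alone can over-count overlap if the panel holds a different variant at the same position, so comparing REF/ALT as well would be a stricter check.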
Thanks for this info. This was the explanation I had guessed. However, these SNPs missing from the imputation result are in my data, so they don't need to be imputed; isn't it strange that they don't find their way into the result? These are the 'anchors' that the other data is derived from, so (I'm guessing) dropping any of them is a problem. Obviously I can merge the files, but that's a bit of a pain.

Interestingly, I'm processing the 23andMe (v3) file along with the imputation results, and I find that 1,806 rsIDs can be added back to the imputation results by matching chromosome and position. In other words, they are in the imputation panel after all, but they either don't have the correct rsID or the rsID is dropped at some stage. So of the 2,479 'missing' SNPs, 1,806 can be 'found', leaving 673 'anchors' missing from chromosome 3.

On a related note, I see some rsIDs with multiple positions in the results (this is different from the variants with more than one alt allele that we discussed elsewhere). For example:

When I check that rsID, I can see that it was merged with two other rsIDs at these locations:

Which sort of makes sense, and sort of doesn't: the two 'new' locations for this rsID (4942430 and 4942432) have been picked up from somewhere, but the new rsIDs have not. That is, it's the same old rsID with the new locations. I find about 30 of these cases in chromosome 3. Sorry if this is overly pedantic; I'm honestly not sure how else to work!

Many thanks,
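The recovery by chromosome and position described in this comment could be sketched as follows. This is not the poster's actual script; it assumes the usual 23andMe raw-data layout (tab-separated rsid, chromosome, position, genotype, with `#` comment lines) and a VCF whose ID column may hold `.` or several `;`-separated IDs. All helper names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Site = Tuple[str, int]


def norm_chrom(chrom: str) -> str:
    """Strip a leading 'chr' so that '3' and 'chr3' compare equal."""
    return chrom[3:] if chrom.startswith("chr") else chrom


def read_23andme(lines: Iterable[str]) -> Dict[str, Site]:
    """Map each rsID in a 23andMe raw-data file to its (chrom, pos)."""
    calls: Dict[str, Site] = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        rsid, chrom, pos = line.rstrip("\n").split("\t")[:3]
        calls[rsid] = (norm_chrom(chrom), int(pos))
    return calls


def read_imputed_ids(lines: Iterable[str]) -> Dict[Site, Set[str]]:
    """Map each (chrom, pos) in an imputed VCF to the IDs recorded there."""
    by_site: Dict[Site, Set[str]] = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, ids = line.split("\t", 3)[:3]
        key = (norm_chrom(chrom), int(pos))
        by_site.setdefault(key, set()).update(ids.split(";"))
    return by_site


def classify_missing(
    genotyped: Dict[str, Site], imputed: Dict[Site, Set[str]]
) -> Tuple[Dict[str, Set[str]], Set[str]]:
    """Split genotyped rsIDs whose ID is absent from the imputed output into:
    - recovered: a record exists at the same (chrom, pos), mapped to its IDs
    - absent: no record at that position at all
    """
    imputed_ids = set().union(*imputed.values())
    recovered: Dict[str, Set[str]] = {}
    absent: Set[str] = set()
    for rsid, site in genotyped.items():
        if rsid in imputed_ids:
            continue  # found by ID, nothing to recover
        if site in imputed:
            recovered[rsid] = imputed[site]
        else:
            absent.add(rsid)
    return recovered, absent
```

In this sketch the 'found' and still-missing counts from the comment would roughly correspond to `len(recovered)` and `len(absent)`; the IDs stored in `recovered` show what the output carries at each position instead (often `.` or a different rsID).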
So I just checked, and all but 2 of the 31 multi-position rsIDs in the imputation results for chromosome 3 have been merged into two separate rsIDs, and all the distances between them are less than 10 bp. Although this isn't a big problem, it's still confusing to see new positions paired with an old rsID. I guess this may be a bug or a version mix-up somewhere? The two rsIDs with two positions and no apparent cause in dbSNP are:

Here is how they appear in the file:

I assume this is a version inconsistency in the data somewhere.
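One way to list rsIDs that appear at more than one position, and how far apart those positions are, is sketched below. These are hypothetical helpers, not the poster's script, and they take an iterable of uncompressed VCF lines. Rows that share a position (for example a multiallelic site split over several lines) count once, so they are not flagged, which keeps them separate from the multi-allele case mentioned earlier.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

Site = Tuple[str, int]


def multi_position_ids(lines: Iterable[str]) -> Dict[str, List[Site]]:
    """Return rsIDs that occur at more than one (chrom, pos) in a VCF."""
    positions: Dict[str, Set[Site]] = defaultdict(set)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, ids = line.split("\t", 3)[:3]
        for rsid in ids.split(";"):
            if rsid != ".":
                positions[rsid].add((chrom, int(pos)))
    # Keep only IDs seen at two or more distinct sites, sorted by position.
    return {rsid: sorted(sites) for rsid, sites in positions.items() if len(sites) > 1}


def span_bp(sites: List[Site]) -> Optional[int]:
    """Distance between the outermost positions, or None across chromosomes."""
    if len({chrom for chrom, _ in sites}) != 1:
        return None
    coords = [pos for _, pos in sites]
    return max(coords) - min(coords)
```

Filtering the result with `span_bp(sites) is not None and span_bp(sites) < 10` would reproduce the "less than 10 bp" check described above.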
More possibly related 'weirdness':
And just for completeness...
@richarddurbin Sorry, I know the above details are a pain to 'parse', but hopefully the problems are clear enough. Please let me know if any of the above issues are unclear. Here is the Python code that I used to pull everything out:

Please let me know if I should log these problems elsewhere. I also have some questions about phasing and imputation in general; could you or one of your colleagues spare some time to talk me through some details?

Many thanks,
Looking only at chromosome 3, some 2k rsIDs from the 23andMe data are not found in the 2.8G SNPs of the imputed chromosome 3. Why would SNPs present in the input be dropped from the output?
Many thanks.