VariantRDD union creates multiple records for the same SNP ID #1644
Comments
We already split alternate alleles at the same position into multiple
|
+1 @heuermh, the union method specifies no contract WRT "duplicate" records. Logically, in variant space, "duplicate" is not clearly defined, as two records with the same chr/pos/ref/alt could have conflicting annotations. I don't believe that the VCF spec says that dupe records are illegal, as long as both records are well formed and the data is properly sorted. I suggest we close as |
I was thinking along the lines of the VCFTools implementation here, such that the output is a merged VCF. If we don't care about this use case, feel free to close the issue. |
union on GenotypeRDD/VariantRDD/VariantContextRDD implements both VCFTools merge and concat, albeit with possibly different semantics in the merge case. |
Closing as |
VariantRDD.union() does not combine by SNP ID. This leads to the potential for multiple records with the same SNP ID. If we were to write out the VariantRDD as a VCF, it would not be correct.
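To make the distinction in this thread concrete, here is a minimal sketch in plain Python, not ADAM's API. It contrasts a concat-style union, which keeps every record, with a merge-style pass that collapses records sharing the same contig/pos/ref/alt. The record layout and the function names `union` and `merge_by_site` are hypothetical, and keeping the first-seen annotations on conflict is an assumption made only for illustration, since "duplicate" has no agreed definition here.

```python
def union(left, right):
    """Concat-style union: keep every record from both inputs, duplicates included."""
    return list(left) + list(right)


def merge_by_site(records):
    """Merge-style pass: collapse records that share contig/pos/ref/alt.

    Assumption: on conflict, the first-seen annotations win and only the
    variant names (e.g. SNP IDs) are combined.
    """
    merged = {}
    for rec in records:
        key = (rec["contig"], rec["pos"], rec["ref"], rec["alt"])
        if key not in merged:
            # Copy the record so the caller's input is not mutated.
            merged[key] = dict(rec, names=list(rec["names"]))
        else:
            for name in rec["names"]:
                if name not in merged[key]["names"]:
                    merged[key]["names"].append(name)
    # Sort by site so the output order is deterministic.
    return [merged[k] for k in sorted(merged)]


a = [{"contig": "1", "pos": 100, "ref": "A", "alt": "G", "names": ["rs1"]}]
b = [
    {"contig": "1", "pos": 100, "ref": "A", "alt": "G", "names": ["rs1"]},
    {"contig": "1", "pos": 200, "ref": "C", "alt": "T", "names": ["rs2"]},
]

combined = union(a, b)              # three records, rs1 appears twice
deduplicated = merge_by_site(combined)  # two records, one per site
```

Whether the second step belongs inside union or in a separate call is the design question raised above; the sketch keeps them separate so that union itself makes no claim about what counts as a duplicate.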