Support for genotype data in VCF format #45

Open
nevrome opened this issue Oct 8, 2021 · 1 comment

nevrome commented Oct 8, 2021

The VCF file format appears to be a popular, powerful and (comparatively) well-specified file format for genotype data. Poseidon could (one day!) support it the same way it supports Packed PLINK and EIGENSTRAT data. Some observations:

  • VCF files seem to be very flexible and capable of storing a lot more information than PLINK or EIGENSTRAT files. That makes them harder to parse and render. Most importantly, there is no lossless conversion between the formats, given VCF's greater flexibility: anything VCF stores beyond plain genotype calls has no place in PLINK or EIGENSTRAT files.
  • The VCF specification seems to be revised relatively frequently: v4.3 is published and v4.4 is on the way. For Poseidon we would have to decide which version to support and keep track of changes to the format.
  • For poseidon-hs: sequence-formats already supports it (at least partially?). If functionality is missing there, this script or this package may also serve as inspiration.
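
To illustrate the first observation, here is a minimal Python sketch (not Poseidon or sequence-formats code) of collapsing a VCF `GT` value into an EIGENSTRAT genotype code. The function name `gt_to_eigenstrat` is hypothetical. The sketch assumes a biallelic site and that the VCF REF allele plays the role of the EIGENSTRAT reference allele, so the code is the number of REF copies (2, 1, 0) or 9 for missing. Treating haploid and multi-allelic calls as missing is a choice made only for this example.

```python
import re


def gt_to_eigenstrat(gt: str) -> str:
    """Collapse a VCF GT value (e.g. '0/1', '1|1', './.') to an EIGENSTRAT code.

    Assumption: biallelic site, VCF REF == EIGENSTRAT reference allele.
    """
    # Splitting on '/' or '|' throws away the phasing information.
    alleles = re.split(r"[/|]", gt)
    # Anything that is not a diploid call over alleles 0 and 1 becomes missing.
    if len(alleles) != 2 or any(a not in ("0", "1") for a in alleles):
        return "9"
    # EIGENSTRAT stores the number of reference allele copies.
    return str(alleles.count("0"))


# A sample column with FORMAT GT:AD:GL; only GT survives the conversion.
sample = "0|1:12,9:-3.1,-0.2,-4.0"
print(gt_to_eigenstrat(sample.split(":")[0]))
```

Phasing, read counts, likelihoods and any third allele are all dropped on the way, which is the kind of loss the first bullet refers to.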

@stschiff

Yes, indeed! VCF would have the sweet advantage of also encoding things like genotype likelihoods and read counts. It wouldn't be a big problem to feature that, although forge would then have to make some choices, since VCF is a bit more general than Eigenstrat and Plink. We would have to be prepared to get feature requests then... but why not! I'll note it down as a nice addition. I already have plans to output read counts with pileupCaller, so I might just actually output VCF.
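
For context on the read counts and genotype likelihoods mentioned here: in VCF they live in per-sample fields such as `AD` (allelic depths) and `GL` (log10 genotype likelihoods), which the FORMAT column names. A rough Python sketch of pulling them out of one sample column follows. The helper names `parse_sample` and `read_counts` are hypothetical and this is not how pileupCaller or forge handle it.

```python
from typing import Dict, List, Optional


def parse_sample(format_field: str, sample_field: str) -> Dict[str, str]:
    """Pair the FORMAT keys of a VCF record with one sample's values."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    # zip() stops at the shorter list, which tolerates dropped trailing fields.
    return dict(zip(keys, values))


def read_counts(fields: Dict[str, str]) -> Optional[List[Optional[int]]]:
    """Return per-allele read counts from AD, or None if AD is absent or missing."""
    ad = fields.get("AD")
    if ad is None or ad == ".":
        return None
    return [None if x == "." else int(x) for x in ad.split(",")]


fields = parse_sample("GT:AD:GL", "0/1:12,9:-3.1,-0.2,-4.0")
print(fields["GT"], read_counts(fields), fields["GL"])
```

None of these fields has an equivalent in Eigenstrat or Plink files, so a tool that merges packages across formats has to decide whether to keep, drop or recompute them.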
