VCF v4.3 ##META header line syntax #558

Open
daviesrob opened this issue Apr 27, 2021 · 5 comments

@daviesrob
Member

VCF v4.3 section 1.4.8 defines ##META lines. What they are for isn't very well described, but they appear to allow the declaration of sample attribute types along with a dictionary of allowed values. To specify the dictionary, they use a square-bracketed list syntax that is unique to this line type.

Currently HTSlib reports an error if it sees a line using this syntax. htsjdk attempts to read them, but drops all of the values apart from the last one, and includes the trailing square bracket. PR samtools/htslib#1240 exists to add support for this to HTSlib, although it needs a bit of work to round-trip correctly. htsjdk had PR samtools/htsjdk#835, which didn't get merged, although recent comments suggest it may be resurrected.
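To make the parsing problem concrete, here is a minimal Python sketch of a bracket-aware parser for the <key=value,...> body of a structured header line. It assumes each value is either a bare token, a double-quoted string with backslash escapes, or a single non-nested [a, b, c] list as used by ##META. The function name parse_structured_header is hypothetical and is not part of HTSlib or htsjdk; the example line is illustrative, in the style of the section 1.4.8 example.

```python
def parse_structured_header(line):
    """Parse '##KEY=<k=v,...>' into (KEY, dict).

    Hypothetical sketch: bracketed values become Python lists, quoted
    values have their quotes removed, and bare values are kept as-is.
    """
    if not line.startswith("##") or "=<" not in line or not line.endswith(">"):
        raise ValueError(f"not a structured header line: {line!r}")
    key, body = line[2:].split("=<", 1)
    body = body[:-1]  # drop the closing '>'
    fields = {}
    i, n = 0, len(body)
    while i < n:
        eq = body.find("=", i)
        if eq < 0:
            raise ValueError(f"missing '=' after position {i}")
        name = body[i:eq]
        i = eq + 1
        if i < n and body[i] == '"':
            # Quoted string: a backslash escapes the following character.
            i += 1
            chars = []
            while i < n and body[i] != '"':
                if body[i] == "\\" and i + 1 < n:
                    i += 1
                chars.append(body[i])
                i += 1
            if i >= n:
                raise ValueError("unterminated quoted string")
            value = "".join(chars)
            i += 1  # skip the closing quote
        elif i < n and body[i] == "[":
            # Bracketed list (assumed non-nested): split on commas inside [...].
            close = body.find("]", i)
            if close < 0:
                raise ValueError("unterminated '[' list")
            inner = body[i + 1:close]
            value = [v.strip() for v in inner.split(",")] if inner.strip() else []
            i = close + 1
        else:
            # Bare value: runs up to the next comma or the end of the body.
            end = body.find(",", i)
            end = n if end < 0 else end
            value = body[i:end]
            i = end
        fields[name] = value
        if i < n:
            if body[i] != ",":
                raise ValueError(f"expected ',' at position {i}")
            i += 1
    return key, fields


key, fields = parse_structured_header(
    "##META=<ID=Tissue,Type=String,Number=.,Values=[Blood, Breast, Colon, Lung, ?]>"
)
print(key, fields["Values"])
```

A parser that simply splits the body on every comma would instead break Values=[Blood, Breast, Colon, Lung, ?] into fragments such as 'Values=[Blood' and ' ?]', which is the sort of mishandling that leaves a stray ']' attached to a value. Round-tripping would additionally need a formatter that writes the list back out in the same bracketed form.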

As I can't find many complaints about them not working, I suspect this header type is rarely used, if at all. Given this, could it be safely dropped from VCF 4.4, or at least modified to use a normal quoted string instead of the bracketed format? If kept, I think it would need an improved description of exactly how these header lines are intended to work and interact with ##SAMPLE lines.

@daviesrob daviesrob added the vcf label Apr 27, 2021
@jkbonfield
Contributor

I don't see ##META mentioned anywhere in the spec other than that simple example (and the history section). It is logical to conclude it's for defining an ontology, where the META lines list the valid terms that may be permitted, but if so it needs stating explicitly.

Are they only permitted in the ##SAMPLE lines, or can that ontology be used elsewhere? It's all incredibly terse!

What does the '?' term mean? Is that a literal '?', or is it a symbolic field indicating that the ontology is incomplete and other values are permitted? Is that even legal? If not, the spec ought to say so.
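If ##META lines do define a value dictionary for ##SAMPLE attributes (the interpretation suggested above, which the spec does not confirm), a validator has to pick a meaning for '?'. The Python sketch below makes that choice explicit with a hypothetical question_mark_is_wildcard flag: when False, '?' is just another allowed literal; when True, its presence permits any value. The function name, the flag, and the already-parsed dictionary inputs are all assumptions for illustration.

```python
from typing import Dict, List


def check_sample(sample: Dict[str, str],
                 meta: Dict[str, List[str]],
                 question_mark_is_wildcard: bool = False) -> List[str]:
    """Check one ##SAMPLE line's attributes against ##META dictionaries.

    Hypothetical interpretation: a ##SAMPLE key matching a ##META ID must
    use one of that ID's listed values. Returns a list of problems found.
    """
    problems = []
    for key, value in sample.items():
        allowed = meta.get(key)
        if allowed is None:
            continue  # no ##META dictionary declared for this key
        if value in allowed:
            continue  # exact match (includes a literal '?')
        if question_mark_is_wildcard and "?" in allowed:
            continue  # assumption: '?' means "other values permitted"
        problems.append(f"{key}={value!r} not in {allowed}")
    return problems


meta = {"Tissue": ["Blood", "Breast", "Colon", "Lung", "?"]}
sample = {"ID": "Sample1", "Tissue": "Skin"}
print(check_sample(sample, meta))
print(check_sample(sample, meta, question_mark_is_wildcard=True))
```

The same ##SAMPLE line is rejected under one reading and accepted under the other, which is why the spec needs to state which one is intended.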

@jmarshall
Member

We also have several previous issues filed by people trying to figure out how to use ##META lines: #106, #317, #350, #351.

@jkbonfield
Contributor

One suggestion from the meeting (Rob?) was to cull it from VCF 4.4 until we figure out what we want it to be and document it properly. I'm with that. It can always be added back, but including an underspecified, non-functioning feature isn't good.

@tcezard
Contributor

tcezard commented Apr 29, 2021

Now that it's been merged, we have example files that partly document the usage; they should be reviewed and corrected when this is revisited:
test/vcf/4.3/passed/passed_meta_meta.vcf
test/vcf/4.3/failed/failed_meta_meta_000.vcf
test/vcf/4.3/failed/failed_meta_meta_001.vcf
test/vcf/4.3/failed/failed_meta_meta_002.vcf
test/vcf/4.3/failed/failed_meta_meta_003.vcf

@jmarshall
Member

See also #88 (comment) and the following comment thread, which may have been the genesis of pedigree-related use of ##META.

Status: Stalled
5 participants