how to encode missing fields with number>1 #419

Open
yfarjoun opened this issue Jun 10, 2019 · 4 comments
@yfarjoun
Contributor

This can happen in either INFO or FORMAT when an array is missing. For example, a missing PL in the FORMAT field at a diploid, biallelic site can be written as . or as .,.,. Which one is correct? Which one is valid? The text is somewhat vague, and the example provided only covers the case of Number=1.

@pd3
Member

pd3 commented Jun 10, 2019

Both can be used. Single . for brevity or .,.,. to express ploidy.
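
To illustrate why the two spellings can be treated as equivalent on input, here is a minimal Python sketch of a reader that accepts either form. The function name `parse_values` and the `expected` parameter are hypothetical and not taken from htslib or htsjdk; the sketch assumes integer values and that the caller already knows how many values the field should hold (3 for PL at a diploid, biallelic site).

```python
from typing import List, Optional

MISSING = "."


def parse_values(raw: str, expected: int) -> List[Optional[int]]:
    """Parse a comma-separated integer field, mapping '.' to None.

    `expected` is the number of values the header's Number implies
    for this record (hypothetical parameter, supplied by the caller).
    """
    if raw == MISSING:
        # Short form: a single '.' stands for the whole array being missing.
        return [None] * expected
    values = [None if tok == MISSING else int(tok) for tok in raw.split(",")]
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    return values
```

With this reader, `parse_values(".", 3)` and `parse_values(".,.,.", 3)` yield the same list of three missing entries, so downstream code does not need to care which form the writer chose.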

@jmarshall jmarshall added the vcf label Jun 13, 2019
@lbergelson
Member

It seems like . is usually preferable for missing non-GT fields.

Does htslib have reasonable support for dealing with partially missing arrays?

A related issue:
Currently htsjdk doesn't handle non-genotype fields with partially missing values well, if at all.

At the moment things like AD = 5,.,10 are treated the same as . which seems wrong. One issue with representing these sorts of things correctly is that Java primitive arrays don't support null values.
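
One common way around the "no nulls in primitive arrays" problem is a reserved sentinel value. The Python sketch below uses the standard `array` module as a stand-in for a primitive int array; the names `INT32_MISSING`, `parse_ad` and `is_fully_missing` are hypothetical, and the choice of the minimum 32-bit integer as sentinel is an assumption modelled on how BCF encodes a missing int32. The example value is written with comma separators, as VCF uses for values within one field. This is not how htsjdk works today, only one possible approach.

```python
from array import array

# Assumed sentinel: the minimum 32-bit signed integer marks a missing entry.
INT32_MISSING = -2**31


def parse_ad(raw: str, n: int) -> array:
    """Parse an AD-style field into a typed int array, using a sentinel for '.'."""
    if raw == ".":
        # Whole field missing: fill all n slots with the sentinel.
        return array("l", [INT32_MISSING] * n)
    return array("l", [INT32_MISSING if t == "." else int(t) for t in raw.split(",")])


def is_fully_missing(values: array) -> bool:
    """True only when every entry is the sentinel."""
    return all(v == INT32_MISSING for v in values)
```

Here `parse_ad("5,.,10", 3)` keeps the 5 and the 10 and marks only the middle entry as missing, so a partially missing array is no longer collapsed into a fully missing one.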

Does anyone know how partially missing arrays are handled in htslib?

@pd3
Member

pd3 commented Jun 14, 2019

Partially missing arrays are fully handled in htslib. They were added later, and at that point the java implementation made the pragmatic decision to treat partially missing arrays as fully missing. (Which I understand because it can be quite a pain sometimes.)

@d-cameron
Contributor

Specs-as-written, both are fine.

Section 1.6.2

If a field contains a list of missing values, it can be represented either as a single MISSING value ('.') or as a list of missing values (e.g. '.,.,.' if the field was Number=3).
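
On the writing side, the quoted rule means a writer may pick either form for a fully missing array. A minimal Python sketch follows, assuming missing entries are held as `None`; the function `format_values` and its `compact` flag are hypothetical names for illustration, not part of any existing library.

```python
from typing import Optional, Sequence


def format_values(values: Sequence[Optional[int]], compact: bool = True) -> str:
    """Serialize a list of values, writing None as '.'.

    With compact=True a fully missing list collapses to a single '.';
    with compact=False every slot is written out (e.g. '.,.,.').
    """
    if not values or (compact and all(v is None for v in values)):
        return "."
    return ",".join("." if v is None else str(v) for v in values)
```

For example, `format_values([None, None, None])` gives `.`, while `format_values([None, None, None], compact=False)` gives `.,.,.`. A partially missing list such as `[5, None, 10]` is always written in full as `5,.,10`.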
