floats vs doubles in VCF BCF
pd3 edited this page Jan 9, 2014 · 1 revision
While the VCF file format does not specify the range of numeric types, the BCF format supports at most 32-bit integers and 32-bit floats. The general consensus is that 32-bit integers are sufficient, but opinions about floats vary. The opinions expressed on the vcftools-spec mailing list were:
- 32-bit floats are sufficient; we can use log() when extended range is needed
- 32-bit floats are not sufficient; we must be able to express extended range explicitly (their precision, however, is sufficient)
- 32-bit floats are not sufficient; we need both extended range and extended precision
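The difference between range and precision in these positions can be illustrated with a minimal Python sketch. It is not taken from any VCF or BCF implementation: `to_float32` is a hypothetical helper that simulates 32-bit storage with the standard `struct` module, and the value `1e-50` is an arbitrary example.

```python
import math
import struct

def to_float32(x):
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

p = 1e-50  # illustrative value, far below the smallest positive float32

# Range: stored directly, the value underflows to zero in 32 bits.
direct = to_float32(p)

# Stored as log10(p), the same information is an ordinary number.
stored_log = to_float32(math.log10(p))

# Precision: a float32 keeps only about 7 significant decimal digits,
# so 0.1 does not survive the conversion unchanged.
rounded = to_float32(0.1)

print(direct)          # 0.0
print(stored_log)      # -50.0
print(rounded == 0.1)  # False
```

Taking the logarithm addresses only the range problem; the value recovered from a 32-bit log still carries 32-bit precision.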
The following solutions have been proposed:
- restrict numeric types in VCF to 32 bits and recommend using log() when extended range is needed
- dynamically detect the range and use doubles in BCF when necessary, similarly to how integer types are handled
- introduce a new type 'double' to VCF and BCF
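The second proposal could look roughly like the sketch below. This is an assumption about how such detection might work, not code from htslib or bcftools; the names `fits_float32_range` and `choose_float_width` are hypothetical. It checks range only, in line with the proposal, and says nothing about precision.

```python
import math

FLT_MAX = (2 - 2.0**-23) * 2.0**127  # largest finite 32-bit float
FLT_MIN = 2.0**-126                  # smallest positive normal 32-bit float

def fits_float32_range(x):
    """True if x can be stored in 32 bits without overflow or underflow."""
    if x == 0.0 or not math.isfinite(x):
        return True  # zero, inf and nan exist in both widths
    return FLT_MIN <= abs(x) <= FLT_MAX

def choose_float_width(values):
    """Return 32 if every value fits the float32 range, otherwise 64."""
    return 32 if all(fits_float32_range(v) for v in values) else 64

print(choose_float_width([0.5, 1e-10, 3e38]))  # 32
print(choose_float_width([0.5, 1e-50]))        # 64
```

A writer following this scheme would pick the width per field, much as the narrowest integer type that holds all values is chosen for integers.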
The following pros and cons have been mentioned:
- The need for higher precision seems only theoretical at this point. VCF producers such as GATK and samtools are happy with floats, no specific example was given to demonstrate the need for higher precision, and extended range can be achieved using log().
- Allowing extended precision leads to a new kind of problem when converting from BCF to VCF: how many significant digits should be output?
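The significant-digits question can be made concrete with a short sketch. It relies on a general property of IEEE 754 values: 9 significant decimal digits are enough to recover any 32-bit float exactly, and 17 are enough for any double. The helper names `to_float32` and `format_value` are hypothetical and do not come from any BCF-to-VCF converter.

```python
import struct

def to_float32(x):
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def format_value(x, bits):
    """Format x with enough digits to parse back to the same binary value."""
    digits = 9 if bits == 32 else 17
    return format(x, f".{digits}g")

f32 = to_float32(0.1)

print(format_value(f32, 32))  # 0.100000001
print(format_value(0.1, 64))  # 0.10000000000000001
print(repr(0.1))              # 0.1 (shortest text that round-trips)
```

Fixed digit counts guarantee a lossless round trip but print digits the producer never wrote, while shorter output reads better and may lose information; the converter has to pick one.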