1.16

@daviesrob daviesrob released this 18 Aug 14:11

Download the source code here: htslib-1.16.tar.bz2. (The "Source code" downloads are generated by GitHub and are incomplete as they are missing some generated files.)

  • Make hfile_s3 refresh AWS credentials on expiry in order to make HTSlib work better with AWS IAM credentials, which have a limited lifespan. (PR #1462 and PR #1474, addresses #344)

  • Allow BAM headers between 2GB and 4GB in size once more. This is not permitted in the BAM specification but was allowed in an earlier version of HTSlib. There is now a warning at 2GB and a hard failure at 4GB. (PR #1421, fixes #1420 and samtools/samtools#1613. Reported by John Marshall and R C Mueller)

  • Improve error message when failing to load an index. (PR #1468, example of the problem samtools/samtools#1637)

  • Permit MM (base modification) tags containing . and ? suffixes. These define implicit vs explicit coordinates. See the SAM tags specification for details. (PR #1423 and PR #1426, fixes #1418. PR #1469, fixes #1466. Reported by cjw85)

  • Warn if spaces instead of tabs are detected in a VCF file to prevent confusion. (PR #1328, fixes bcftools#1575. Reported by ketkijoshi278)

  • Add an sclen filter expression keyword. This is the total soft-clip length, summed over the left and right ends. It may be combined with qlen (qlen-sclen) to obtain the number of bases in the query sequence that have been aligned to the genome, i.e. it provides a way to compare local-alignment vs global-alignment length. (PR #1441 and PR samtools/samtools#1661, fixes #1436. Requested by Chang Y)

  • Improve error messages for CRAM reference mismatches. If the user specifies the wrong reference, the CRAM slice header MD5sum checks fail. We now report the SQ line M5 string too, so it is possible to validate against the whole chromosome in the ref.fa file. The error message has also been improved to report the reference name instead of #num. Finally, we now hint at the likely cause, which counters the misleading samtools-supplied error of "truncated or corrupt" file. (PR #1427, fixes samtools/samtools#1640. Reported by Jian-Guo Zhou)

  • Expose more of the CRAM API and add new functionality to extract the reference from a CRAM file. (PR #1429 and PR #1442)

  • Improvements to the implementation of embedded references in CRAM where no external reference is specified. (PR #1449, addresses some of the issues in #1445)

  • The CRAM writer now allows alignment records with RG:Z: aux tags that don't have a corresponding @RG ID in the file header. Previously these tags would have been silently dropped. HTSlib will complain whenever it has to add one though, as such tags do not conform to recommended practice for the SAM, BAM and CRAM formats. (PR #1480, fixes #1479. Reported by Alex Leonard)

  • Set tab delimiter in man page for tabix GFF3 sort. (PR #1457. Thanks to Colin Diesh)

  • When using libdeflate, the 1...9 scale of BGZF compression levels is now remapped to the 1...12 range used by libdeflate instead of being passed directly. In particular, HTSlib levels 8 and 9 now map to libdeflate levels 10 and 12, so it is possible to select the highest (but slowest) compression offered by libdeflate. (PR #1488, fixes #1477. Reported by Gert Hulselmans)

  • The VCF variant API has been extended so that it can return separate flags for INS and DEL variants as well as the existing INDEL one. These flags have not been added to the old bcf_get_variant_types() interface as it could break existing users. To access them, it is necessary to use new functions bcf_has_variant_type() and bcf_has_variant_types(). (PR #1467)

  • The missing, but trivial, le_to_u8() function has been added to hts_endian. (PR #1494, Thanks to John Marshall)

  • bcf_format_gt() now works properly on big-endian platforms. (PR #1495, Thanks to John Marshall)
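
As an illustration of the sclen keyword described above, the following is a minimal sketch that computes soft-clip length and query length from a textual CIGAR string. The helper names (cigar_sum, sclen_of, qlen_of) are hypothetical and are not part of the HTSlib API; the sketch also assumes that qlen counts every query-consuming operation (M, I, S, = and X), which is what makes qlen-sclen meaningful.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: sum the lengths of all CIGAR operations whose
 * operator character appears in `ops`. Returns -1 on malformed input. */
long cigar_sum(const char *cigar, const char *ops)
{
    long total = 0;
    while (*cigar) {
        long n = 0;
        if (!isdigit((unsigned char)*cigar))
            return -1;                      /* operator without a length */
        while (isdigit((unsigned char)*cigar))
            n = n * 10 + (*cigar++ - '0');
        if (*cigar == '\0')
            return -1;                      /* length without an operator */
        if (strchr(ops, *cigar))
            total += n;
        cigar++;
    }
    return total;
}

/* Soft-clipped bases, left and right ends together. */
long sclen_of(const char *cigar) { return cigar_sum(cigar, "S"); }

/* Assumption: query length is the sum of query-consuming operations. */
long qlen_of(const char *cigar) { return cigar_sum(cigar, "MIS=X"); }
```

For a read with CIGAR 5S90M5S this gives a soft-clip length of 10 and a query length of 100, so the difference of 90 is the portion of the query that was actually aligned.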

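The libdeflate level remapping mentioned above can be pictured as a small lookup table. This is only a sketch: the function name remap_level is hypothetical, and only the entries for levels 8 and 9 (mapping to 10 and 12) come from the note itself. The remaining entries are assumptions chosen to keep the mapping monotonic, and may not match what HTSlib does.

```c
/* Hypothetical sketch: map a BGZF compression level (0..9) onto the
 * libdeflate scale (0..12). Returns -1 for out-of-range input.
 * Only map[8] == 10 and map[9] == 12 are taken from the release note;
 * all other entries are illustrative assumptions. */
int remap_level(int level)
{
    static const int map[10] = { 0, 1, 2, 3, 5, 6, 7, 8, 10, 12 };
    if (level < 0 || level > 9)
        return -1;
    return map[level];
}
```

A table keeps the top of the libdeflate range reachable from the familiar 1...9 scale without requiring callers to know which compression backend was compiled in.
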
Build changes

These are compiler, configuration and makefile based changes.

  • Update htscodecs to version 1.3.0 for new SIMD code + various fixes. Updates the htscodecs submodule and adds changes necessary to make HTSlib build the new SIMD codec implementations. (PR #1438, PR #1489, PR #1500)

  • Fix clang builds under mingw. Under mingw, clang requires dllexport to be applied to both function declarations and function definitions. (PR #1435, PR #1497, PR #1498, fixes #1433. Reported by teepean)

  • Fix curl type warning with gcc 12.1 on Windows. (PR #1443)

  • Detect ARM Neon support and only build appropriate SIMD object files. (PR #1451, fixes #1450. Thanks to John Marshall)

  • make print-config now reports extra CFLAGS that are needed to build the SIMD parts of htscodecs. These may be of use to third-party build systems that don't use HTSlib's or htscodecs' build infrastructure. (PR #1485. Thanks to John Marshall)

  • Fixed some Makefile dependency issues for the check/test targets and plugins. In particular, make check will now build the all target, if not done already, before running the tests. (PR #1496)

Bug fixes

  • Fix bug when reading position -1 in BCF (0 in VCF), which is used to indicate telomeric regions. The BCF reader was incorrectly assuming the value stored in the file was unsigned, so a VCF->BCF->VCF round-trip would change it from 0 to 4294967296. (PR #1476, fixes #1475 and bcftools#1753. Reported by Rodrigo Martin)

  • Various bugs and quirks have been fixed in the filter expression engine, mostly related to the handling of absent tags, and the is_true flag. Note that as a result of these fixes, some filter expressions may give different results:

    • Fixed and-expressions including aux tag values which could give an invalid true result depending on the order of terms.
    • The expression ![NM] is now true only if NM does not exist. In earlier versions it would also report true for tags like NM:i:0 which exist but have a value of zero.
    • The expression [X1] != 0 is now false when X1 does not exist. Earlier versions would return true for this comparison when the tag was missing.
    • NULL values due to missing tags now propagate through string, bitwise and mathematical operations. Logical operations always treat them as false. (PR #1463, fixes samtools/samtools#1670. Reported by Gert Hulselmans; PR #1478, fixes samtools/samtools#1677. Reported by johnsonzcode)

  • Fix buffer overrun in bam_plp_insertion_mod. Memory now grows to the proper size needed for base modification data. (PR #1430, fixes samtools/samtools#1652. Reported by hd2326)

  • Remove limit of returned size from fai_retrieve(). (PR #1446, fixes samtools/samtools#1660. Reported by Shane McCarthy)

  • Cap hts_getline() return value at INT_MAX. Prevents hts_getline() from returning a negative number (a fail) for very long string length values. (PR #1448. Thanks to John Marshall)

  • Fix breakend detection and test bcf_set_variant_type(). (PR #1456, fixes #1455. Thanks to Martin Pollard)

  • Prevent arrays of BCF_BT_NULL values found in BCF files from causing bcf_fmt_array() to call exit() as the type is unsupported. These are now tested for and caught by bcf_record_check(), which returns an error code instead. (PR #1486)

  • Improved detection of fasta and fastq files that have very long comments following identifiers. (PR #1491, thanks to John Marshall. Fixes samtools/samtools#1689, reported by cjw85)

  • Fixed a SEGV triggered by giving a SAM file to samtools import. (PR #1492)
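
The arithmetic behind the BCF position fix listed above can be reproduced in a few lines. This is a sketch of the signed versus unsigned interpretation only, not the HTSlib reader code; the function names are hypothetical. It assumes the position is stored as a 32-bit little-endian value that is 0-based, so the VCF position is the stored value plus one.

```c
#include <stdint.h>
#include <string.h>

/* Decode four little-endian bytes as an unsigned 32-bit integer. */
uint32_t le_u32(const uint8_t *b)
{
    return (uint32_t)b[0]
         | (uint32_t)b[1] << 8
         | (uint32_t)b[2] << 16
         | (uint32_t)b[3] << 24;
}

/* Old behaviour (sketch): stored position treated as unsigned. */
int64_t vcf_pos_unsigned(const uint8_t *b)
{
    return (int64_t)le_u32(b) + 1;
}

/* Fixed behaviour (sketch): stored position reinterpreted as signed. */
int64_t vcf_pos_signed(const uint8_t *b)
{
    uint32_t u = le_u32(b);
    int32_t s;
    memcpy(&s, &u, sizeof s);   /* reinterpret bits without a risky cast */
    return (int64_t)s + 1;
}
```

With the stored bytes ff ff ff ff (that is, -1), the unsigned reading yields 4294967295 + 1 = 4294967296, while the signed reading yields -1 + 1 = 0, matching the VCF position used for telomeric regions.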