Skip to content

Commit

Permalink
Increase the input block size for bgzip.
Browse files Browse the repository at this point in the history
Commit e495718 changed bgzip from unix raw POSIX read() calls to
hread().  Unfortunately hread gets its buffer size from stat of the
input file descriptor, which can be 4kb for a pipe.  We're reading
0xff00 bytes, so this ends up being split over two reads mostly, with
one or both involving additional memcpys.  This makes the buffered I/O
worse performing than non-buffered.  In the most extreme cases (cat
data | bgzip -l0 > /dev/null) this is a two fold slow down.

The easy solution is just to increase the buffer size to something
sensible.  It's a little messy as we have to use hfile_internal.h to
get hfile_set_blksize, but it works.  I'm not sure why we didn't elect
to make that API more public.  Probably simply out of caution.

Fixes samtools#1767
  • Loading branch information
jkbonfield committed Oct 30, 2024
1 parent cf0e756 commit fdd6143
Showing 1 changed file with 4 additions and 0 deletions.
4 changes: 4 additions & 0 deletions bgzip.c
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,7 @@
#include "htslib/bgzf.h"
#include "htslib/hts.h"
#include "htslib/hfile.h"
#include "hfile_internal.h" // for hfile_set_blksize

#ifdef _WIN32
# define WIN32_LEAN_AND_MEAN
Expand Down Expand Up @@ -337,6 +338,9 @@ int main(int argc, char **argv)
return 1;
}

// Increase block size to reduce performance bottlenecks on pipes
hfile_set_blksize(f_src, 128*1024);

if (write_fname) {
if (!exp_out_open) { // only open this file once for writing, close at the end
if ((fp = bgzf_open(write_fname, out_mode)) == NULL) {
Expand Down

0 comments on commit fdd6143

Please sign in to comment.