vcf sort

Rob Flickenger edited this page Aug 9, 2021 · 1 revision

The vdb vcf sort utility sorts VCF data in a manner similar to vcftools vcf-sort. Note that the output of vdb study export is always sorted, while the output of vdb vcf export is identical to the original imported VCF file.

Input and output default to STDIN and STDOUT, so the command can be used as part of a pipeline. Data can also be read from or written to a file using --input / -i and --output / -o. A file selected with --input may optionally be gzip compressed, but VCF data sent to STDIN should be uncompressed first. Output is always uncompressed, but it can be compressed afterwards using bgzip.

# compressed input, uncompressed output
$ biograph vdb vcf sort --input my.vcf.gz --output sorted.vcf

# streaming uncompressed input, compress with bgzip before writing
$ zcat my.vcf.gz | biograph vdb vcf sort | bgzip > sorted.vcf.gz

By default, chromosome names are sorted alphabetically, as strings. Use the --chromosomal / -c option to use natural chromosomal ordering instead.
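
The difference between the two orderings can be illustrated without biograph. The sketch below uses GNU sort as a stand-in: plain sort for string order and sort -V (version sort) for natural order. It is only an analogy for the orderings listed in the help text, not a description of how vdb vcf sort is implemented, and the chromosome names are made-up sample input.

```shell
# Sample chromosome names in arbitrary order (illustrative only).
names() { printf '%s\n' X 10 2 1 22 3; }

# Alphabetic order: names are compared as strings, so "10" sorts before "2".
alpha=$(names | LC_ALL=C sort | paste -sd' ' -)

# Natural order: GNU sort -V compares runs of digits numerically.
# This stands in for the effect of --chromosomal on these names.
natural=$(names | LC_ALL=C sort -V | paste -sd' ' -)

echo "alphabetic: $alpha"
echo "natural:    $natural"
```

On this input the first line lists 1 10 2 22 3 X and the second lists 1 2 3 10 22 X, matching the two orders shown in the --help output.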

VCF data must be uncompressed for sorting, and large files may require significant temporary space. Set $TMPDIR or use --tmp / -t to choose a different temporary path (default: /tmp).
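
As a rough sketch of the temporary-path options, the snippet below creates a scratch directory, reports its free space, and points both $TMPDIR and --tmp at it. The scratch location and the file names my.vcf.gz and sorted.vcf are placeholders; in practice you would likely substitute a directory on a volume with enough room for the uncompressed VCF. The sort itself only runs if biograph is installed and the input file exists.

```shell
# Placeholder scratch directory; replace with a path on a large volume.
scratch=$(mktemp -d)

# Option 1: set the environment variable for this shell session.
export TMPDIR="$scratch"

# Show available space (in 1024-byte blocks) on the chosen volume.
df -Pk "$scratch"

# Option 2: pass the directory explicitly with --tmp.
# Guarded so the sketch is harmless where biograph or the input is missing.
if command -v biograph >/dev/null 2>&1 && [ -f my.vcf.gz ]; then
  biograph vdb vcf sort --tmp "$scratch" --input my.vcf.gz --output sorted.vcf
fi
```

Either option alone should be sufficient; both are shown here only for comparison. Remove the scratch directory once the sort has finished.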

Getting more help

$ biograph vdb vcf sort --help
usage: biograph [-h] [-i INPUT] [-o OUTPUT] [-c] [-t TMP]

Sort a VCF file.

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Input VCF filename (/dev/stdin)
  -o OUTPUT, --output OUTPUT
                        Output VCF filename (/dev/stdout)
  -c, --chromosomal     Use natural order (1,2,3,10,22,X) instead of
                        alphabetic order (1,10,2,22,3,X)
  -t TMP, --tmp TMP     Temporary directory (/tmp)