vcf sort
The vdb vcf sort utility sorts a VCF in a manner similar to vcftools vcf-sort. Note that the output of vdb study export is always sorted, while the output of vdb vcf export is identical to the original imported VCF file.
Input and output default to STDIN/STDOUT, so the command may be used as part of a pipeline. Data can also be read from or written to a file using --input / -i and --output / -o. A file selected with --input may optionally be gzip-compressed, but VCF data sent to STDIN must be uncompressed first. Output is always uncompressed, but it may be compressed afterwards using bgzip.
# compressed input, uncompressed output
$ biograph vdb vcf sort --input my.vcf.gz --output sorted.vcf
# streaming uncompressed input, compress with bgzip before writing
$ zcat my.vcf.gz | biograph vdb vcf sort | bgzip > sorted.vcf.gz
The default sort order is alphabetic, with chromosome names sorted as strings. Use the --chromosomal / -c option to use natural chromosomal ordering instead.
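The difference between the two orderings can be illustrated with a short Python sketch. This is not BioGraph's implementation; the helper natural_chrom_key is a hypothetical name, and it assumes bare chromosome names such as 1, 22 and X (no chr prefix). Purely numeric names sort first by value, and all other names follow in string order.

```python
def natural_chrom_key(chrom):
    """Hypothetical sort key: numeric names by value, then other names as strings."""
    if chrom.isdigit():
        return (0, int(chrom), "")
    return (1, 0, chrom)


chroms = ["X", "10", "2", "1", "22", "3"]

# Default behaviour described above: names compared as plain strings.
alphabetic = sorted(chroms)

# Natural chromosomal order, as selected by --chromosomal / -c.
natural = sorted(chroms, key=natural_chrom_key)

print(alphabetic)  # ['1', '10', '2', '22', '3', 'X']
print(natural)     # ['1', '2', '3', '10', '22', 'X']
```

The two printed lists match the orderings given in the --chromosomal help text.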
VCFs must be uncompressed for sorting, and may require significant temporary space for large files. Set $TMPDIR or use --tmp to choose a new temporary path (default: /tmp).
$ biograph vdb vcf sort --help
usage: biograph [-h] [-i INPUT] [-o OUTPUT] [-c] [-t TMP]

Sort a VCF file.

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Input VCF filename (/dev/stdin)
  -o OUTPUT, --output OUTPUT
                        Output VCF filename (/dev/stdout)
  -c, --chromosomal     Use natural order (1,2,3,10,22,X) instead of
                        alphabetic order (1,10,2,22,3,X)
  -t TMP, --tmp TMP     Temporary directory (/tmp)
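When the sort is driven from a script rather than typed at a shell, the options above can be assembled programmatically. The following is a minimal sketch, assuming only the flags listed in the help output; build_sort_command is a hypothetical helper, not part of BioGraph, and /scratch/tmp is a placeholder path.

```python
import shlex


def build_sort_command(input_path=None, output_path=None,
                       chromosomal=False, tmp_dir=None):
    """Hypothetical helper: build the argv list for 'biograph vdb vcf sort'."""
    cmd = ["biograph", "vdb", "vcf", "sort"]
    if input_path is not None:
        cmd += ["--input", input_path]    # omitted: read from STDIN
    if output_path is not None:
        cmd += ["--output", output_path]  # omitted: write to STDOUT
    if chromosomal:
        cmd.append("--chromosomal")       # natural order instead of alphabetic
    if tmp_dir is not None:
        cmd += ["--tmp", tmp_dir]         # temporary directory for large sorts
    return cmd


# /scratch/tmp is a placeholder; substitute a directory with enough free space.
cmd = build_sort_command("my.vcf.gz", "sorted.vcf",
                         chromosomal=True, tmp_dir="/scratch/tmp")
print(shlex.join(cmd))
```

On a machine with biograph installed, the resulting list could be passed to subprocess.run(cmd, check=True). Building the command as a list rather than a single string avoids shell quoting problems with unusual filenames.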