I'm creating tabix indexes for arbitrarily formatted GWAS summary stats. This works fine, but the header seems to be lost. If I pass `--skip-lines`, then when I query the header, I get a blank line. If I don't pass `--skip-lines`, then I get `[E::get_intv] Failed to parse TBX_GENERIC, was wrong -p [type] used?`, which I think makes sense because the first row is the header and is not numeric.
Does retaining the header effectively only work if it is prefixed with `#`? If so, should that be mentioned in the documentation?
Apologies if this is documented but I somehow missed it.
```
$ tabix -h

Version: 1.10.2
Usage:   tabix [OPTIONS] [FILE] [REGION [...]]

Indexing Options:
   -0, --zero-based           coordinates are zero-based
   -b, --begin INT            column number for region start [4]
   -c, --comment CHAR         skip comment lines starting with CHAR [null]
   -C, --csi                  generate CSI index for VCF (default is TBI)
   -e, --end INT              column number for region end (if no end, set INT to -b) [5]
   -f, --force                overwrite existing index without asking
   -m, --min-shift INT        set minimal interval size for CSI indices to 2^INT [14]
   -p, --preset STR           gff, bed, sam, vcf
   -s, --sequence INT         column number for sequence names (suppressed by -p) [1]
   -S, --skip-lines INT       skip first INT lines [0]

Querying and other options:
   -h, --print-header         print also the header lines
   -H, --only-header          print only the header lines
   -l, --list-chroms          list chromosome names
   -r, --reheader FILE        replace the header with the content of FILE
   -R, --regions FILE         restrict to regions listed in the file
   -T, --targets FILE         similar to -R but streams rather than index-jumps
   -D                         do not download the index file
```
You're right that it's not very well documented, but the header options (`-h`/`--print-header` and `-H`/`--only-header`) only use the stored `-c` comment character when working out which lines to print. This is fine for most of the formats that tabix was originally designed for, like SAM, VCF and BED, as they all have prefixes on header lines.
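As a rough illustration of the rule described above, here is a simplified Python model, not htslib's actual code: it assumes a header query prints the leading run of lines that begin with the stored comment character and stops at the first line that does not. The function name `header_lines` and the sample GWAS columns are hypothetical.

```python
from typing import Iterable, List, Optional


def header_lines(lines: Iterable[str], comment_char: Optional[str]) -> List[str]:
    """Hypothetical model of which leading lines a header query would print.

    Assumption: only the run of lines at the start of the file that begin
    with the stored comment character counts as header. Lines skipped with
    -S are not recorded in the index, so they play no part here.
    """
    header = []
    for line in lines:
        # Stop at the first line that does not start with the comment
        # character, or immediately if no comment character was stored.
        if not comment_char or not line.startswith(comment_char):
            break
        header.append(line)
    return header


plain = ["CHR\tPOS\tP\n", "1\t12345\t0.01\n"]    # header without a prefix
hashed = ["#CHR\tPOS\tP\n", "1\t12345\t0.01\n"]  # header prefixed with '#'

print(header_lines(plain, "#"))   # []
print(header_lines(hashed, "#"))  # ['#CHR\tPOS\tP\n']
```

Under this model, an unprefixed header line such as `CHR\tPOS\tP` is never selected, which matches the behaviour reported in the issue.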
Unfortunately I don't think tabix stores the number of lines that were skipped in the index, although it might be possible to work it out by finding the first indexed line. So it may be possible to add a way to output only the unindexed data at the start of the file (I'm not sure it would be good to use --print-header for this as we don't know if data skipped using -S was due to it being header lines or for some other reason).
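Until something like that exists, the skipped lines can be read back outside tabix, provided the caller already knows how many lines were passed to `-S`. The sketch below is a hypothetical helper, not part of tabix or htslib. It relies on BGZF files being valid multi-member gzip streams, which Python's `gzip` module can read. The demo writes a plain gzip file as a stand-in for a bgzipped one.

```python
import gzip
import os
import tempfile
from itertools import islice
from typing import List


def skipped_lines(path: str, n: int) -> List[str]:
    """Return the first n lines of a (b)gzip-compressed text file.

    Hypothetical helper: n must be the value given to -S when indexing,
    because tabix does not record it in the index.
    """
    with gzip.open(path, "rt") as fh:
        return list(islice(fh, n))


# Demo: a small plain-gzip file standing in for a bgzipped summary-stats file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "sumstats.tsv.gz")
    with gzip.open(demo_path, "wt") as out:
        out.write("CHR\tPOS\tP\n1\t12345\t0.01\n")
    demo_header = skipped_lines(demo_path, 1)

print(demo_header)  # ['CHR\tPOS\tP\n']
```

This only recovers the raw leading lines. As noted above, it cannot tell whether those lines were skipped because they were a header or for some other reason.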
I'm getting the same issue/behavior as @carbocation. @daviesrob it would be great if there were an additional option to mark specific lines as header lines when indexing, so that they would be shown when querying with `--print-header`.
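A possible workaround with the current options is to rewrite the file so that its header looks like a comment line, and then index it with `-c '#'`. The helper below is a hypothetical sketch that assumes the header occupies exactly the first `n_header` lines and that `#` does not already start any data line.

```python
import io
from typing import TextIO


def prefix_header(src: TextIO, dst: TextIO, n_header: int = 1, char: str = "#") -> None:
    """Copy src to dst, prefixing the first n_header lines with char.

    Hypothetical helper: it makes an arbitrary header look like a comment
    line so that tabix's -c option can recognise it.
    """
    for i, line in enumerate(src):
        # Only touch header lines that are not already prefixed.
        if i < n_header and not line.startswith(char):
            line = char + line
        dst.write(line)


src = io.StringIO("CHR\tPOS\tP\n1\t12345\t0.01\n")
dst = io.StringIO()
prefix_header(src, dst)
print(dst.getvalue(), end="")
```

The rewritten file would then be compressed with `bgzip` and indexed with something like `tabix -s 1 -b 2 -e 2 -c '#' sumstats.tsv.gz`. The file name and column numbers are placeholders for the real layout. After that, `tabix -h` should print the prefixed header line, with `-S` no longer needed.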