grabix
leverages the fantastic BGZF library in samtools
to provide random access into
text files that have been compressed with bgzip
. grabix
creates it's own index (.gbi)
of the bgzipped file. Once indexed, one can extract arbitrary lines from the file with the grab
command. Or choose random lines with the, well, random
command.
There's a ton of room for improvement, but I needed something quickly in support of a side project.
Here's a brief example using the simrep.chr1.bed
file provided in the repository.
# 1. compress the file with bgzip
bgzip simrep.chr1.bed
# 2. create a grabix index of the file.
# creates simrep.chr1.bed.gbi
grabix index simrep.chr1.bed.gz
# 3. now, extract the 100th line in the file.
grabix grab simrep.chr1.bed.gz 100
chr1 401285 401444 trf 218
# 4. extract the 100th through 110th lines in the file.
grabix grab simrep.chr1.bed.gz 100 110
chr1 401285 401444 trf 218
chr1 401573 401748 trf 280
chr1 404661 404707 trf 92
chr1 406202 406274 trf 76
chr1 406227 406286 trf 77
chr1 406776 406819 trf 68
chr1 409821 409866 trf 51
chr1 409865 409900 trf 52
chr1 421245 421285 trf 64
chr1 422395 422435 trf 80
chr1 422560 422588 trf 56
You can also use grabix
to extract random lines from the file
# extract 10 randome lines from the file using reservoir sampling
grabix random simrep.chr1.bed.gz 10
Is a gzipped file bgzipped?
grabix check simrep.chr1.bed.gz