bio

A lightweight and high-performance (see seqkit benchmark) bioinformatics package.

FASTA/Q parsing

The FASTA/Q parser in this package performs close to the well-known C library kseq.h.

To test the performance, three datasets are used:

dataset_A, bacteria genomes, 2.7G
dataset_B, human genome, 2.9G
dataset_C, Illumina reads, 2.2G

Summary by seqkit:

file           seq_format   seq_type   num_seqs   min_len        avg_len       max_len
dataset_A.fa   FASTA        DNA          67,748        56       41,442.5     5,976,145
dataset_B.fa   FASTA        DNA             194       970   15,978,096.5   248,956,422
dataset_C.fq   FASTQ        DNA       9,186,045       100            100           100

seqtk (Version 1.1-r92-dirty, using kseq.h) and seqkit (Version v0.3.1.1, using this package) were used for the tests. Note that seqtk does not support wrapped (fixed line width) output, so seqkit was run with -w 0 to disable line wrapping. The memusg script was used to measure running time and peak memory usage.

Commands

Tests were repeated 5 times and average time and memory usage were computed.

Results:

Install

This package is "go-gettable"; just run:

go get -u github.com/shenwei356/bio

More

See the README of each sub-package.

Documentation

See documentation on godoc for more detail.

MIT License
