Skip to content

RSBio-Seq is a python wrapper for rust bio crate to provide fast sequence reading.

License

Notifications You must be signed in to change notification settings

anuradhawick/rsbio-seq

Repository files navigation

RSBio-Seq

Cargo tests Downloads PyPI - Version Upload to PyPI License: GPL v3

██████  ███████ ██████  ██  ██████        ███████ ███████  ██████  
██   ██ ██      ██   ██ ██ ██    ██       ██      ██      ██    ██ 
██████  ███████ ██████  ██ ██    ██ █████ ███████ █████   ██    ██ 
██   ██      ██ ██   ██ ██ ██    ██            ██ ██      ██ ▄▄ ██ 
██   ██ ███████ ██████  ██  ██████        ███████ ███████  ██████  
                                                              ▀▀   

RSBio-Seq intends to provide reading/writing facility on common sequence formats (FASTA/FASTQ) in both raw (fasta, fa, fna, fastq, fq) and compressed formats (.gz).

Installation

1. From PyPI (Recommended)

Use the following command to install from PyPI.

pip install rsbio-seq

2. Build and install from source

To build from source, make sure you have the following programs installed.

To build and install the development version of the wheel.

maturin develop # this installs the development version in the env
maturin develop --rust # this installs a release version in the env

To build a release mode wheel for installation, use this command.

maturin build --release

You will find the whl file inside the target/wheels directory. Your whl file will have a name depicting your python environment and CPU architecture. The built wheel can be installed using this command.

pip install target/wheels/*.whl

Usage

Once installed you can import the library and use as follows.

Reading

from rsbio_seq import SeqReader, Sequence, ascii_to_phred

# each seq entry is of type Sequence
seq: Sequence

for seq in SeqReader("path/to/seq.fasta.gz"):
    print(seq.id)
    print(seq.seq)
    # for fastq quality line
    print(seq.qual) # prints IIII
    print(ascii_to_phred(seq.qual)) # prints [40, 40, 40, 40]
    # optional description attribute
    print(seq.desc)

Writing

from rsbio_seq import SeqWriter, Sequence, phred_to_ascii

# writing fasta
seq = Sequence("id", "desc", "ACGT") # id, description, sequence
writer = SeqWriter("out.fasta")
writer.write(seq)
writer.close()

# writing fastq
seq = Sequence("id", "desc", "ACGT", "IIII") # id, description, sequence, quality
writer = SeqWriter("out.fastq")
writer.write(seq)
writer.close()

# writing gzipped
seq = Sequence("id", "desc", "ACGT", "IIII") # id, description, sequence, quality
writer = SeqWriter("out.fq.gz")
writer.write(seq)
writer.close()

# writing gzipped with phred score translation
qual = phred_to_ascii([40, 40, 40, 40])
seq = Sequence("id", "desc", "ACGT", qual) # id, description, sequence, quality
writer = SeqWriter("out.fq.gz")
writer.write(seq)
writer.close()

Note: close() is only required if you want to read the file again in the same function/code scope. Closing opened files is a good practice either way.

We provide two utility functions for your convenience.

  • phred_to_ascii - convert phred scores list of numbers to a string
  • ascii_to_phred - convert the quality string to a list of numbers

RSBio-Seq reads and write quality string in ascii format only. Please use these helper functions to translate if you intend to read them.

Authors

Support and contributions

Please get in touch via author websites or GitHub issues. Thanks!

About

RSBio-Seq is a python wrapper for rust bio crate to provide fast sequence reading.

Resources

License

Stars

Watchers

Forks

Packages

No packages published