kmertools is a k-mer based feature extraction tool designed to support metagenomics and other bioinformatics analytics. It uses k-mer analysis to vectorize DNA sequences so that the resulting vectors can be used in AI/ML applications.
- Oligonucleotide Frequency Vectors: Generate frequency vectors for oligonucleotides.
- Minimiser Binning: Efficiently bin sequences using minimisers to reduce data complexity.
- Chaos Game Representation (CGR): Compute CGR vectors for DNA sequences based on k-mers or whole sequence transformation.
- Coverage Histograms: Create coverage histograms to analyze the depth of sequencing reads.
- Python Bindings: Import kmertools functionality into Python using:
import pykmertools as kt
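To make two of the features listed above more concrete, here is a minimal pure-Python sketch of an oligonucleotide (k-mer) frequency vector and of a minimiser. It is an illustration only, not the kmertools implementation and not the pykmertools API: the function names `kmer_frequency_vector` and `minimiser` are hypothetical, the vector is a plain normalised count over all 4^k k-mers in lexicographic order, and the minimiser is simply the lexicographically smallest m-mer. Real tools may merge reverse complements, use hash-based orderings, or work over sliding windows.

```python
from itertools import product


def kmer_frequency_vector(seq: str, k: int) -> list[float]:
    """Return normalised counts of every DNA k-mer, in lexicographic order."""
    # Fix the vector layout: all 4**k k-mers over ACGT, sorted lexicographically.
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        pos = index.get(seq[i:i + k])  # k-mers with other symbols (e.g. N) are skipped
        if pos is not None:
            counts[pos] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def minimiser(seq: str, m: int) -> str:
    """Return the lexicographically smallest m-mer of seq (requires len(seq) >= m)."""
    return min(seq[i:i + m] for i in range(len(seq) - m + 1))


vec = kmer_frequency_vector("ACGTACGT", 2)
print(len(vec))                 # one entry per possible 2-mer
print(minimiser("GATTACA", 3))  # smallest 3-mer of the sequence
```

Sequences that share a minimiser can be placed in the same bin, which is the idea behind minimiser binning; the frequency vector gives every sequence a fixed-length numeric representation regardless of its length.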
You can install kmertools from Bioconda at https://anaconda.org/bioconda/kmertools. Make sure you have conda installed.
# create conda environment and install kmertools
conda create -n kmertools -c bioconda kmertools
# activate environment
conda activate kmertools
You can install kmertools from PyPI at https://pypi.org/project/pykmertools/.
pip install pykmertools
You can install kmertools directly from the source by cloning the repository and building it with Rust's package manager, cargo.
git clone https://github.com/anuradhawick/kmertools.git
cd kmertools
cargo build --release
Now add the binary to your PATH (you may modify ~/.bashrc or ~/.zshrc).
# to add to current terminal
export PATH=$PATH:$(pwd)/target/release/
# to save to ~/.bashrc
echo "export PATH=\$PATH:$(pwd)/target/release/" >> ~/.bashrc
source ~/.bashrc
# to save to ~/.zshrc for Mac
echo "export PATH=\$PATH:$(pwd)/target/release/" >> ~/.zshrc
source ~/.zshrc
To install the Python bindings, run the following commands. You can build from either the pip or the conda directory.
# pip
cd pip
maturin build --release
# conda
cd conda
maturin build --release
Now move to the parent directory using cd .. and run the following command, substituting the version of the wheel you built for <VERSION>.
pip install target/wheels/pykmertools-<VERSION>-cp39-abi3-manylinux_2_34_x86_64.whl
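The wheel file name above includes a version and a platform tag, so it may differ on your machine. As a convenience, the following sketch lists the wheels actually present and prints a matching install command for each. It assumes the wheels are in target/wheels relative to the current directory, as in the command above; the helper name `find_wheels` is hypothetical.

```python
from pathlib import Path


def find_wheels(wheel_dir: str = "target/wheels", package: str = "pykmertools") -> list[Path]:
    """Return the built wheels for the given package, sorted by file name."""
    return sorted(Path(wheel_dir).glob(f"{package}-*.whl"))


# Print one install command per wheel found; prints nothing if none exist.
for wheel in find_wheels():
    print(f"pip install {wheel}")
```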
After setting up, run the following command to print the kmertools help message.
kmertools --help
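If the help command is not found, the binary is probably not on your PATH. The sketch below is one way to check both parts of the setup from Python: whether a kmertools executable is discoverable on PATH and whether the pykmertools module can be located. It uses only the standard library and does not call any kmertools function; the helper name `check_setup` is hypothetical.

```python
import importlib.util
import shutil


def check_setup(binary: str = "kmertools", module: str = "pykmertools") -> dict[str, bool]:
    """Report whether the CLI binary is on PATH and the Python module can be found."""
    return {
        "binary_on_path": shutil.which(binary) is not None,
        "module_importable": importlib.util.find_spec(module) is not None,
    }


print(check_setup())
```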
Please read our comprehensive Wiki.
- Anuradha Wickramarachchi https://anuradhawick.com
- Vijini Mallawaarachchi https://vijinimallawaarachchi.com
If you use kmertools, please cite it as follows.
@software{Wickramarachchi_kmertools_DNA_Vectorisation,
author = {Wickramarachchi, Anuradha and Mallawaarachchi, Vijini},
title = {{kmertools: DNA Vectorisation Tool}},
url = {https://github.com/anuradhawick/kmertools},
version = {0.1.4}
}
Please refer to the Wiki for citations of relevant algorithms.
Please get in touch via author websites or GitHub issues. Thanks!