isONclust3

A Rust implementation of a novel de novo clustering algorithm. isONclust3 clusters either PacBio Iso-Seq reads or Oxford Nanopore reads, so that each cluster represents all reads that came from one gene family. The output is a TSV file that assigns each read to a cluster ID, plus a folder fastq_files containing one FASTQ file per generated cluster. Detailed information is available in the isONclust3 paper.

To test the installation, run the following command in the repository:

/usr/bin/time -v target/release/isONclust3 --fastq Example_data/test_data.fastq --mode ont --outfolder Example_out --seeding minimizer --post-cluster

The /usr/bin/time -v prefix only reports runtime and memory usage and can be omitted. The command generates the output directory Example_out in the repository folder. The fastq_files folder inside clustering should now contain 94 fastq files (each representing one cluster).

Running isONclust3

isONclust3 can be used on either PacBio or ONT data.

isONclust3 --fastq {input.fastq} --mode ont  --outfolder {outfolder}         # Oxford Nanopore reads
isONclust3 --fastq {input.fastq} --mode pacbio  --outfolder {outfolder}      # PacBio reads

The --mode ont argument is equivalent to setting --k 13 --w 21. The --mode pacbio argument is equivalent to setting --k 15 --w 51.
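
To give a rough idea of what these two parameters control, here is a minimal Python sketch of window minimizers. It assumes that --k is the k-mer length and --w the window size, that a window holds w consecutive k-mers, and that the lexicographically smallest k-mer in each window is selected. MODE_PRESETS only restates the values above; the minimizers function and its tie-breaking are illustrative assumptions and not isONclust3's seeding code, which may hash k-mers or define windows differently.

```python
# Presets restated from the text above; everything else is an illustrative assumption.
MODE_PRESETS = {
    "ont": {"k": 13, "w": 21},
    "pacbio": {"k": 15, "w": 51},
}


def minimizers(seq, k, w):
    """Return sorted (position, k-mer) pairs chosen as window minimizers.

    Assumption: a window holds w consecutive k-mers and the lexicographically
    smallest k-mer wins, with ties going to the leftmost occurrence.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return []
    # A sequence with fewer than w k-mers is treated as a single window.
    n_windows = max(len(kmers) - w + 1, 1)
    picked = set()
    for start in range(n_windows):
        window = kmers[start:start + w]
        offset = min(range(len(window)), key=lambda j: window[j])
        picked.add((start + offset, window[offset]))
    return sorted(picked)


if __name__ == "__main__":
    # Tiny k and w so the result is easy to inspect by hand.
    print(minimizers("ACGTTGCA", k=3, w=2))
    preset = MODE_PRESETS["ont"]
    print(len(minimizers("ACGT" * 20, preset["k"], preset["w"])))
```

Under these assumptions, a larger w selects fewer seeds per read, which is consistent with the pacbio preset using a wider window than the ont preset.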

Output

Clustering information

The output consists of a TSV file, final_clusters.tsv, located in the specified output folder. In this file, the first column is the cluster ID and the second column is the read accession. For example:

0 read_X_acc
0 read_Y_acc
...
n read_Z_acc

Each read appears on exactly one row, so the file has as many rows as there are reads. Some clusters may be singletons, containing only one read.
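
The two-column layout can be loaded with a few lines of Python. This is a minimal sketch, assuming one whitespace-separated "cluster ID, read accession" pair per line as in the example rows; read_clusters and singletons are hypothetical helper names, not part of isONclust3.

```python
from collections import defaultdict


def read_clusters(lines):
    """Map each cluster ID to the list of read accessions assigned to it."""
    clusters = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # First field is the cluster ID, the remainder is the read accession.
        cluster_id, read_acc = line.split(maxsplit=1)
        clusters[cluster_id].append(read_acc)
    return dict(clusters)


def singletons(clusters):
    """Return the IDs of clusters that hold exactly one read."""
    return [cid for cid, reads in clusters.items() if len(reads) == 1]


if __name__ == "__main__":
    example = ["0\tread_X_acc", "0\tread_Y_acc", "1\tread_Z_acc"]
    clusters = read_clusters(example)
    print(clusters)
    print(singletons(clusters))
```

To use it on real output, pass an open file handle for final_clusters.tsv to read_clusters instead of the example list.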

Clusters

isONclust3 outputs the reads in FASTQ format, with each .fastq file containing the reads of one cluster. The .fastq files are located in the fastq_files directory that is created in the given outfolder.
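
A quick way to check the per-cluster files is to count the records in each one. The sketch below assumes that the cluster files end in .fastq and use the common four-lines-per-record FASTQ layout; the function names are hypothetical, and the directory path is supplied by the caller.

```python
from pathlib import Path


def count_fastq_records(path):
    """Count records in a FASTQ file, assuming four lines per record."""
    with open(path) as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def reads_per_cluster_file(fastq_dir):
    """Return {file name: record count} for every *.fastq file in fastq_dir."""
    return {
        p.name: count_fastq_records(p)
        for p in sorted(Path(fastq_dir).glob("*.fastq"))
    }


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py <path to the fastq_files directory>
    if len(sys.argv) == 2:
        for name, n_reads in reads_per_cluster_file(sys.argv[1]).items():
            print(name, n_reads)
```

The number of entries in the returned dictionary is the number of clusters written to disk, and the counts should sum to the number of rows in final_clusters.tsv if every read is written to exactly one file.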

Contact

If you encounter any problems, please raise an issue on the issues page. You can also contact the developer of this repository at alexander.petri[at]math.su.se

Credits

Please cite this study when using isONclust3:

Alexander J. Petri, Kristoffer Sahlin. De novo clustering of extensive long-read transcriptome datasets with isONclust3. bioRxiv 2024.10.29.620862; doi: https://doi.org/10.1101/2024.10.29.620862
