SPLASH is an unsupervised, reference-free, unifying framework for discovering sample-dependent sequence variation through statistical analysis of k-mer composition in both DNA and RNA sequencing data. In this framework, both "sample" (e.g., a cell barcode, a bulk RNA-seq sample, or a DNA-seq sample) and "sequence" (RNA or DNA) are defined generally.
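As a minimal illustration of what "k-mer composition" means, the sketch below counts every length-k substring in each sample's reads. It is not part of SPLASH; the function name `kmer_counts`, the choice k = 4, and the toy reads are hypothetical.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every length-k substring (k-mer) across one sample's reads."""
    counts = Counter()
    for read in reads:
        # Slide a width-k window along the read; reads shorter than k contribute nothing.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


# Hypothetical toy samples: each sample is simply a list of read strings.
samples = {
    "sample_A": ["ACGTAC", "CGTACG"],
    "sample_B": ["ACGTTT", "GTTTAC"],
}

per_sample = {name: kmer_counts(reads, k=4) for name, reads in samples.items()}
for name, counts in per_sample.items():
    print(name, dict(counts))
```

Both toy samples contain the k-mer ACGT, but the k-mer that starts one base later differs (CGTA in sample_A, CGTT in sample_B); sample-dependent differences of this kind are what SPLASH assesses statistically.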
SPLASH builds on the unifying observation that sample-regulated sequence variation, such as alternative splicing, RNA editing, gene fusions, V(D)J recombination, transposable element mobilization, allele-specific splicing, genetic variation in a population, and many other regulated processes acting on DNA and RNA, can be characterized by signature k-mers without requiring a reference (Chaung et al. 2023, Cell).
See "how to use SPLASH" in the sidebar.
SPLASH finds constant sequences (anchors) that are followed by a set of sequences (targets) whose composition varies across samples, and it provides valid p-values for this sample-specific target variation. The targets can be adjacent to the anchors or separated from them by a gap. Because SPLASH is reference-free, it sidesteps the computational challenges associated with alignment, which makes it significantly faster and more efficient than alignment and enables discovery and statistical precision not currently available, even from pseudo-alignment.
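To make the anchor/target idea concrete, the following toy sketch (not SPLASH's implementation) records, for each anchor, how often each target follows it in each sample, optionally skipping a gap, and then asks whether the target distribution depends on the sample. SPLASH uses its own test statistic and p-value computation (Chaung et al. 2023); here a Pearson chi-squared statistic with a permutation test is swapped in because it is simple and yields a valid p-value. All function names, the parameters `anchor_len`, `target_len`, `gap`, and `n_perm`, and the data are hypothetical and are not SPLASH options.

```python
import random
from collections import Counter, defaultdict

# Toy sketch only: a chi-squared + permutation test stands in for SPLASH's statistic.


def anchor_target_table(samples, anchor_len, target_len, gap=0):
    """Return {anchor: {sample: Counter(target -> count)}}.

    Simplification: every read position is treated as a candidate anchor, and its
    target is the k-mer starting `gap` bases after the anchor ends.
    """
    table = defaultdict(lambda: defaultdict(Counter))
    span = anchor_len + gap + target_len
    for sample, reads in samples.items():
        for read in reads:
            for i in range(len(read) - span + 1):
                anchor = read[i:i + anchor_len]
                start = i + anchor_len + gap
                table[anchor][sample][read[start:start + target_len]] += 1
    return table


def chi2_stat(pairs):
    """Pearson chi-squared statistic of the sample x target table built from (sample, target) pairs."""
    n = len(pairs)
    rows = Counter(s for s, _ in pairs)
    cols = Counter(t for _, t in pairs)
    cells = Counter(pairs)
    stat = 0.0
    for s in rows:
        for t in cols:
            expected = rows[s] * cols[t] / n
            stat += (cells[(s, t)] - expected) ** 2 / expected
    return stat


def permutation_pvalue(sample_counts, n_perm=999, seed=0):
    """Permutation p-value for 'the target distribution does not depend on the sample'."""
    # One (sample, target) observation per occurrence of the anchor.
    pairs = [(s, t) for s, c in sample_counts.items() for t, m in c.items() for _ in range(m)]
    observed = chi2_stat(pairs)
    labels = [s for s, _ in pairs]
    targets = [t for _, t in pairs]
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # shuffling sample labels keeps the target counts fixed
        if chi2_stat(list(zip(labels, targets))) >= observed - 1e-9:
            hits += 1
    # The +1 terms keep the p-value valid under the null (it can never be exactly 0).
    return (1 + hits) / (1 + n_perm)


# Hypothetical toy data: anchor ACGTA is followed by GG in sample A but by TT in sample B.
differs = {"A": ["ACGTAGG"] * 10, "B": ["ACGTATT"] * 10}
same = {"A": ["ACGTAGG", "ACGTATT"] * 5, "B": ["ACGTATT", "ACGTAGG"] * 5}

for name, samples in [("differs", differs), ("same", same)]:
    table = anchor_target_table(samples, anchor_len=5, target_len=2)
    print(name, permutation_pvalue(table["ACGTA"]))
```

For `differs` the p-value is small, because only 2 of the C(20, 10) = 184,756 label arrangements are as extreme as the observed one; for `same` the observed statistic is 0, so the p-value is 1.0. Recomputing the statistic for every permutation costs time proportional to `n_perm` times the number of observations per anchor, so this version only makes the notion of a valid p-value concrete and is not meant as a practical method at SPLASH's scale.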
The first version of the SPLASH pipeline proved its usefulness. It was implemented mainly in Python using Nextflow. Here we provide a new and improved implementation in C++ and Python (Kokot et al. 2024). This new version is much more efficient and can analyze datasets larger than 1 TB in hours on a workstation or even a laptop. We have also extended this framework to barcoded data (e.g., 10x scRNA-seq and Visium spatial transcriptomics) under the name sc-SPLASH (Dehghannasiri et al. 2024).
The image below presents the SPLASH pipeline at a high level.
Marek Kokot*, Roozbeh Dehghannasiri*, Tavor Baharav, Julia Salzman, and Sebastian Deorowicz. Scalable and unsupervised discovery from raw sequencing reads using SPLASH2, Nature Biotechnology (2024)
Roozbeh Dehghannasiri*, Marek Kokot*, Alexander Starr, Jamie Maziarz, Tal Gordon, Serena Tan, Peter Wang, Ayelet Voskoboynik, Jacob Musser, Sebastian Deorowicz, and Julia Salzman. sc-SPLASH provides ultra-efficient reference-free discovery in barcoded single-cell sequencing, bioRxiv (2024)
Kaitlin Chaung*, Tavor Baharav*, George Henderson, Ivan Zheludev, Peter Wang, and Julia Salzman. SPLASH: A statistical, reference-free genomic algorithm unifies biological discovery, Cell (2023)
Tavor Baharav, David Tse, and Julia Salzman. OASIS: An interpretable, finite-sample valid alternative to Pearson’s X² for scientific discovery, PNAS (2024)
George Henderson, Adam Gudys, Tavor Baharav, Punit Sundaramurthy, Marek Kokot, Peter L. Wang, Sebastian Deorowicz, Allison F. Carey, and Julia Salzman. Ultra-efficient, unified discovery from microbial sequencing with SPLASH and precise statistical assembly, bioRxiv 2024.01.18.576133 (2024)