LevioSAM2: Fast and accurate coordinate conversion between assemblies

LevioSAM2 lifts over alignments accurately and efficiently using a chain file.

Features

  • Converting aligned short- and long-read records (in SAM/BAM/CRAM format) from one reference to another
  • Comprehensive alignment feature updating during lift-over:
    • Reference name (RNAME), position (POS), alignment flag (FLAG), and CIGAR alignment string (CIGAR)
    • Mate read information (RNEXT, PNEXT, TLEN)
    • (optional) Alignment tags (MD:Z, NM:i)
  • Multithreading support
  • Toolkit for "selective" pipelines which consider major changes between the source and target references
  • (beta) Converting intervals (in BED format) from one reference to another

Installation

LevioSAM2 can be installed using:

  • Conda:

# The following commands install leviosam2 in a new conda environment called `leviosam2`
conda create -n leviosam2
conda activate leviosam2
conda install -c bioconda -c conda-forge leviosam2

  • Docker:

docker pull naechyun/leviosam2:latest

  • Singularity:

singularity pull docker://naechyun/leviosam2:latest

  • Built from source using CMake. See INSTALL.md for details.

Usage

Prepare chain files

LevioSAM2 performs lift-over using a chain file as the lift-over map. Many chain files are provided by the UCSC Genome Browser, e.g. GRCh38-related chains. For other reference pairs, common ways to generate chain files include using the UCSC recipe and nf-LO.
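As a rough illustration of fetching a UCSC-provided chain, the sketch below builds the download URL from two assembly names. It assumes the usual layout of the UCSC download server (`goldenPath/<source>/liftOver/<source>To<Target>.over.chain.gz`). The helper name `ucsc_chain_url` is hypothetical, and not every assembly pair is hosted there, so check the server listing before relying on it.

```shell
# Hypothetical helper: print the UCSC download URL for a source -> target chain.
# Assumes the standard UCSC layout; the pair may not exist on the server.
ucsc_chain_url() {
    src="$1"
    dst="$2"
    # UCSC capitalizes the first letter of the target assembly (e.g. hg19ToHg38).
    first=$(printf '%s' "$dst" | cut -c1 | tr '[:lower:]' '[:upper:]')
    rest=$(printf '%s' "$dst" | cut -c2-)
    printf 'https://hgdownload.soe.ucsc.edu/goldenPath/%s/liftOver/%sTo%s%s.over.chain.gz\n' \
        "$src" "$src" "$first" "$rest"
}

# Example (requires network access and curl):
#   curl -L -o hg19ToHg38.over.chain.gz "$(ucsc_chain_url hg19 hg38)"
#   gunzip hg19ToHg38.over.chain.gz
```

UCSC distributes chains gzip-compressed; decompressing before indexing is a conservative choice here, not a documented requirement.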

LevioSAM2-index

LevioSAM2 indexes a chain file for lift-over queries. The resulting index has a .clft extension.

leviosam2 index -c source_to_target.chain -p source_to_target -F target.fai
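The `-F` argument takes a FASTA index (`.fai`) of the target reference. A minimal sketch of preparing it and skipping re-indexing when the `.clft` file already exists, assuming `samtools` is installed and the target FASTA is named `target.fna`. Note that `samtools faidx` writes `target.fna.fai`, so that path is passed to `-F` here instead of `target.fai`. The function names are hypothetical.

```shell
# Hypothetical helpers; file names follow the examples in this README.

# leviosam2 index writes its output to <prefix>.clft.
clft_path() {
    printf '%s.clft\n' "$1"
}

# Build the index only if it does not exist yet.
# Usage: build_index <chain> <prefix> <target_fasta>
build_index() {
    chain="$1"; prefix="$2"; target_fa="$3"
    if [ -f "$(clft_path "$prefix")" ]; then
        echo "index already present: $(clft_path "$prefix")"
        return 0
    fi
    # samtools faidx creates <target_fasta>.fai next to the FASTA.
    [ -f "${target_fa}.fai" ] || samtools faidx "$target_fa" || return 1
    leviosam2 index -c "$chain" -p "$prefix" -F "${target_fa}.fai"
}

# Example:
#   build_index source_to_target.chain source_to_target target.fna
```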

LevioSAM2-lift

LevioSAM2-lift is the lift-over kernel of the levioSAM2 toolkit. In the example below, it reads the levioSAM2 ChainMap index from source_to_target.clft and saves the output to lifted_from_source.bam.

We highly recommend sorting the input BAM by position before running levioSAM2-lift.

leviosam2 lift -C source_to_target.clft -a aligned_to_source.bam -p lifted_from_source -O bam
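To follow the sorting recommendation, the input can be coordinate-sorted with `samtools sort` before lifting. The sketch below assumes `samtools` and `leviosam2` are on `PATH`. The helper names and the `.sorted.bam` suffix are hypothetical, and the final `samtools quickcheck` is only a basic integrity check on the output file.

```shell
# Hypothetical helpers; assumes samtools and leviosam2 are on PATH.

# With -p <prefix> -O bam, the lifted file is named <prefix>.bam.
lifted_bam_path() {
    printf '%s.bam\n' "$1"
}

# Usage: sort_and_lift <input.bam> <index.clft> <output_prefix>
sort_and_lift() {
    in_bam="$1"; clft="$2"; out_prefix="$3"
    sorted="${in_bam%.bam}.sorted.bam"
    # Coordinate-sort the input before lifting.
    samtools sort -o "$sorted" "$in_bam" || return 1
    leviosam2 lift -C "$clft" -a "$sorted" -p "$out_prefix" -O bam || return 1
    # Basic integrity check of the lifted BAM.
    samtools quickcheck "$(lifted_bam_path "$out_prefix")"
}

# Example:
#   sort_and_lift aligned_to_source.bam source_to_target.clft lifted_from_source
```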

Full levioSAM2 workflow with selective re-mapping

The levioSAM2 workflow includes lift-over using the leviosam2-lift kernel and a selective re-mapping strategy. This approach can improve accuracy.

Example:

# You may skip the indexing step if you've already run it
leviosam2 index -c source_to_target.chain -p source_to_target -F target.fai
sh leviosam2.sh \
    -a bowtie2 -A -10 -q 10 -H 5 \
    -i aligned_to_source.bam \
    -o aligned_to_source-lifted \
    -f target.fna \
    -b bt2/target \
    -C source_to_target.clft \
    -t 16

See this README to learn more about running the full levioSAM2 workflow.

Publication

Logo credit: Ting-Wei Young