Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calculates the ref length of mate read using "MC:Z" when available #67

Merged
merged 8 commits into from
Feb 5, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .github/workflows/cmake_htslib.yml
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ jobs:
htslib_matrix:
strategy:
matrix:
version: [1.11, 1.12, 1.13, 1.14, 1.15, 1.16, 1.17, 1.18]
version: [1.12, 1.13, 1.14, 1.15, 1.16, 1.17, 1.18]
os: [ubuntu-22.04, ubuntu-20.04, ubuntu-latest]

# The CMake configure and build commands are platform agnostic and should work equally well on Windows or Mac.
Expand Down
25 changes: 13 additions & 12 deletions INSTALL.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,21 +9,21 @@ levioSAM2 supports a variety of methods for installation:

## Conda

```
```shell
conda install -c conda-forge -c bioconda leviosam2
```

## Docker

You can obtain a Docker image of the latest version from Docker hub:

```
```shell
docker pull naechyun/leviosam2:latest
```

## Singularity

```
```shell
singularity pull docker://naechyun/leviosam2:latest
```

Expand All @@ -33,34 +33,37 @@ singularity pull docker://naechyun/leviosam2:latest

Make sure the following prerequisite libraries are installed on your system.

- [htslib v1.11+ (tested up to v1.16)](https://github.com/samtools/htslib)
- [htslib v1.12+ (tested up to v1.18)](https://github.com/samtools/htslib)
- [sdsl-lite v2.1.1+](https://github.com/simongog/sdsl-lite/)

Both libraries are available through coda:
```

```shell
conda install -c conda-forge sdsl-lite
conda install -c bioconda htslib
```

Another easy way to install these dependencies is to use your OS's existing package system:
```

```shell
apt-get install libhts-dev libsdsl-dev # Debian/Ubuntu
brew tap brewsci/bio; brew install htslib sdsl-lite # MacOS
```

If using RedHat or Fedora, then you must install sdsl-lite manually. But you can install htslib through yum:
```

```shell
yum install htslib
```

Or you can choose to install them manually by following the install instructions on their respective pages.

### CMake
### CMake

Once the prerequisite packages are installed and the locations of their installations are known, specify their locations
to CMake by running the following commands:

```
```shell
mkdir build
cd build
cmake ..
Expand All @@ -73,12 +76,11 @@ make
If you installed the dependencies manually, you might have to modify the `cmake` command to specify their library and
include directory locations like so:

```
```shell
cmake -D CMAKE_LIBRARY_PATH="/path/to/libsdsl/;/path/to/libhts/" \
-D CMAKE_INCLUDE_PATH="/path/to/libsdsl/include/;/path/to/libhts/include/" ..
```


## Test

We provide an end-to-end test and a set of unit tests for levioSAM.
Expand All @@ -87,7 +89,6 @@ We provide an end-to-end test and a set of unit tests for levioSAM.

- The unit test can be run `cd build; ctest` if you use cmake to build levioSAM; or `make gtest; cd testdata; ../gtest` if you use make to build levioSAM2.


## ARM64

LevioSAM2 supports the arm64 architecture. Note that the distributed libraries of the dependencies (namely `htslib` and `sdsl-lite`) on Conda or other package managers might not support arm64. Thus, you might need to build the dependent libraries from source. Both `htslib` and `sdsl-lite` can be built under the arm64 architecture.
Expand Down
29 changes: 15 additions & 14 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,11 +1,11 @@
# LevioSAM2: Fast and accurate coordinate conversion between assemblies

[![Docker](https://img.shields.io/docker/v/naechyun/leviosam2?label=Docker)](https://hub.docker.com/r/naechyun/leviosam2)
[![Anaconda-Server Badge](https://anaconda.org/bioconda/leviosam2/badges/version.svg)](https://anaconda.org/bioconda/leviosam2)
[![Anaconda-Server Badge](https://anaconda.org/bioconda/leviosam2/badges/downloads.svg)](https://anaconda.org/bioconda/leviosam2)
![Cmake build](https://github.com/milkschen/leviosam2/actions/workflows/cmake_htslib.yml/badge.svg)
![Integration](https://github.com/milkschen/leviosam2/actions/workflows/integration-test.yml/badge.svg)

# LevioSAM2: Fast and accurate coordinate conversion between assemblies

LevioSAM2 lifts over alignments accurately and efficiently using a chain file.

<picture>
Expand All @@ -17,9 +17,9 @@ LevioSAM2 lifts over alignments accurately and efficiently using a chain file.

- Converting aligned short and long reads records (in SAM/BAM/CRAM format) from one reference to another
- Comprehensive alignment feature updating during lift-over:
- Reference name (`RNAME`), position (`POS`), alignmant flag (`FLAG`), and CIGAR alignment string (`CIGAR`)
- Mate read information (`RNEXT`, `PNEXT`, `TLEN`)
- (optional) Alignment tags (`MD:Z`, `NM:i`)
- Reference name (`RNAME`), position (`POS`), alignmant flag (`FLAG`), and CIGAR alignment string (`CIGAR`)
- Mate read information (`RNEXT`, `PNEXT`, `TLEN`)
- (optional) Alignment tags (`MD:Z`, `NM:i`)
- Multithreading support
- Toolkit for "selective" pipelines which consider major changes between the source and target references
- (beta) Converting intervals (in BED format) from one reference to another
Expand All @@ -30,20 +30,22 @@ LevioSAM2 can be installed using:

- [Conda](https://anaconda.org/bioconda/leviosam2)

```
```shell
# The following commands install leviosam2 in a new conda environment called `leviosam2`
conda create -n leviosam2
conda activate leviosam2
conda install -c bioconda -c conda-forge leviosam2
```

- [Docker](https://hub.docker.com/r/naechyun/leviosam2)
```

```shell
docker pull naechyun/leviosam2:latest
```

- [Singularity](https://hub.docker.com/r/naechyun/leviosam2)
```

```shell
singularity pull docker://naechyun/leviosam2:latest
```

Expand All @@ -62,7 +64,7 @@ For other reference pairs, common ways to generate chain files include using the

LevioSAM2 indexes a chain file for lift-over queries. The resulting index has a `.clft` extension.

```
```shell
leviosam2 index -c source_to_target.chain -p source_to_target -F target.fai
```

Expand All @@ -73,17 +75,17 @@ The levioSAM2 ChainMap index will be saved to `source_to_target.clft`. The outpu

__We highly recommend to sort the input BAM by position prior to running levioSAM2-lift.__

```
```shell
leviosam2 lift -C source_to_target.clft -a aligned_to_source.bam -p lifted_from_source -O bam
```


### Full levioSAM2 workflow with selective re-mapping

The levioSAM2 workflow includes lift-over using the `leviosam2-lift` kernel and a selective re-mapping strategy. This approach can improve accuracy.

Example:
```

```shell
# You may skip the indexing step if you've already run it
leviosam2 index -c source_to_target.chain -p source_to_target -F target.fai
sh leviosam2.sh \
Expand All @@ -98,10 +100,9 @@ sh leviosam2.sh \

See [this README](https://github.com/milkschen/leviosam2/blob/main/workflow/README.md) to learn more about running the full levioSAM2 workflow.


## Publication

- Nae-Chyun Chen, Luis Paulin, Fritz Sedlazeck, Sergey Koren, Adam Phillippy, Ben Langmead. Improved sequence mapping using a complete reference genome and lift-over. Nat Methods (2023). https://doi.org/10.1038/s41592-023-02069-6
- Nae-Chyun Chen, Luis Paulin, Fritz Sedlazeck, Sergey Koren, Adam Phillippy, Ben Langmead. Improved sequence mapping using a complete reference genome and lift-over. Nat Methods (2023). https://doi.org/10.1038/s41592-023-02069-6
- Taher Mun, Nae-Chyun Chen, Ben Langmead. LevioSAM: Fast lift-over of variant-aware reference alignments, _Bioinformatics_, 2021;, btab396, https://doi.org/10.1093/bioinformatics/btab396

_Logo credit: Ting-Wei Young_
34 changes: 22 additions & 12 deletions src/chain.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -822,8 +822,16 @@ int32_t ChainMap::get_num_clipped(const int32_t pos, const bool leftmost,
return -1;
}

/* Lift an alignment segment (there are two segments in a paired-end alignment)
* Return false if unliftable.
/**
* Lifts an alignment segment (a paired-end read has two segments).
*
* @param aln Alignment object to be lifted.
* @param hdr_source Source BAM header.
* @param hdr_dest Target BAM header.
* @param first_seg True if the first segment in a paired-end alignment. False
* if the mate.
* @param dest_contig Target reference config.
* @return True if liftable.
*/
bool ChainMap::lift_segment(bam1_t *aln, sam_hdr_t *hdr_source,
sam_hdr_t *hdr_dest, bool first_seg,
Expand All @@ -850,9 +858,7 @@ bool ChainMap::lift_segment(bam1_t *aln, sam_hdr_t *hdr_source,
// the segment is unliftable
if (!update_interval_indexes(source_contig, pos, start_sidx, start_eidx))
return false;
// Set a segment as unliftable if the gap between it
// and the nearby interval is too large
//

// sidx might be advanced here
int32_t num_sclip_start =
get_num_clipped(pos, true, source_contig, start_sidx, start_eidx);
Expand Down Expand Up @@ -888,13 +894,17 @@ bool ChainMap::lift_segment(bam1_t *aln, sam_hdr_t *hdr_source,
// Only check mate if paired-end
if (c->tid < 0 || (c->flag & 1 && c->mtid < 0)) return false;

// Estimate ending pos of the mate
// If it's the first segment, we can calculate the query length wrt the REF;
// If it's the second segment, we can only do a rough estimate b/c we don't
// know the RLEN of the mate.
auto pos_end = (first_seg)
? c->pos + bam_cigar2rlen(c->n_cigar, bam_get_cigar(aln))
: c->mpos + c->l_qseq;
// Gets the ending pos of the mate.
//
// For the first segment, uses the cigar string to calculate the read
// reference length.
//
// For the mate segment, infers the ending position. The position would be
// accurate if the "MC:Z" tag is available.
auto pos_end =
(first_seg) ? c->pos + bam_cigar2rlen(c->n_cigar, bam_get_cigar(aln))
: c->mpos + LevioSamUtils::get_mate_query_len_on_ref(aln);

int end_sidx = 0;
int end_eidx = 0;
// Update end indexes; if either is unavailable, mark the segment as
Expand Down
5 changes: 0 additions & 5 deletions src/chain_test.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -18,11 +18,6 @@
#include "gtest/gtest.h"
#include "leviosam_utils.hpp"

size_t kstr_get_m(size_t var) {
size_t lvar = (size_t)exp2(ceil(log2(var)));
return lvar;
}

/* Chain tests */
TEST(ChainTest, SimpleRankAndLift) {
std::vector<std::pair<std::string, int32_t>> lm;
Expand Down
2 changes: 1 addition & 1 deletion src/leviosam.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -220,7 +220,7 @@ void read_and_lift(T *lift_map, std::mutex *mutex_fread,
ref.substr(aln_vec[i]->core.pos,
bam_cigar2rlen(aln_vec[i]->core.n_cigar,
bam_get_cigar(aln_vec[i])));
std::string q_seq = LevioSamUtils::get_read_as_is(aln_vec[i]);
std::string q_seq = LevioSamUtils::get_read(aln_vec[i]);
std::vector<uint32_t> new_cigar;
// We perform global alignment for local sequences and
// thus an alignment with a high number of clipped bases
Expand Down
Loading
Loading