Preprocess bam file into a compressed alignment incidence matrix (equivalence class)

churchill-lab/alntools

alntools

alntools processes next-generation sequencing (NGS) read alignments into a sparse, compressed incidence matrix (also known as equivalence classes) and stores it in a predefined binary format for compact storage and efficient downstream analysis. This makes it possible to compare, contrast, or combine the results of different alignment strategies.
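
To make the idea of an alignment incidence matrix concrete, here is a minimal sketch in plain Python. It is not alntools' implementation or its binary format: the function name `build_equivalence_classes`, the input layout (a dict from read ID to target IDs), and the read and target names are all assumptions made for illustration. Reads that align to exactly the same set of targets are collapsed into one equivalence class, and each class is stored as a sparse row of column indices plus a read count.

```python
from collections import defaultdict


def build_equivalence_classes(read_hits):
    """Collapse reads that share the same target set into equivalence classes.

    read_hits: dict mapping a read ID to the target IDs it aligns to
               (hypothetical input layout, for illustration only).
    Returns (targets, ec_counts, ec_rows):
      targets   -- sorted target IDs; these are the matrix columns
      ec_counts -- number of reads in each equivalence class
      ec_rows   -- for each class, the sorted column indices that are set
    """
    # Assign a stable column index to every target that appears.
    targets = sorted({t for hits in read_hits.values() for t in hits})
    column = {t: i for i, t in enumerate(targets)}

    # Count reads per distinct set of targets; unaligned reads are skipped.
    counts = defaultdict(int)
    for hits in read_hits.values():
        key = tuple(sorted({column[t] for t in hits}))
        if key:
            counts[key] += 1

    # Sort the classes so the output order is deterministic.
    ec_keys = sorted(counts)
    return targets, [counts[k] for k in ec_keys], [list(k) for k in ec_keys]


# Hypothetical example: four reads against three targets.
read_hits = {
    "read1": ["tA", "tB"],
    "read2": ["tB", "tA"],  # same target set as read1, so same class
    "read3": ["tC"],
    "read4": ["tA"],
}
targets, ec_counts, ec_rows = build_equivalence_classes(read_hits)
for count, row in zip(ec_counts, ec_rows):
    print(count, [targets[i] for i in row])
```

Storing only the nonzero column indices per class, together with a count, is what keeps the matrix small when many reads share the same alignment pattern.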

Features

  • split: divides a large BAM file into smaller ones
  • bam2ec: preprocesses a BAM file into binary equivalence class (EC) format
  • bam2emase: preprocesses a BAM file into EMASE format
  • ec2emase: converts a binary EC file into EMASE format
  • emase2ec: converts an EMASE file into binary EC format
  • range: finds effective lengths of target sequences from alignment data

We are planning to add more features for preprocessing different formats of NGS read alignments at the population level and summarizing useful information from them. Stay tuned!
