HMMRATAC

This is a performance refactor of the original HMMRATAC by Evan D. Tarbell and Tao Liu at https://github.com/LiuLabUB/HMMRATAC This refactor does NOT affect output accuracy and output should be the same as with the original program. Model training speed has been increased by a factor 70 and memory usage issues have been fixed. New features have been added and the original features have been fixed where needed. This includes support for .csi files, processing large files, more output options and non-binary model files.

Accurate Data Wanted To increase program accuracy, files with fully known peaks are highly appreciated to use for further testing and accuracy increases.

Quick Start

Assume that you have a BAM file from aligner such as bwa mem named ATACseq.bam.

Sort the BAM file to get a ATACseq.sorted.bam file:

samtools sort ATACseq.bam -o ATACseq.sorted.bam
Make index from the BAM file to get a ATACseq.sorted.bam.bai file:

samtools index ATACseq.sorted.bam ATACseq.sorted.bam.bai
Make genome information (chromosome sizes) from the BAM file to get a genome.info file:

samtools view -H ATACseq.sorted.bam| perl -ne 'if(/^@SQ.*?SN:(\w+)\s+LN:(\d+)/){print $1,"\t",$2,"\n"}' > genome.info
Run HMMRATAC on the sorted BAM ATACseq.sorted.bam, the BAM index file ATACseq.sorted.bam.bai, and the genome information file genome.info:

java -jar HMMRATAC_V2.0.0.jar -b ATACseq.sorted.bam -i ATACseq.sorted.bam.bai -g genome.info
Filter HMMRATAC output by the score, if desired. Score threshold will depend on dataset, score type and user preference. A threshold of 10 would be:

awk -v OFS="\t" '$13>=10 {print}' NAME_peaks.gappedPeak > NAME.filteredPeaks.gappedPeak

To filter the summit file by the same threshold:

awk -v OFS="\t" '$5>=10 {print}' NAME_summits.bed > NAME.filteredSummits.bed

NOTE: HMMRATAC will report all peaks that match the structure defined by the model, including weak peaks. Filtering by score can be used to retain stronger peaks. Lower score = higher sensitivity and lower precision, Higher score = lower sensitivity and higher precision.

Samtools can be downloaded here: http://www.htslib.org/download/

Be sure to run HMMRATAC using the executable file, found here: https://github.com/Mikxox/HMMRATAC/releases For details on HOW to run HMMRATAC, see HMMRATAC_Guide.md, which contains a thorough runthrough of all parameters, output files and input requirements and troubleshooting.

HMMRATAC requires paired-end data. Single-end data will not work. HMMRATAC is designed to process ATAC-seq data that hasn't undergone any size selection, either physical or in silico. This should be standard practice for any ATAC-seq analysis. Size selected data can be processed by HMMRATAC (see HMMRATAC_Guide.md on --trim option).

If you use HMMRATAC in your research, please cite the following paper:

Evan D. Tarbell and Tao Liu, HMMRATAC: a Hidden Markov ModeleR for ATAC-seq, Nucleic Acids Res. 2019 Jun 14. doi: 10.1093/nar/gkz533
Adding an extra reference to this performance refactor is appreciated but not required.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

HMMRATAC

Files

README.md

Latest commit

History

README.md

File metadata and controls

HMMRATAC