Skip to content

zd1/telseq

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

81 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

TelSeq is a software that estimates telomere length from whole genome sequencing data (BAMs).

The most current development version is available from our git repository: git://github.com/zd1/telseq.git

The software is implemented in C++.

Citation:

Estimating telomere length from whole genome sequence data

Zhihao Ding; Massimo Mangino; Abraham Aviv; Tim Spector; Richard Durbin. Nucleic Acids Research 2014; doi: 10.1093/nar/gku181 http://nar.oxfordjournals.org/content/42/9/e75

Compile TelSeq

TelSeq dependency:

  • the bamtools library (https://github.com/pezmaster31/bamtools)

  • A modern version of GCC (version 4.8 or above) This can been seen by "gcc --version". If multiple GCCs are installed in your system, please set environmental variables pointing to the one of version 4.8 or above. e.g. in bash,

export CXX=/path/to/gcc/gcc-4.8.1/bin/g++
export CC=/path/to/gcc/gcc-4.8.1/bin/gcc

One easy way to install a new GCC is to use homebrew,

# install homebrew if you don't have it
ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

# install GCC
brew install gcc

Compile

Go to the src directory and run autogen.sh from the src directory to generate the configure file ./autogen.sh Then run

./configure
make

The executable binary will be at src/Telseq/telseq. If bamtools are installed not at the system location, you can specify their location by

./configure --with-bamtools=/path/to/bamtools

The /path/to/bamtools directory is the directory that contains 'lib' and 'include' sub directories.

Run TelSeq

Read options and usage information

telseq

Analyse one or more BAMs by specifying BAM file path as command line arguments.

telseq a.bam b.bam

Analyse a list of BAMs whose paths are specified in a 'bamlist' file.

bamlist should contain only 1 column with each row the path of a BAM. i.e.

/path/to/a.bam
/path/to/b.bam

telseq -f bamlist

BAM file path can also be provided by piping in a 'bamlist', whose format must be same as above

cat bamlist | telseq

output

By default the result will be printed out to stdout. To change it to a file, use '-o' option to specify a file path. i.e.

telseq -o /path/to/output a.bam b.bam c.bam

This can also be achived by just direct the output to a file using '>', i.e.

telseq a.bam b.bam c.bam > /path/to/output

The software will print out running status to stderr as well. To separate them from stdout, one could direct log to a file, ie.

telseq a.bam b.bam c.bam 2>outputlog

Merge results from read groups by taking a weighted mean. However, it is benetifical run without -m to output the result per lane, so to have an idea about inter-lane variation. The merging can be done afterwards. telseq -m a.bam > output

Output file format

Column Definitions
ReadGroup read group, Defined by the RG tag in BAM header.
Library sequencing library that the read group belongs to.
Sample defined by the SM tag in BAM header.
Total total number of reads in this read group.
Mapped total number of mapped reads, SAM flag 0x4.
Duplicates total number of duplicate reads, SAM flag 0x400.
LENGH_ESTIMATE estimated telomere length.
TEL0 read counts for reads containing no TTAGGG/CCCTAA repeats.
TEL1 read counts for reads containing only 1 TTAGGG/CCCTAA repeats.
TELn read counts for reads containing only n TTAGGG/CCCTAA repeats.
TEL16 read counts for reads containing 16 TTAGGG/CCCTAA repeats.
GC0 read counts for reads with GC between 40%-42%.
GC1 read counts for reads with GC between 42%-44%.
GCn read counts for reads with GC between (40%+n*2%)-(42%+(n+1)*2%).
GC9 read counts for reads with GC between 58%-60%.

By default for each BAM a header line will be printed out. This can be suppressed by using the '-H' option. It is useful when one has multiple BAMs to scan and wish the output to be merged together. i.e.

telseq -H a.bam b.bam c.bam > myresult

To just print out the header, use '-h' option. i.e.

telseq -h

Docker

Install Docker

Please refer to the official website for installing Docker https://docs.docker.com/engine/installation/

Build telseq Docker image

docker build -t telseq-docker github.com/zd1/telseq

Run telseq

Note that the sample path "/path/to/bam/sample.bam" in the machine that the container is run needs to be specified. "/sample.bam" doesn't need to be changed.

docker run -v /path/to/bam/sample.bam:/sample.bam telseq-docker /sample.bam

Contact

zhihao.ding at gmail.com