PARamecium Toolbox for Interspersed DNA Elimination Studies
We present the Paramecium Toolbox for Interspersed DNA Elimination Studies (ParTIES), designed for Paramecium species, that (i) identifies eliminated sequences, (ii) measures their presence in a sequencing sample and (iii) detects rare elimination polymorphisms.
For a full description of the software, its options and its results, see the user manual ("user_manual.pdf").
ParTIES depends on several other programs. An installation example is provided at the end of the user manual, as well as in the "INSTALL" file. Once all the dependencies are installed, run the "check" file to make sure everything works.
./check
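If "check" reports a problem, it can help to see which external programs are visible on your PATH. The following is only a minimal sketch: the function name is hypothetical, and the tool names are assumptions (bowtie2 is used by the Map module; velveth and velvetg are the Velvet executables suggested by the VELVET_* assembly directories). The "INSTALL" file remains the authoritative list of dependencies.

```shell
# report_missing_tools (hypothetical helper): print each tool that is not
# on PATH; return 0 only when every tool was found.
report_missing_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "not found on PATH: $tool" >&2
      status=1
    fi
  done
  return "$status"
}

# Assumed executable names; adjust them to match the INSTALL file.
report_missing_tools bowtie2 velveth velvetg || echo "see the INSTALL file"
```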
To run ParTIES, use the following command:
parties [MODE] : PARamecium Toolbox for Interspersed DNA Elimination Studies
Run : Run ParTIES using the configuration file
Map : Map reads on a reference using bowtie2
MIRAA : Method of Identification by Read Alignment Anomalies
MICA : Method of Identification by Comparison of Assemblies
Insert : Insert IES within a genome to create an IES containing reference
MIRET : Method of Ies RETention
Assembly : Filter reads and assemble them
MILORD : Method of Identification and Localization of Rare Deletion
Compare : Compare IES/InDel datasets
This software is distributed under the GNU GPL v3 license. See the "LICENSE" file for details.
If you use this software, please cite the following publication:
Denby Wilkes C, Arnaiz O, Sperling L. ParTIES : a toolbox for Paramecium interspersed DNA elimination studies. Bioinformatics. 2015 Nov 20. pii: btv691. [Epub ahead of print] PubMed PMID: 26589276.
The example directory contains the following files:
File | Description |
---|---|
scaffold51_1.fa | The somatic reference for this example [a single somatic scaffold of the Paramecium tetraurelia genome] |
Example_reads_1.fastq.gz | Read file 1 (paired with read file 2), 100 nt-long reads |
Example_reads_2.fastq.gz | Read file 2 (paired with read file 1), 100 nt-long reads |
example.cfg | Configuration file that provides all the options needed to run ParTIES with the Run module (see below) |
The following command will run the entire pipeline (based on the config file), generating results in the $OUT directory. You can comment out lines in the configuration file by adding "#" at the beginning of a line.
OUT=Test1
gunzip example/Example_reads_1.fastq.gz
gunzip example/Example_reads_2.fastq.gz
parties Run -genome example/scaffold51_1.fa -out_dir $OUT -config example/example.cfg
Before running the pipeline, check the configuration file ("example/example.cfg") to set the number of threads.
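The gunzip calls above replace the compressed read files with their uncompressed versions, so they cannot be repeated as written on a second run. As a hedged alternative, the sketch below writes the uncompressed copy next to each archive and keeps the original. The function name is hypothetical.

```shell
# unpack_keep (hypothetical helper): decompress each .gz file into a file
# of the same name without the .gz suffix, keeping the compressed original.
unpack_keep() {
  for gz in "$@"; do
    gunzip -c "$gz" > "${gz%.gz}" || return 1
  done
}

# Example call for the files shipped in the example directory:
#   unpack_keep example/Example_reads_1.fastq.gz example/Example_reads_2.fastq.gz
```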
You can also run each step independently, specifying the intermediate result files on the command line.
The map module will align the reads on the reference.
FASTQ1=example/Example_reads_1.fastq
FASTQ2=example/Example_reads_2.fastq
OUT=Test2
parties Map -genome example/scaffold51_1.fa -out_dir $OUT \
-fastq1 $FASTQ1 -fastq2 $FASTQ2 \
-max_insert_size 500 -index_genome -threads 4 -v
The MIRAA module searches for breakpoints in an alignment file.
parties MIRAA -genome example/scaffold51_1.fa -out_dir $OUT \
-bam $OUT/Map/$OUT.scaffold51_1.fa.BOWTIE.sorted.bam \
-min_break_coverage 5 -threads 4 -v
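A quick sanity check after this step is to count the features that were written. The sketch below relies only on the general GFF3 layout (nine tab-separated columns, comment lines starting with "#"). The function name is hypothetical, and the output path is the one passed to the later steps of this example.

```shell
# count_gff3_features (hypothetical helper): count the feature lines of a
# GFF3 file, skipping comment lines and lines with fewer than 9 columns.
count_gff3_features() {
  awk -F '\t' '!/^#/ && NF >= 9 { n++ } END { print n + 0 }' "$1"
}

# Example call on the MIRAA output used by the following modules:
#   count_gff3_features "$OUT/MIRAA/MIRAA.gff3"
```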
The Assembly module filters sequencing reads and assembles them into contigs. Three different assemblies are created.
parties Assembly -genome example/scaffold51_1.fa -out_dir $OUT \
-bam $OUT/Map/$OUT.scaffold51_1.fa.BOWTIE.sorted.bam \
-miraa $OUT/MIRAA/MIRAA.gff3 \
-fastq1 $FASTQ1 -fastq2 $FASTQ2 \
-insert_size 300 -kmer 51 -threads 4 -v
The MICA module computes comparisons between genomes, looking for insertions in germline genomes.
parties MICA -genome example/scaffold51_1.fa -out_dir $OUT \
-bam $OUT/Map/$OUT.scaffold51_1.fa.BOWTIE.sorted.bam \
-miraa $OUT/MIRAA/MIRAA.gff3 \
-germline_genome $OUT/Assembly/VELVET_51_at_least_one_no_match/VELVET_51_at_least_one_no_match_contigs.fa \
-germline_genome $OUT/Assembly/VELVET_51_no_filter/VELVET_51_no_filter_contigs.fa \
-germline_genome $OUT/Assembly/VELVET_51_no_mac_junctions/VELVET_51_no_mac_junctions_contigs.fa \
-insert_size 300 -threads 4 -v
The Insert module creates an IES-containing reference.
parties Insert -genome example/scaffold51_1.fa -out_dir $OUT \
-ies $OUT/MICA/MICA.gff3 -suffix _with_IES \
-threads 4 -v
We use the Map module once again to align the reads on the IES-containing reference.
parties Map -genome $OUT/Insert/Insert.fa -out_dir $OUT \
-fastq1 $FASTQ1 -fastq2 $FASTQ2 \
-max_insert_size 500 -index_genome -threads 4 -v -force
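The commands in this example refer to the alignments produced by Map with paths of the form $OUT/Map/$OUT.<genome file name>.BOWTIE.sorted.bam. The sketch below rebuilds that path from the output directory and the genome file. It is inferred from the paths used in this walkthrough and assumes that $OUT is a simple directory name, as in the examples. The function name is hypothetical.

```shell
# map_bam_path (hypothetical helper): print the BAM path used in this
# walkthrough for a given output directory and genome FASTA file.
map_bam_path() {
  out_dir=$1
  genome=$2
  printf '%s/Map/%s.%s.BOWTIE.sorted.bam\n' \
    "$out_dir" "$out_dir" "$(basename "$genome")"
}

# Example calls:
#   map_bam_path "$OUT" example/scaffold51_1.fa
#   map_bam_path "$OUT" "$OUT/Insert/Insert.fa"
```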
The MIRET module precisely calculates the retention level of each IES in a sample.
parties MIRET -genome example/scaffold51_1.fa -out_dir $OUT \
-germline_genome $OUT/Insert/Insert.fa \
-bam $OUT/Map/$OUT.scaffold51_1.fa.BOWTIE.sorted.bam \
-germline_bam $OUT/Map/$OUT.Insert.fa.BOWTIE.sorted.bam \
-ies $OUT/MICA/MICA.gff3 \
-germline_ies $OUT/Insert/Insert.gff3 \
-score_method Boundaries -threads 4 -v
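As a rough illustration of what a retention level expresses, the sketch below assumes a score defined as the fraction of reads that retain the IES (IES+) among all reads covering it (IES+ plus IES-). The function name and the read counts are hypothetical, and the exact calculation performed for each -score_method value should be checked in the user manual.

```shell
# retention_score (hypothetical helper): fraction of IES+ reads among
# IES+ and IES- reads; prints NA when no read covers the IES.
retention_score() {
  awk -v plus="$1" -v minus="$2" 'BEGIN {
    total = plus + minus
    if (total == 0) { print "NA"; exit }
    printf "%.3f\n", plus / total
  }'
}

# Example with made-up counts: 30 IES+ reads and 70 IES- reads
retention_score 30 70
```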
The MILORD module searches for rare deletions in sequencing reads compared to a reference.
When MILORD is run on a germline genome, deletions that correspond to somatic reads are expected.
parties MILORD -genome $OUT/Insert/Insert.fa -out_dir $OUT \
-bam $OUT/Map/$OUT.Insert.fa.BOWTIE.sorted.bam \
-ies $OUT/Insert/Insert.gff3 \
-threads 4 -v
The Compare module allows coordinate-based comparisons between elements (MICA and/or MILORD results).
parties Compare -genome $OUT/Insert/Insert.fa -out_dir $OUT \
-reference_set $OUT/Insert/Insert.gff3 \
-current_set $OUT/MILORD/MILORD.gff3 \
-threads 4 -v
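To give an idea of what a coordinate-based comparison involves, the sketch below counts the features of one GFF3 file whose sequence name, start and end exactly match a feature of another. This is a simplified stand-in and not the Compare module's own matching rules, which are described in the user manual. The function name is hypothetical.

```shell
# shared_coordinates (hypothetical helper): count features of the second
# GFF3 file that have the same seqid, start and end as a feature of the
# first GFF3 file.
shared_coordinates() {
  awk -F '\t' '
    /^#/ || NF < 9 { next }                 # skip comments and odd lines
    { key = $1 FS $4 FS $5 }                # seqid, start, end
    FILENAME == ARGV[1] { ref[key] = 1; next }
    key in ref { n++ }
    END { print n + 0 }
  ' "$1" "$2"
}

# Example call with the two sets used above:
#   shared_coordinates "$OUT/Insert/Insert.gff3" "$OUT/MILORD/MILORD.gff3"
```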
The Concatemer module searches for concatemerized excision products: concatemerized IESs or single-IES circles.
parties Concatemer -genome $OUT/Insert/Insert.fa -out_dir $OUT \
-seq_id null -fastq1 $FASTQ1 -fastq2 $FASTQ2 \
-ies $OUT/MICA/MICA.gff3 \
-bam $OUT/Map/$OUT.scaffold51_1.fa.BOWTIE.sorted.bam \
-germline_bam $OUT/Map/$OUT.Insert.fa.BOWTIE.sorted.bam \
-threads 4 -v
The MEND module analyzes in more detail the DNA reads overlapping TA dinucleotide excision sites.
parties MEND -out_dir $OUT \
-bam $OUT/Map/$OUT.scaffold51_1.fa.BOWTIE.sorted.bam \
-genome example/scaffold51_1.fa \
-germline_bam $OUT/Map/$OUT.Insert.fa.BOWTIE.sorted.bam \
-germline_genome $OUT/Insert/Insert.fa \
-germline_ies $OUT/Insert/Insert.gff3 \
-ies $OUT/MICA/MICA.gff3 \
-excision_errors $OUT/Compare/Compare.current.gff3 \
-threads 4 -v