Commit 35d3306

Improved transloc function
1 parent 6b9b0ba

7 files changed, 562 insertions(+), 130 deletions(-)

README.md

Lines changed: 43 additions & 19 deletions
````diff
@@ -1,12 +1,12 @@
-# Dantools
+# DanTools
 
 ## Contact
 
 Daniel Klimes (<daniel.s.klimes@gmail.com>)
 
 ## Overview
 
-Dantools is a tool used for comparing genome sequences between
+DanTools is a tool used for comparing genome sequences between
 divergent organisms, such as different species. It does this by
 creating a pseudogenome, a modification of a reference genome to match
 the sequence of another. In doing so, a list of genetic variants
@@ -16,13 +16,13 @@ by keeping track of position shifts from indels, the annotations of
 the original reference can be extended to the new species'
 genome. This allows for the alignment of RNA-Seq data from divergent
 species against a set of genomes with unified annotations. Critically,
-Dantools does not require either genome to be well assembled, and can
+DanTools does not require either genome to be well assembled, and can
 even accept an RNA-Seq .fastq input.
 
 ## Methodology
 
-Dantools in its base form accepts two genomes as input: the base and
-the source. In its first step, Dantools breaks the source genome into
+DanTools in its base form accepts two genomes as input: the base and
+the source. In its first step, DanTools breaks the source genome into
 sequence "fragments", 50-10,000 base pieces of the genome and its
 reverse complement. These fragments are then aligned against the base
 genome using HISAT2 with lowered alignment stringency. Freebayes is
@@ -37,12 +37,13 @@ relative position within features and flanking regions. In the final
 step, the input GFF (if provided) is then shifted to accomodate indel
 mutations that were created in the process of modification.
 
-Because of the flexibility of inputs, Dantools can accept RNA-Seq
+Because of the flexibility of inputs, DanTools can accept RNA-Seq
 reads in fastq format and minimally-assembled genomes.
 
-In future iterations of Dantools, functionality will be added for:
--Determining genome rearrangements
--Translating nucleic acid to amino acid changes
+DanTools additionally provides various helper functions used to
+label variants relative to features, translate VCF files into
+amino acids, predict translocations, and summarize variant
+information (see Usage)
 
 ## Installation
 
@@ -54,12 +55,12 @@ dantools in a self-contained tree with the github-downloaded source in
 src/
 
 ```{bash, eval=FALSE}
-mkdir -p dantools/202407
-cd dantools/202406
+mkdir -p dantools/202408
+cd dantools/202408
 git clone https://github.com/elsayed-lab/dantools.git
 mv dantools src
 ## tell MakeMaker and Module::Build to use this tree
-## All prerequisite perl libraries will go to dantools/202406/lib/perl5
+## All prerequisite perl libraries will go to dantools/202408/lib/perl5
 export PERL5LIB="$(pwd)/lib/perl5:${PERL5LIB}"
 export PERL_MM_OPT="INSTALL_BASE=$(pwd)"
 export PERL_MB_OPT="--install_base $(pwd)"
@@ -74,7 +75,7 @@ cpanm Moo::Role Parallel::ForkManager
 make && make install
 ```
 
-Dantools relies on a set of software packages not listed in the
+DanTools relies on a set of software packages not listed in the
 makefile to perform some of its tasks:
 
 - [HISAT2](https://github.com/DaehwanKimLab/hisat2)
@@ -119,22 +120,45 @@ process, the base genome features need to be shifted in their relative
 positions:
 
 ```{bash, eval=FALSE}
-dantools shift -v variants.vcf -f base.gff
+dantools shift -o shifted.gff -v variants.vcf -f base.gff
 ```
 
-Now, RNA-Seq data aligned against the pseudogenome can be counted with
-the shifted GFF file.
+As an estimate for how well fragments/reads aligned to certain
+features, the alignment depth can be summarized over features:
 
-If one wants to label the variants according to a GFF file:
+```{bash, eval=FALSE}
+dantools summarize-depth -f shifted.gff -d depth.tsv --feature gene
+```
+
+Now, any RNA-Seq data aligned against the pseudogenome can be counted with
+the shifted GFF file. It is recommended features with low fragment
+alignment depth be removed as comparisons there were likely less accurate
+
+One can also perform a variety of variant analyses with DanTools. If one
+wants to label the variants according to a GFF file:
 
 ```{bash, eval=FALSE}
 dantools label -v variants.vcf -f base.gff --features five_prime_UTR,CDS,three_prime_UTR
 ```
 
 These variants can optionally be translated into amino acid changes
-and scored by a mutation scoring matrix.
+and scored by a mutation scoring matrix using the --translate option.
+Both nucleotide and translated outputs can then be summarized by feature
+using a set of helper functions:
+
+```{bash, eval=FALSE}
+dantools summarize-nuc labeled_nucleotides.tsv
+dantools summarize-aa labeled_aa.tsv
+```
+
+For alignment of dantools fragment/pseudogen produced genome fragments,
+translocation events can also be investigated:
+
+```{bash, eval=FALSE}
+dantools transloc alignment.sam
+```
 
-Additional functionalities exist and are listed with the simple command:
+A list of DanTools functions can be produced with the simple command:
 
 ```{bash, eval=FALSE}
 dantools
````
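
The README's `dantools shift` step moves base-genome annotations to account for the indels introduced while building the pseudogenome. This commit does not show how `shift` computes its offsets, so the following is only a minimal Python sketch of the idea. It assumes VCF-style variants on a single chromosome (1-based `POS`, with REF and ALT sharing an anchor base) and 1-based inclusive GFF coordinates; `Variant`, `shift_position`, and `shift_feature` are hypothetical names, not part of DanTools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A VCF-style variant on one chromosome (1-based POS, REF and ALT alleles)."""
    pos: int
    ref: str
    alt: str

    @property
    def ref_end(self) -> int:
        # Last base-genome position covered by the REF allele.
        return self.pos + len(self.ref) - 1

    @property
    def delta(self) -> int:
        # Net length change: >0 for insertions, <0 for deletions, 0 for SNVs.
        return len(self.alt) - len(self.ref)


def shift_position(position: int, variants: list[Variant]) -> int:
    """Map a base-genome position onto the modified genome.

    Only variants whose REF allele ends before `position` move it; positions
    inside a deletion are not adjusted by that deletion.
    """
    offset = sum(v.delta for v in variants if v.ref_end < position)
    return position + offset


def shift_feature(start: int, end: int, variants: list[Variant]) -> tuple[int, int]:
    """Shift a GFF-style (1-based, inclusive) feature interval."""
    return shift_position(start, variants), shift_position(end, variants)


if __name__ == "__main__":
    variants = [
        Variant(10, "A", "ATT"),  # 2-base insertion after position 10
        Variant(50, "GCC", "G"),  # deletion of positions 51-52
    ]
    print(shift_feature(20, 60, variants))  # (22, 60)
```

Summing the net length change of every upstream variant is the simplest rule; a real implementation also has to decide what happens to features that overlap a deletion, which this sketch deliberately ignores.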

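The new `dantools summarize-depth` example summarizes fragment alignment depth over `gene` features, and the README recommends removing features with low depth before counting RNA-Seq reads. The format of `depth.tsv` is not specified in this commit; the Python sketch below assumes a `samtools depth`-style table (chromosome, 1-based position, depth) and GFF3 annotations, reports the mean depth per feature, and uses a hypothetical cutoff of 4 to flag low-depth features. `read_depth` and `summarize_depth` are illustrative names, not the DanTools implementation.

```python
import csv
import io
from collections import defaultdict


def read_depth(handle) -> dict[str, dict[int, int]]:
    """Parse chrom<TAB>pos<TAB>depth rows (samtools depth style, 1-based)."""
    depth: dict[str, dict[int, int]] = defaultdict(dict)
    for chrom, pos, value in csv.reader(handle, delimiter="\t"):
        depth[chrom][int(pos)] = int(value)
    return depth


def summarize_depth(gff_handle, depth, feature_type="gene"):
    """Yield (feature ID, mean depth) for each GFF3 feature of one type."""
    for line in gff_handle:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if cols[2] != feature_type:
            continue
        seqid, start, end = cols[0], int(cols[3]), int(cols[4])
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        feature_id = attrs.get("ID", f"{seqid}:{start}-{end}")
        per_base = depth.get(seqid, {})
        # Positions absent from the depth table are counted as depth 0.
        total = sum(per_base.get(pos, 0) for pos in range(start, end + 1))
        yield feature_id, total / (end - start + 1)


if __name__ == "__main__":
    depth_tsv = "chr1\t1\t5\nchr1\t2\t5\nchr1\t3\t0\nchr1\t4\t10\n"
    gff = (
        "##gff-version 3\n"
        "chr1\t.\tgene\t1\t4\t.\t+\t.\tID=g1\n"
        "chr1\t.\tgene\t3\t6\t.\t+\t.\tID=g2\n"
    )
    depth = read_depth(io.StringIO(depth_tsv))
    min_depth = 4  # hypothetical cutoff for "low" depth
    for feature_id, mean in summarize_depth(io.StringIO(gff), depth):
        status = "keep" if mean >= min_depth else "low depth"
        print(feature_id, mean, status)  # g1 5.0 keep / g2 2.5 low depth
```

Mean depth is only one possible summary; a median or the fraction of bases above a threshold would be less sensitive to a few very deep positions.
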
bin/Dantools.pl

Lines changed: 9 additions & 3 deletions
````diff
@@ -373,8 +373,10 @@ BEGIN
 
 my $input_sam;
 my $output;
-my $gap_thresh = 10000;
+my $ref_gap = 1000;
+my $query_gap = 1000;
 my $min_qbase = 10000;
+my $max_diff = 50; #% difference in length between ref and query
 my $min_depth = 1;
 my $min_length = 1;
 my $input_vcf = 0;
@@ -383,10 +385,12 @@ BEGIN
 GetOptions(
 "sam|s=s" => \$input_sam,
 "output|o=s" => \$output,
-"gap-thresh|g=i" => \$gap_thresh,
+"ref-gap|r=i" => \$ref_gap,
+"query-gap|q=i" => \$query_gap,
 "min-qbase|b=i" => \$min_qbase,
 "min-depth|d=i" => \$min_depth,
 "min-length|l=i" => \$min_length,
+"max-diff=i" => \$max_diff,
 "vcf|v=s" => \$input_vcf,
 "all|a" => \$all,
 "help|h" => \$help
@@ -421,7 +425,9 @@ BEGIN
 Bio::Dantools::transloc(
 input_sam => "$input_sam",
 output => "$output",
-gap_thresh => "$gap_thresh",
+ref_gap => "$ref_gap",
+query_gap => "$query_gap",
+max_diff => "$max_diff",
 min_qbase => "$min_qbase",
 min_depth => "$min_depth",
 min_length => "$min_length",
````
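
This commit adds a `max_diff` option (default 50, commented as the "% difference in length between ref and query") and passes it to `Bio::Dantools::transloc`, but the diff does not show how that percentage is computed. One plausible reading, sketched here in Python purely as an assumption, compares the span a candidate covers on the reference with the span it covers on the query and expresses the difference relative to the longer span. `percent_length_diff` and `within_max_diff` are hypothetical names.

```python
def percent_length_diff(ref_span: int, query_span: int) -> float:
    """Percent difference between reference and query span lengths.

    Assumption: the difference is expressed relative to the longer span;
    Bio::Dantools::transloc may use a different denominator.
    """
    longer = max(ref_span, query_span)
    if longer == 0:
        return 0.0
    return abs(ref_span - query_span) / longer * 100


def within_max_diff(ref_span: int, query_span: int, max_diff: float = 50) -> bool:
    """Keep a candidate only if its spans differ by at most max_diff percent."""
    return percent_length_diff(ref_span, query_span) <= max_diff


if __name__ == "__main__":
    print(within_max_diff(1000, 1200))  # True  (about 16.7% difference)
    print(within_max_diff(1000, 3000))  # False (about 66.7% difference)
```

Dividing by the longer span keeps the value between 0 and 100 regardless of which side is larger; dividing by the reference span instead would make the filter asymmetric.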

helpdocs/dantools.help

Lines changed: 1 addition & 1 deletion
````diff
@@ -1,4 +1,4 @@
-Dantools Version 1.0.0 by Daniel Klimes (daniel.s.klimes@gmail.com)
+Dantools Version 1.1.0 by Daniel Klimes (daniel.s.klimes@gmail.com)
 Tools for the Comparison of Disparate Genomes
 
 Usage: dantools <command> [options]
````

helpdocs/transloc.help

Lines changed: 6 additions & 3 deletions
````diff
@@ -3,9 +3,12 @@ Reads a SAM alignment of fragments (from dantools fragment) to predict
 translocations
 
 Options:
--g, --gap-thresh Maximum base distance between alignments to
-consider them the same translocation
-(default: 10000)
+-r, --ref-gap Maximum base distance between alignments
+relative to reference to group them together
+(default: 1000)
+-q, --query-gap Maximum base distance between alignments
+relative to query to group them together
+(default: 1000)
 -b, --min-qbase Minimum number of read bases (length * number)
 necessary to consdier translocation
 (default: 10000)
````
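
The new help text replaces the single `--gap-thresh` with separate `--ref-gap` and `--query-gap` limits and keeps `--min-qbase` as the minimum number of aligned read bases. The Python sketch below shows one way these three thresholds could combine: alignments are sorted, chained greedily while both gaps stay within their limits, and groups with too few aligned bases are dropped. It assumes the SAM records have already been parsed into query and reference coordinates (for example, with each fragment's origin recoverable from its name), and it does not decide which groups are true translocations. `Alignment`, `Group`, and `group_alignments` are hypothetical and are not the `Bio::Dantools::transloc` code.

```python
from dataclasses import dataclass, field


@dataclass
class Alignment:
    """One fragment alignment, already parsed from the SAM (parsing omitted)."""
    query_chrom: str  # chromosome the fragment came from (assumed known)
    query_pos: int
    ref_chrom: str  # reference sequence it aligned to
    ref_pos: int
    length: int  # aligned fragment length in bases


@dataclass
class Group:
    members: list[Alignment] = field(default_factory=list)

    @property
    def qbases(self) -> int:
        # "read bases (length * number)": total aligned bases in the group
        return sum(a.length for a in self.members)


def group_alignments(alignments, ref_gap=1000, query_gap=1000, min_qbase=10000):
    """Greedily chain alignments that stay within both gap limits."""
    ordered = sorted(
        alignments,
        key=lambda a: (a.query_chrom, a.ref_chrom, a.query_pos, a.ref_pos),
    )
    groups: list[Group] = []
    for aln in ordered:
        if groups:
            last = groups[-1].members[-1]
            if (
                (last.query_chrom, last.ref_chrom) == (aln.query_chrom, aln.ref_chrom)
                and abs(aln.query_pos - last.query_pos) <= query_gap
                and abs(aln.ref_pos - last.ref_pos) <= ref_gap
            ):
                groups[-1].members.append(aln)
                continue
        groups.append(Group([aln]))
    # Drop groups supported by too few aligned bases (--min-qbase).
    return [g for g in groups if g.qbases >= min_qbase]


if __name__ == "__main__":
    alns = [
        Alignment("chrA", 100, "chr2", 5000, 6000),
        Alignment("chrA", 900, "chr2", 5800, 6000),
        Alignment("chrA", 50000, "chr2", 90000, 6000),
    ]
    for g in group_alignments(alns):
        print(len(g.members), g.qbases)  # 2 12000
```

Two independent limits let a group tolerate a different amount of slack on the reference than on the query, which a single `--gap-thresh` could not express.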
