- # Dantools
+ # DanTools

  ## Contact

  Daniel Klimes (<daniel.s.klimes@gmail.com>)

  ## Overview

- Dantools is a tool used for comparing genome sequences between
+ DanTools is a tool used for comparing genome sequences between
  divergent organisms, such as different species. It does this by
  creating a pseudogenome, a modification of a reference genome to match
  the sequence of another. In doing so, a list of genetic variants
@@ -16,13 +16,13 @@ by keeping track of position shifts from indels, the annotations of
  the original reference can be extended to the new species'
  genome. This allows for the alignment of RNA-Seq data from divergent
  species against a set of genomes with unified annotations. Critically,
- Dantools does not require either genome to be well assembled, and can
+ DanTools does not require either genome to be well assembled, and can
  even accept an RNA-Seq .fastq input.

  ## Methodology

- Dantools in its base form accepts two genomes as input: the base and
- the source. In its first step, Dantools breaks the source genome into
+ DanTools in its base form accepts two genomes as input: the base and
+ the source. In its first step, DanTools breaks the source genome into
  sequence "fragments", 50-10,000 base pieces of the genome and its
  reverse complement. These fragments are then aligned against the base
  genome using HISAT2 with lowered alignment stringency. Freebayes is
@@ -37,12 +37,13 @@ relative position within features and flanking regions. In the final
  step, the input GFF (if provided) is then shifted to accommodate indel
  mutations that were created in the process of modification.

- Because of the flexibility of inputs, Dantools can accept RNA-Seq
+ Because of the flexibility of inputs, DanTools can accept RNA-Seq
  reads in fastq format and minimally-assembled genomes.

- In future iterations of Dantools, functionality will be added for:
- -Determining genome rearrangements
- -Translating nucleic acid to amino acid changes
+ DanTools additionally provides various helper functions used to
+ label variants relative to features, translate VCF variants into
+ amino acid changes, predict translocations, and summarize variant
+ information (see Usage).

  ## Installation

@@ -54,12 +55,12 @@ dantools in a self-contained tree with the github-downloaded source in
  src/

  ``` {bash, eval=FALSE}
- mkdir -p dantools/202407
- cd dantools/202406
+ mkdir -p dantools/202408
+ cd dantools/202408
  git clone https://github.com/elsayed-lab/dantools.git
  mv dantools src
  ## tell MakeMaker and Module::Build to use this tree
- ## All prerequisite perl libraries will go to dantools/202406/lib/perl5
+ ## All prerequisite perl libraries will go to dantools/202408/lib/perl5
  export PERL5LIB="$(pwd)/lib/perl5:${PERL5LIB}"
  export PERL_MM_OPT="INSTALL_BASE=$(pwd)"
  export PERL_MB_OPT="--install_base $(pwd)"
@@ -74,7 +75,7 @@ cpanm Moo::Role Parallel::ForkManager
  make && make install
  ```

- Dantools relies on a set of software packages not listed in the
+ DanTools relies on a set of software packages not listed in the
  makefile to perform some of its tasks:

  - [HISAT2](https://github.com/DaehwanKimLab/hisat2)
@@ -119,22 +120,45 @@ process, the base genome features need to be shifted in their relative
  positions:

  ``` {bash, eval=FALSE}
- dantools shift -v variants.vcf -f base.gff
+ dantools shift -o shifted.gff -v variants.vcf -f base.gff
  ```

- Now, RNA-Seq data aligned against the pseudogenome can be counted with
- the shifted GFF file.
+ As an estimate of how well fragments/reads aligned to certain
+ features, the alignment depth can be summarized over features:

- If one wants to label the variants according to a GFF file:
+ ``` {bash, eval=FALSE}
+ dantools summarize-depth -f shifted.gff -d depth.tsv --feature gene
+ ```
+
+ Now, any RNA-Seq data aligned against the pseudogenome can be counted with
+ the shifted GFF file. It is recommended that features with low fragment
+ alignment depth be removed, as comparisons there are likely less accurate.
+
+ One can also perform a variety of variant analyses with DanTools. If one
+ wants to label the variants according to a GFF file:

  ``` {bash, eval=FALSE}
  dantools label -v variants.vcf -f base.gff --features five_prime_UTR,CDS,three_prime_UTR
  ```

  These variants can optionally be translated into amino acid changes
- and scored by a mutation scoring matrix.
+ and scored by a mutation scoring matrix using the --translate option.
+ Both nucleotide and translated outputs can then be summarized by feature
+ using a set of helper functions:
+
+ ``` {bash, eval=FALSE}
+ dantools summarize-nuc labeled_nucleotides.tsv
+ dantools summarize-aa labeled_aa.tsv
+ ```
+
+ For alignments of genome fragments produced by dantools fragment/pseudogen,
+ translocation events can also be investigated:
+
+ ``` {bash, eval=FALSE}
+ dantools transloc alignment.sam
+ ```

- Additional functionalities exist and are listed with the simple command:
+ A list of DanTools functions can be produced with the simple command:

  ``` {bash, eval=FALSE}
  dantools