elsayed-lab
diff --git a/README.md b/README.md
Lines changed: 43 additions & 19 deletions
diff --git a/bin/Dantools.pl b/bin/Dantools.pl
Lines changed: 9 additions & 3 deletions
diff --git a/helpdocs/dantools.help b/helpdocs/dantools.help
Lines changed: 1 addition & 1 deletion
diff --git a/helpdocs/transloc.help b/helpdocs/transloc.help
Lines changed: 6 additions & 3 deletions
@@ -1,12 +1,12 @@
-# Dantools
+# DanTools
 
 ## Contact
 
 Daniel Klimes (<daniel.s.klimes@gmail.com>)
 
 ## Overview
 
-Dantools is a tool used for comparing genome sequences between
+DanTools is a tool used for comparing genome sequences between
 divergent organisms, such as different species. It does this by
 creating a pseudogenome, a modification of a reference genome to match
 the sequence of another. In doing so, a list of genetic variants
@@ -16,13 +16,13 @@ by keeping track of position shifts from indels, the annotations of
 the original reference can be extended to the new species'
 genome. This allows for the alignment of RNA-Seq data from divergent
 species against a set of genomes with unified annotations. Critically,
-Dantools does not require either genome to be well assembled, and can
+DanTools does not require either genome to be well assembled, and can
 even accept an RNA-Seq .fastq input.
 
 ## Methodology
 
-Dantools in its base form accepts two genomes as input: the base and
-the source. In its first step, Dantools breaks the source genome into
+DanTools in its base form accepts two genomes as input: the base and
+the source. In its first step, DanTools breaks the source genome into
 sequence "fragments", 50-10,000 base pieces of the genome and its
 reverse complement. These fragments are then aligned against the base
 genome using HISAT2 with lowered alignment stringency. Freebayes is
@@ -37,12 +37,13 @@ relative position within features and flanking regions. In the final
 step, the input GFF (if provided) is then shifted to accommodate indel
 mutations that were created in the process of modification.
 
-Because of the flexibility of inputs, Dantools can accept RNA-Seq
+Because of the flexibility of inputs, DanTools can accept RNA-Seq
 reads in fastq format and minimally-assembled genomes.
 
-In future iterations of Dantools, functionality will be added for:
--Determining genome rearrangements
--Translating nucleic acid to amino acid changes
+DanTools additionally provides various helper functions used to
+label variants relative to features, translate VCF files into
+amino acids, predict translocations, and summarize variant
+information (see Usage).
 
 ## Installation
 
@@ -54,12 +55,12 @@ dantools in a self-contained tree with the github-downloaded source in
 src/
 
 ```{bash, eval=FALSE}
-mkdir -p dantools/202407
-cd dantools/202406
+mkdir -p dantools/202408
+cd dantools/202408
 git clone https://github.com/elsayed-lab/dantools.git
 mv dantools src
 ## tell MakeMaker and Module::Build to use this tree
-## All prerequisite perl libraries will go to dantools/202406/lib/perl5
+## All prerequisite perl libraries will go to dantools/202408/lib/perl5
 export PERL5LIB="$(pwd)/lib/perl5:${PERL5LIB}"
 export PERL_MM_OPT="INSTALL_BASE=$(pwd)"
 export PERL_MB_OPT="--install_base $(pwd)"
@@ -74,7 +75,7 @@ cpanm Moo::Role Parallel::ForkManager
 make && make install
 ```
 
-Dantools relies on a set of software packages not listed in the
+DanTools relies on a set of software packages not listed in the
 makefile to perform some of its tasks:
 
 - [HISAT2](https://github.com/DaehwanKimLab/hisat2)
@@ -119,22 +120,45 @@ process, the base genome features need to be shifted in their relative
 positions:
 
 ```{bash, eval=FALSE}
-dantools shift -v variants.vcf -f base.gff
+dantools shift -o shifted.gff -v variants.vcf -f base.gff
 ```
 
-Now, RNA-Seq data aligned against the pseudogenome can be counted with
-the shifted GFF file.
+As an estimate for how well fragments/reads aligned to certain
+features, the alignment depth can be summarized over features:
 
-If one wants to label the variants according to a GFF file:
+```{bash, eval=FALSE}
+dantools summarize-depth -f shifted.gff -d depth.tsv --feature gene
+```
+
+Now, any RNA-Seq data aligned against the pseudogenome can be counted with
+the shifted GFF file. It is recommended that features with low fragment
+alignment depth be removed, as comparisons there are likely less accurate.
+
+One can also perform a variety of variant analyses with DanTools. If one
+wants to label the variants according to a GFF file:
 
 ```{bash, eval=FALSE}
 dantools label -v variants.vcf -f base.gff --features five_prime_UTR,CDS,three_prime_UTR
 ```
 
 These variants can optionally be translated into amino acid changes
-and scored by a mutation scoring matrix.
+and scored by a mutation scoring matrix using the --translate option.
+Both nucleotide and translated outputs can then be summarized by feature
+using a set of helper functions:
+
+```{bash, eval=FALSE}
+dantools summarize-nuc labeled_nucleotides.tsv
+dantools summarize-aa labeled_aa.tsv
+```
+
+For alignments of genome fragments produced by `dantools fragment` or
+`dantools pseudogen`, translocation events can also be investigated:
+
+```{bash, eval=FALSE}
+dantools transloc alignment.sam
+```
 
-Additional functionalities exist and are listed with the simple command:
+A list of DanTools functions can be produced with the simple command:
 
 ```{bash, eval=FALSE}
 dantools
 
@@ -373,8 +373,10 @@ BEGIN
 
     my $input_sam;
     my $output;
-    my $gap_thresh = 10000;
+    my $ref_gap = 1000;
+    my $query_gap = 1000;
     my $min_qbase = 10000;
+    my $max_diff = 50; #% difference in length between ref and query
     my $min_depth = 1;
     my $min_length = 1;
     my $input_vcf = 0;
@@ -383,10 +385,12 @@ BEGIN
     GetOptions(
         "sam|s=s" => \$input_sam,
         "output|o=s" => \$output,
-        "gap-thresh|g=i" => \$gap_thresh,
+        "ref-gap|r=i" => \$ref_gap,
+        "query-gap|q=i" => \$query_gap,
         "min-qbase|b=i" => \$min_qbase,
         "min-depth|d=i" => \$min_depth,
         "min-length|l=i" => \$min_length,
+        "max-diff=i" => \$max_diff,
         "vcf|v=s" => \$input_vcf,
         "all|a" => \$all,
         "help|h" => \$help
@@ -421,7 +425,9 @@ BEGIN
     Bio::Dantools::transloc(
         input_sam => "$input_sam",
         output => "$output",
-        gap_thresh => "$gap_thresh",
+        ref_gap => "$ref_gap",
+        query_gap => "$query_gap",
+        max_diff => "$max_diff",
         min_qbase => "$min_qbase",
         min_depth => "$min_depth",
         min_length => "$min_length",
 
@@ -1,4 +1,4 @@
-Dantools Version 1.0.0 by Daniel Klimes (daniel.s.klimes@gmail.com)
+Dantools Version 1.1.0 by Daniel Klimes (daniel.s.klimes@gmail.com)
 Tools for the Comparison of Disparate Genomes
 
 Usage: dantools <command> [options]
 
@@ -3,9 +3,12 @@ Reads a SAM alignment of fragments (from dantools fragment) to predict
 translocations
 
 Options:
-  -g, --gap-thresh     Maximum base distance between alignments to
-                       consider them the same translocation
-                       (default: 10000)
+  -r, --ref-gap        Maximum base distance between alignments
+                       relative to reference to group them together
+                       (default: 1000)
+  -q, --query-gap      Maximum base distance between alignments
+                       relative to query to group them together
+                       (default: 1000)
   -b, --min-qbase      Minimum number of read bases (length * number)
                        necessary to consider translocation
                        (default: 10000)
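
Note on the new options: this diff replaces the single `--gap-thresh` with separate `--ref-gap` and `--query-gap` limits and adds `--max-diff`, but the body of `Bio::Dantools::transloc` is not shown here. The following is only a minimal sketch, in Python rather than the project's Perl, of how two independent gap limits and a percent length-difference filter could interact. The `Aln` record, its field names, and all three function names are hypothetical, and the actual grouping logic in DanTools may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Aln:
    """One fragment alignment. All field names are hypothetical."""
    ref_start: int    # start on the reference (base) genome
    ref_end: int      # end on the reference (base) genome
    query_start: int  # start on the query (source) genome
    query_end: int    # end on the query (source) genome


def group_alignments(alns: List[Aln], ref_gap: int = 1000,
                     query_gap: int = 1000) -> List[List[Aln]]:
    """Group alignments that are close on BOTH reference and query.

    Assumption: an alignment joins the current group only if its distance
    to the previous alignment is within ref_gap on the reference and
    within query_gap on the query.
    """
    groups: List[List[Aln]] = []
    for aln in sorted(alns, key=lambda a: (a.ref_start, a.query_start)):
        if groups:
            prev = groups[-1][-1]
            close_ref = aln.ref_start - prev.ref_end <= ref_gap
            close_query = abs(aln.query_start - prev.query_end) <= query_gap
            if close_ref and close_query:
                groups[-1].append(aln)
                continue
        groups.append([aln])
    return groups


def length_diff_percent(group: List[Aln]) -> float:
    """Percent difference between the reference span and the query span."""
    ref_len = max(a.ref_end for a in group) - min(a.ref_start for a in group)
    query_len = (max(a.query_end for a in group)
                 - min(a.query_start for a in group))
    longer = max(ref_len, query_len)
    if longer == 0:
        return 0.0
    return 100.0 * abs(ref_len - query_len) / longer


def filter_groups(groups: List[List[Aln]],
                  max_diff: float = 50) -> List[List[Aln]]:
    """Keep groups whose ref/query spans differ by at most max_diff percent."""
    return [g for g in groups if length_diff_percent(g) <= max_diff]
```

Under these assumptions, two alignments that are adjacent on the reference but far apart on the query are no longer merged, which a single gap threshold could not express.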