Merge branch 'master' of github.com:ged-lab/khmer into fix/sandbox_scripts

Conflicts: ChangeLog
dib-lab · Mar 5, 2015 · 05403fb
2 parents a9c4489 + c22aaa3
commit 05403fb
Showing 17 changed files with 43 additions and 1,020 deletions.
diff --git a/ChangeLog b/ChangeLog
@@ -3,6 +3,18 @@
    * sandbox/{collect-reads.py,saturate-by-median.py}: update for 'force'
    argument in khmer.kfile functions, so that khmer-recipes compile.
 
+2015-03-02  Titus Brown  <titus@idyll.org>
+
+   * sandbox/{combine-pe.py,compare-partitions.py,count-within-radius.py,
+   degree-by-position.py,dn-identify-errors.py,ec.py,error-correct-pass2.py,
+   find-unpart.py,normalize-by-align.py,read-aligner.py,shuffle-fasta.py,
+   to-casava-1.8-fastq.py,uniqify-sequences.py}: removed from sandbox/ as
+   obsolete/unmaintained.
+   * sandbox/README.rst: updated to reflect readstats.py and trim-low-abund.py
+   promotion to sandbox/.
+   * doc/dev/scripts-and-sandbox.txt: updated to reflect sandbox/ script name
+   preferences, and note to remove from README.rst when moved over to scripts/.
+
 2015-02-27  Kevin Murray  <spam@kdmurray.id.au>
 
    * scripts/load-into-counting.py: Be verbose in the help text, to clarify
@@ -11,12 +23,13 @@
 2015-02-25  Hussien Alameldin  <hussien@msu.edu>
 
    * sandbox/bloom_count.py: renamed to bloom-count.py
-   * sandbox/bloom_count_intersection.py: renamed to 
+   * sandbox/bloom_count_intersection.py: renamed to
      bloom-count-intersection.py
    * sandbox/read_aligner.py: renamed to read-aligner.py
 
 2015-02-26  Tamer A. Mansour  <drtamermansour@gmail.com>
 
+   * scripts/abundance-dist-single.py: Use CSV format for the histogram.
    * scripts/count-overlap.py: Use CSV format for the curve file output.
    Includes column headers.
    * scripts/abundance-dist-single.py: Use CSV format for the histogram. 

diff --git a/doc/dev/scripts-and-sandbox.txt b/doc/dev/scripts-and-sandbox.txt
@@ -41,6 +41,8 @@ All scripts in ``sandbox/`` must:
 * have a hash-bang line (``#! /usr/bin/env python2``) at the top
 * be command-line executable (``chmod a+x``)
 * have a Copyright message (see below)
+* have lowercase names
+* use '-' as a word separator, rather than '_' or CamelCase
 
 All *new* scripts being added to ``sandbox/`` should:
 
@@ -113,3 +115,4 @@ development/PR checklist::
    - [ ] standard command line options are implemented
    - [ ] version and citation information is output to STDERR (`khmer_args.info(...)`)
    - [ ] runtime diagnostic information (progress, etc.) is output to STDERR
+   - [ ] script has been removed from sandbox/README.rst
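
The two naming rules added to ``doc/dev/scripts-and-sandbox.txt`` above (lowercase names, '-' as the word separator rather than '_' or CamelCase) could be checked mechanically. The following is a minimal sketch only: ``follows_sandbox_naming`` is a hypothetical helper, not part of khmer, and it assumes that "lowercase" means the file name has no uppercase characters at all.

```python
def follows_sandbox_naming(filename):
    """Return True if a sandbox script name follows the two naming rules.

    Hypothetical helper for illustration; not part of khmer.
    Rule 1: the name is lowercase (this also rejects CamelCase).
    Rule 2: '_' is not used as a word separator.
    """
    is_lowercase = filename == filename.lower()
    has_underscore = "_" in filename
    return is_lowercase and not has_underscore


if __name__ == "__main__":
    # Names taken from the ChangeLog and README entries in this commit,
    # plus one made-up CamelCase name for contrast.
    for name in ("bloom-count.py", "bloom_count.py",
                 "to-casava-1.8-fastq.py", "ReadAligner.py"):
        print(name, follows_sandbox_naming(name))
```

A check like this would accept the renamed ``bloom-count.py`` and reject the old ``bloom_count.py``; it says nothing about the other requirements in the list (hash-bang line, executable bit, copyright message).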
diff --git a/sandbox/README.rst b/sandbox/README.rst
@@ -6,10 +6,16 @@ scripts that we have not fully tested.  They are also not under
 semantic versioning, so their functionality and command line arguments
 may change without notice.
 
-We are still in the middle of triaging and documenting the various scripts.
+We are still triaging and documenting the various scripts.
 
 ----
 
+Awaiting promotion to sandbox:
+
+* calc-error-profile.py - calculate a per-base "error profile" for shotgun sequencing data, w/o a reference. (Used/tested in `2014 paper on semi-streaming algorithms <https://github.com/ged-lab/2014-streaming/blob/master/>`__)
+* correct-errors.py - streaming error correction.
+* unique-kmers.py - estimate the number of k-mers present in a file with the HyperLogLog low-memory probabilistic cardinality estimation algorithm.
+
 Scripts with recipes:
 
 * calc-median-distribution.py - plot coverage distribution; see `khmer-recipes #1 <https://github.com/ged-lab/khmer-recipes/tree/master/001-extract-reads-by-coverage>`__
@@ -21,13 +27,9 @@ To keep, document, and build recipes for:
 
 * abundance-hist-by-position.py - look at abundance of k-mers by position within read; use with fasta-to-abundance-hist.py
 * assemstats3.py - print out assembly statistics
+* build-sparse-graph.py - code for building a sparse graph (by Camille Scott)
 * calc-best-assembly.py - calculate the "best assembly" - used in metagenome protocol
-* calc-error-profile.py - calculate a per-base "error profile" for shotgun sequencing data, w/o a reference.
-* calc-median-distribution.py - plot coverage distribution; see `khmer-recipes #1 <https://github.com/ged-lab/khmer-recipes/tree/master/001-extract-reads-by-coverage>`__
-* collect-variants.py
-* combine-pe.py - combine partitions based on shared PE reads.
-* compare-partitions.py
-* dn-identify-errors.py - prototype script to identify errors in reads based on diginorm principles
+* collect-variants.py - used in a `gist <https://gist.github.com/ctb/6eaef7971ea429ab348d>`__
 * extract-single-partition.py - extract all the sequences that belong to a specific partition, from a file with multiple partitions
 * fasta-to-abundance-hist.py - generate abundance of k-mers by position within reads; use with abundance-hist-by-position.py
 * filter-below-abund.py - like filter-abund, but trim off high-abundance k-mers
@@ -41,9 +43,7 @@ To keep, document, and build recipes for:
 * normalize-by-median-pct.py - see blog post on Trinity in silico norm (http://ivory.idyll.org/blog/trinity-in-silico-normalize.html)
 * print-stoptags.py - print out the stoptag k-mers
 * print-tagset.py - print out the tagset k-mers
-* readstats.py - print out read statistics
 * renumber-partitions.py - systematically renumber partitions
-* shuffle-fasta.py - FASTA file shuffler for small FASTA files
 * shuffle-reverse-rotary.py - FASTA file shuffler for larger FASTA files
 * split-fasta.py - break a FASTA file up into smaller chunks
 * stoptag-abundance-hist.py - print out abundance histogram of stoptags
@@ -55,9 +55,7 @@ To keep, document, and build recipes for:
 * sweep-reads.py - various ways to extract reads based on k-mer overlap
 * sweep-reads2.py - various ways to extract reads based on k-mer overlap
 * sweep-reads3.py - various ways to extract reads based on k-mer overlap
-* to-casava-1.8-fastq.py - convert reads to different Casava format
-* trim-low-abund.py - streaming k-mer abundance trimming; see filter-abund for non-streaming, and look to `khmer-recipes #6 <https://github.com/ged-lab/khmer-recipes/blob/master/006-streaming-sequence-trimming/index.rst>`__ for usage.
-* write-trimmomatic.py
+* write-trimmomatic.py - used to build Trimmomatic command lines in `khmer-protocols <http://khmer-protocols.readthedocs.org/en/latest/>`__
 
 Good ideas to rewrite using newer tools/approaches:
 
@@ -67,20 +65,24 @@ Good ideas to rewrite using newer tools/approaches:
 * bloom-count-intersection.py - look at unique and disjoint #s of k-mers, Renamed from bloom_count_intersection.py in commit 4788c31.
 * split-sequences-by-length.py - break up short reads by length
 
-To examine:
+----
 
-* build-sparse-graph.py - code for building a sparse graph (by Camille Scott)
-* count-within-radius.py - calculating graph density by position with seq
-* degree-by-position.py - calculating graph degree by position in seq
-* ec.py - new error correction foo
-* error-correct-pass2.py - new error correction foo
-* find-unpart.py - something to do with finding unpartitioned sequences
-* normalize-by-align.py  - new error correction foo
-* read-aligner.py - new error correction foo, Renamed from read_aligner.py in commit 4788c31.
-* uniqify-sequences.py - print out paths that are unique in the graph
-* write-interleave.py - is this used by any protocol etc?
+Present in commit d295bc847 but removed thereafter:
 
-----
+* `combine-pe.py <https://github.com/ged-lab/khmer/blob/d295bc8477022e8c34649f131a2abe333a891d3d/sandbox/combine-pe.py>`__ - combine partitions based on shared PE reads.
+* `compare-partitions.py <https://github.com/ged-lab/khmer/blob/d295bc8477022e8c34649f131a2abe333a891d3d/sandbox/compare-partitions.py>`__ - compare read membership in partitions.
+* `count-within-radius.py <https://github.com/ged-lab/khmer/blob/d295bc8477022e8c34649f131a2abe333a891d3d/sandbox/count-within-radius.py>`__ - calculating graph density by position with seq
+* `degree-by-position.py <https://github.com/ged-lab/khmer/blob/d295bc8477022e8c34649f131a2abe333a891d3d/sandbox/degree-by-position.py>`__ - calculating graph degree by position in seq
+* `dn-identify-errors.py <https://github.com/ged-lab/khmer/blob/d295bc8477022e8c34649f131a2abe333a891d3d/sandbox/dn-identify-errors.py>`__ - prototype script to identify errors in reads based on diginorm principles
+* `ec.py <https://github.com/ged-lab/khmer/blob/d295bc8477022e8c34649f131a2abe333a891d3d/sandbox/ec.py>`__ - new error correction foo
+* `error-correct-pass2.py <https://github.com/ged-lab/khmer/blob/d295bc8477022e8c34649f131a2abe333a891d3d/sandbox/error-correct-pass2.py>`__ - new error correction foo
+* `find-unpart.py <https://github.com/ged-lab/khmer/blob/d295bc8477022e8c34649f131a2abe333a891d3d/sandbox/find-unpart.py>`__ - something to do with finding unpartitioned sequences
+* `normalize-by-align.py <https://github.com/ged-lab/khmer/blob/d295bc8477022e8c34649f131a2abe333a891d3d/sandbox/normalize-by-align.py>`__  - new error correction foo
+* `read_aligner.py <https://github.com/ged-lab/khmer/blob/d295bc8477022e8c34649f131a2abe333a891d3d/sandbox/read_aligner.py>`__ - new error correction foo
+* `shuffle-fasta.py <https://github.com/ged-lab/khmer/blob/d295bc8477022e8c34649f131a2abe333a891d3d/sandbox/shuffle-fasta.py>`__ - FASTA file shuffler for small FASTA files
+* `to-casava-1.8-fastq.py <https://github.com/ged-lab/khmer/blob/d295bc8477022e8c34649f131a2abe333a891d3d/sandbox/to-casava-1.8-fastq.py>`__ - convert reads to different Casava format
+* `uniqify-sequences.py <https://github.com/ged-lab/khmer/blob/d295bc8477022e8c34649f131a2abe333a891d3d/sandbox/uniqify-sequences.py>`__ - print out paths that are unique in the graph
+* `write-interleave.py <https://github.com/ged-lab/khmer/blob/d295bc8477022e8c34649f131a2abe333a891d3d/sandbox/write-interleave.py>`__ - is this used by any protocol etc?
 
 Present in commit 691b0b3ae but removed thereafter:
 

diff --git a/sandbox/combine-pe.py b/sandbox/combine-pe.py
diff --git a/sandbox/compare-partitions.py b/sandbox/compare-partitions.py
diff --git a/sandbox/count-within-radius.py b/sandbox/count-within-radius.py
diff --git a/sandbox/degree-by-position.py b/sandbox/degree-by-position.py