Merge branch 'master' of github.com:ged-lab/khmer into fix/sandbox_scripts

Conflicts:
	ChangeLog
ctb committed Mar 5, 2015
2 parents a9c4489 + c22aaa3 commit 05403fb
Showing 17 changed files with 43 additions and 1,020 deletions.
15 changes: 14 additions & 1 deletion ChangeLog
@@ -3,6 +3,18 @@
* sandbox/{collect-reads.py,saturate-by-median.py}: update for 'force'
argument in khmer.kfile functions, so that khmer-recipes compile.

2015-03-02 Titus Brown <titus@idyll.org>

* sandbox/{combine-pe.py,compare-partitions.py,count-within-radius.py,
degree-by-position.py,dn-identify-errors.py,ec.py,error-correct-pass2.py,
find-unpart.py,normalize-by-align.py,read-aligner.py,shuffle-fasta.py,
to-casava-1.8-fastq.py,uniqify-sequences.py}: removed from sandbox/ as
obsolete/unmaintained.
* sandbox/README.rst: updated to reflect readstats.py and trim-low-abund.py
promotion to sandbox/.
* doc/dev/scripts-and-sandbox.txt: updated to reflect sandbox/ script name
preferences, and note to remove from README.rst when moved over to scripts/.

2015-02-27 Kevin Murray <spam@kdmurray.id.au>

* scripts/load-into-counting.py: Be verbose in the help text, to clarify
@@ -11,12 +23,13 @@
2015-02-25 Hussien Alameldin <hussien@msu.edu>

* sandbox/bloom_count.py: renamed to bloom-count.py
* sandbox/bloom_count_intersection.py: renamed to
* sandbox/bloom_count_intersection.py: renamed to
bloom-count-intersection.py
* sandbox/read_aligner.py: renamed to read-aligner.py

2015-02-26 Tamer A. Mansour <drtamermansour@gmail.com>

* scripts/abundance-dist-single.py: Use CSV format for the histogram.
* scripts/count-overlap.py: Use CSV format for the curve file output.
Includes column headers.
* scripts/abundance-dist-single.py: Use CSV format for the histogram.
3 changes: 3 additions & 0 deletions doc/dev/scripts-and-sandbox.txt
@@ -41,6 +41,8 @@ All scripts in ``sandbox/`` must:
* have a hash-bang line (``#! /usr/bin/env python2``) at the top
* be command-line executable (``chmod a+x``)
* have a Copyright message (see below)
* have lowercase names
* use '-' as a word separator, rather than '_' or CamelCase

All *new* scripts being added to ``sandbox/`` should:

@@ -113,3 +115,4 @@ development/PR checklist::
- [ ] standard command line options are implemented
- [ ] version and citation information is output to STDERR (`khmer_args.info(...)`)
- [ ] runtime diagnostic information (progress, etc.) is output to STDERR
- [ ] script has been removed from sandbox/README.rst
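
The ``sandbox/`` requirements in the hunk above (hash-bang line, executable bit, lowercase names, '-' as the word separator) lend themselves to a mechanical check. The following is a minimal sketch only, not part of khmer: the function names ``name_problems``, ``suggest_name``, and ``file_problems`` are hypothetical, the Copyright-message requirement is not checked, and the executable test assumes a POSIX file system.

```python
import os
import re
import stat


def name_problems(filename):
    """Return naming-rule violations for a script file name (hypothetical helper)."""
    base = os.path.basename(filename)
    problems = []
    if base != base.lower():
        problems.append("name is not lowercase")
    if "_" in base:
        problems.append("name uses '_' rather than '-' as a word separator")
    return problems


def suggest_name(filename):
    """Suggest a conforming name: split CamelCase, map '_' to '-', lowercase."""
    stem, ext = os.path.splitext(os.path.basename(filename))
    # Insert '-' at each lowercase/digit -> uppercase boundary (CamelCase).
    stem = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "-", stem)
    return stem.replace("_", "-").lower() + ext.lower()


def file_problems(path):
    """Check the name, the hash-bang line, and the executable bits of a file."""
    problems = name_problems(path)
    with open(path, "rb") as handle:
        first_line = handle.readline()
    if not first_line.startswith(b"#!"):
        problems.append("missing hash-bang line")
    # 'chmod a+x' sets the execute bit for user, group, and other.
    all_exec = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    if os.stat(path).st_mode & all_exec != all_exec:
        problems.append("not executable for all users (chmod a+x)")
    return problems
```

For example, ``suggest_name("bloom_count_intersection.py")`` yields ``bloom-count-intersection.py``, which matches the rename recorded in the ChangeLog hunk.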
50 changes: 26 additions & 24 deletions sandbox/README.rst
@@ -6,10 +6,16 @@ scripts that we have not fully tested. They are also not under
semantic versioning, so their functionality and command line arguments
may change without notice.

We are still in the middle of triaging and documenting the various scripts.
We are still triaging and documenting the various scripts.

----

Awaiting promotion to sandbox:

* calc-error-profile.py - calculate a per-base "error profile" for shotgun sequencing data, w/o a reference. (Used/tested in `2014 paper on semi-streaming algorithms <https://github.com/ged-lab/2014-streaming/blob/master/>`__)
* correct-errors.py - streaming error correction.
* unique-kmers.py - estimate the number of k-mers present in a file with the HyperLogLog low-memory probabilistic cardinality estimation algorithm.

Scripts with recipes:

* calc-median-distribution.py - plot coverage distribution; see `khmer-recipes #1 <https://github.com/ged-lab/khmer-recipes/tree/master/001-extract-reads-by-coverage>`__
@@ -21,13 +27,9 @@ To keep, document, and build recipes for:

* abundance-hist-by-position.py - look at abundance of k-mers by position within read; use with fasta-to-abundance-hist.py
* assemstats3.py - print out assembly statistics
* build-sparse-graph.py - code for building a sparse graph (by Camille Scott)
* calc-best-assembly.py - calculate the "best assembly" - used in metagenome protocol
* calc-error-profile.py - calculate a per-base "error profile" for shotgun sequencing data, w/o a reference.
* calc-median-distribution.py - plot coverage distribution; see `khmer-recipes #1 <https://github.com/ged-lab/khmer-recipes/tree/master/001-extract-reads-by-coverage>`__
* collect-variants.py
* combine-pe.py - combine partitions based on shared PE reads.
* compare-partitions.py
* dn-identify-errors.py - prototype script to identify errors in reads based on diginorm principles
* collect-variants.py - used in a `gist <https://gist.github.com/ctb/6eaef7971ea429ab348d>`__
* extract-single-partition.py - extract all the sequences that belong to a specific partition, from a file with multiple partitions
* fasta-to-abundance-hist.py - generate abundance of k-mers by position within reads; use with abundance-hist-by-position.py
* filter-below-abund.py - like filter-abund, but trim off high-abundance k-mers
@@ -41,9 +43,7 @@ To keep, document, and build recipes for:
* normalize-by-median-pct.py - see blog post on Trinity in silico norm (http://ivory.idyll.org/blog/trinity-in-silico-normalize.html)
* print-stoptags.py - print out the stoptag k-mers
* print-tagset.py - print out the tagset k-mers
* readstats.py - print out read statistics
* renumber-partitions.py - systematically renumber partitions
* shuffle-fasta.py - FASTA file shuffler for small FASTA files
* shuffle-reverse-rotary.py - FASTA file shuffler for larger FASTA files
* split-fasta.py - break a FASTA file up into smaller chunks
* stoptag-abundance-hist.py - print out abundance histogram of stoptags
@@ -55,9 +55,7 @@ To keep, document, and build recipes for:
* sweep-reads.py - various ways to extract reads based on k-mer overlap
* sweep-reads2.py - various ways to extract reads based on k-mer overlap
* sweep-reads3.py - various ways to extract reads based on k-mer overlap
* to-casava-1.8-fastq.py - convert reads to different Casava format
* trim-low-abund.py - streaming k-mer abundance trimming; see filter-abund for non-streaming, and look to `khmer-recipes #6 <https://github.com/ged-lab/khmer-recipes/blob/master/006-streaming-sequence-trimming/index.rst>`__ for usage.
* write-trimmomatic.py
* write-trimmomatic.py - used to build Trimmomatic command lines in `khmer-protocols <http://khmer-protocols.readthedocs.org/en/latest/>`__

Good ideas to rewrite using newer tools/approaches:

@@ -67,20 +65,24 @@ Good ideas to rewrite using newer tools/approaches:
* bloom-count-intersection.py - look at unique and disjoint #s of k-mers, Renamed from bloom_count_intersection.py in commit 4788c31.
* split-sequences-by-length.py - break up short reads by length

To examine:
----

* build-sparse-graph.py - code for building a sparse graph (by Camille Scott)
* count-within-radius.py - calculating graph density by position with seq
* degree-by-position.py - calculating graph degree by position in seq
* ec.py - new error correction foo
* error-correct-pass2.py - new error correction foo
* find-unpart.py - something to do with finding unpartitioned sequences
* normalize-by-align.py - new error correction foo
* read-aligner.py - new error correction foo, Renamed from read_aligner.py in commit 4788c31.
* uniqify-sequences.py - print out paths that are unique in the graph
* write-interleave.py - is this used by any protocol etc?
Present in commit d295bc847 but removed thereafter:

----
* `combine-pe.py <https://github.com/ged-lab/khmer/blob/d295bc8477022e8c34649f131a2abe333a891d3d/sandbox/combine-pe.py>`__ - combine partitions based on shared PE reads.
* `compare-partitions.py <https://github.com/ged-lab/khmer/blob/d295bc8477022e8c34649f131a2abe333a891d3d/sandbox/compare-partitions.py>`__ - compare read membership in partitions.
* `count-within-radius.py <https://github.com/ged-lab/khmer/blob/d295bc8477022e8c34649f131a2abe333a891d3d/sandbox/count-within-radius.py>`__ - calculating graph density by position with seq
* `degree-by-position.py <https://github.com/ged-lab/khmer/blob/d295bc8477022e8c34649f131a2abe333a891d3d/sandbox/degree-by-position.py>`__ - calculating graph degree by position in seq
* `dn-identify-errors.py <https://github.com/ged-lab/khmer/blob/d295bc8477022e8c34649f131a2abe333a891d3d/sandbox/dn-identify-errors.py>`__ - prototype script to identify errors in reads based on diginorm principles
* `ec.py <https://github.com/ged-lab/khmer/blob/d295bc8477022e8c34649f131a2abe333a891d3d/sandbox/ec.py>`__ - new error correction foo
* `error-correct-pass2.py <https://github.com/ged-lab/khmer/blob/d295bc8477022e8c34649f131a2abe333a891d3d/sandbox/error-correct-pass2.py>`__ - new error correction foo
* `find-unpart.py <https://github.com/ged-lab/khmer/blob/d295bc8477022e8c34649f131a2abe333a891d3d/sandbox/find-unpart.py>`__ - something to do with finding unpartitioned sequences
* `normalize-by-align.py <https://github.com/ged-lab/khmer/blob/d295bc8477022e8c34649f131a2abe333a891d3d/sandbox/normalize-by-align.py>`__ - new error correction foo
* `read_aligner.py <https://github.com/ged-lab/khmer/blob/d295bc8477022e8c34649f131a2abe333a891d3d/sandbox/read_aligner.py>`__ - new error correction foo
* `shuffle-fasta.py <https://github.com/ged-lab/khmer/blob/d295bc8477022e8c34649f131a2abe333a891d3d/sandbox/shuffle-fasta.py>`__ - FASTA file shuffler for small FASTA files
* `to-casava-1.8-fastq.py <https://github.com/ged-lab/khmer/blob/d295bc8477022e8c34649f131a2abe333a891d3d/sandbox/to-casava-1.8-fastq.py>`__ - convert reads to different Casava format
* `uniqify-sequences.py <https://github.com/ged-lab/khmer/blob/d295bc8477022e8c34649f131a2abe333a891d3d/sandbox/uniqify-sequences.py>`__ - print out paths that are unique in the graph
* `write-interleave.py <https://github.com/ged-lab/khmer/blob/d295bc8477022e8c34649f131a2abe333a891d3d/sandbox/write-interleave.py>`__ - is this used by any protocol etc?

Present in commit 691b0b3ae but removed thereafter:

66 changes: 0 additions & 66 deletions sandbox/combine-pe.py

This file was deleted.

68 changes: 0 additions & 68 deletions sandbox/compare-partitions.py

This file was deleted.

60 changes: 0 additions & 60 deletions sandbox/count-within-radius.py

This file was deleted.

47 changes: 0 additions & 47 deletions sandbox/degree-by-position.py

This file was deleted.
