Cleanup from readthough #1241

Closed
wants to merge 24 commits into from
36 changes: 20 additions & 16 deletions CITATION
@@ -1,13 +1,14 @@
.. vim: set filetype=rst

.. If you update this file then you may need to update the citations in
scripts/galaxy/macro.xml and khmer/khmer_args.py as well
khmer/khmer_args.py as well

*********
Citations
---------
*********

Software Citation
^^^^^^^^^^^^^^^^^
=================

If you use the khmer software, you must cite:

@@ -38,10 +39,11 @@ To see a quick summary of papers for a given script just run it without using
any command line arguments.

Graph partitioning and/or compressible graph representation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
===========================================================

The load-graph.py, partition-graph.py, find-knots.py, load-graph.py,
and partition-graph.py scripts are part of the compressible graph
The :program:`load-graph.py`, :program:`partition-graph.py`,
:program:`find-knots.py`, :program:`load-graph.py`, and
:program:`partition-graph.py` scripts are part of the compressible graph
representation and partitioning algorithms described in:

Pell J, Hintze A, Canino-Koning R, Howe A, Tiedje JM, Brown CT.
@@ -84,10 +86,10 @@ representation and partitioning algorithms described in:
}

Digital normalization
^^^^^^^^^^^^^^^^^^^^^
=====================

The normalize-by-median.py and count-median.py scripts are part of
the digital normalization algorithm, described in:
The :program:`normalize-by-median.py` and :program:`count-median.py` scripts
are part of the digital normalization algorithm, described in:

A Reference-Free Algorithm for Computational Normalization of
Shotgun Sequencing Data
@@ -108,9 +110,9 @@ the digital normalization algorithm, described in:
}

Efficient k-mer error trimming
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
==============================

The script trim-low-abund.py is described in:
The script :program:`trim-low-abund.py` is described in:

Crossing the streams: a framework for streaming analysis of short DNA
sequencing reads
@@ -121,17 +123,19 @@ The script trim-low-abund.py is described in:

@unpublished{semistream,
author = "Qingpeng Zhang and Sherine Awad and C. Titus Brown",
title = "Crossing the streams: a framework for streaming analysis of short DNA sequencing reads",
title = "Crossing the streams: a framework for streaming analysis of
short DNA sequencing reads",
year = "2015",
eprint = "PeerJ Preprints 3:e1100",
url = "https://dx.doi.org/10.7287/peerj.preprints.890v1"
}

K-mer counting
^^^^^^^^^^^^^^
==============

The abundance-dist.py, filter-abund.py, and load-into-counting.py scripts
implement the probabilistic k-mer counting described in:
The :program:`abundance-dist.py`, :program:`filter-abund.py`, and
:program:`load-into-counting.py` scripts implement the probabilistic k-mer
counting described in:

These Are Not the K-mers You Are Looking For: Efficient Online K-mer
Counting Using a Probabilistic Data Structure
@@ -179,7 +183,7 @@ implement the probabilistic k-mer counting described in:
}

FASTA and FASTQ reading
^^^^^^^^^^^^^^^^^^^^^^^
=======================

Several scripts use the SeqAn library for FASTQ and FASTA reading as described
in:
12 changes: 8 additions & 4 deletions Makefile
@@ -4,7 +4,7 @@
# and documentation
# make coverage-report to check coverage of the python scripts by the tests

CPPSOURCES=$(wildcard lib/*.cc lib/*.hh khmer/_khmer.cc)
CPPSOURCES=$(wildcard lib/*.cc lib/*.hh khmer/_khmer.cc) setup.py
PYSOURCES=$(wildcard khmer/*.py scripts/*.py)
SOURCES=$(PYSOURCES) $(CPPSOURCES) setup.py
DEVPKGS=pep8==1.5.7 diff_cover autopep8 pylint coverage gcovr nose pep257 \
@@ -80,22 +80,26 @@ clean: FORCE
rm -f diff-cover.html

debug: FORCE
export CFLAGS="-pg -fprofile-arcs"; python setup.py build_ext --debug \
export CFLAGS="-pg -fprofile-arcs -D_GLIBCXX_DEBUG_PEDANTIC \
-D_GLIBCXX_DEBUG"; python setup.py build_ext --debug \
--inplace

## doc : render documentation in HTML
doc: build/sphinx/html/index.html

build/sphinx/html/index.html: $(SOURCES) $(wildcard doc/*.txt) doc/conf.py all
build/sphinx/html/index.html: $(SOURCES) $(wildcard doc/*.rst) doc/conf.py all
./setup.py build_sphinx --fresh-env
@echo ''
@echo '--> docs in build/sphinx/html <--'
@echo ''

## pdf : render documentation as a PDF file
# packages needed include: texlive-latex-recommended,
# texlive-fonts-recommended, texlive-latex-extra
pdf: build/sphinx/latex/khmer.pdf

build/sphinx/latex/khmer.pdf: $(SOURCES) doc/conf.py $(wildcard doc/*.txt)
build/sphinx/latex/khmer.pdf: $(SOURCES) doc/conf.py $(wildcard doc/*.rst) \
$(wildcard doc/user/*.rst) $(wildcard doc/dev/*.rst)
./setup.py build_sphinx --fresh-env --builder latex
cd build/sphinx/latex && ${MAKE} all-pdf
@echo ''
4 changes: 2 additions & 2 deletions doc/conf.py
@@ -165,7 +165,7 @@
#html_additional_pages = {}

# If false, no module index is generated.
#html_use_modindex = True
html_use_modindex = False

# If false, no index is generated.
#html_use_index = True
@@ -227,4 +227,4 @@
#latex_appendices = []

# If false, no module index is generated.
#latex_use_modindex = True
latex_use_modindex = False
15 changes: 8 additions & 7 deletions doc/contributors.rst
@@ -1,16 +1,17 @@
.. vim: set filetype=rst

=================================
*********************************
Contributors and Acknowledgements
=================================
*********************************

khmer is a product of the GED lab at Michigan State University,
khmer is a product of the Lab for Data Intensive Biology at the University of
California, Davis (the successor to the GED lab at Michigan State University),

http://ged.msu.edu/
http://ivory.idyll.org/lab/

---

C. Titus Brown <ctb@msu.edu> wrote the initial ktable and hashtable
C. Titus Brown <titus@idyll.org> wrote the initial ktable and hashtable
implementations, as well as hashbits and counting_hash.

Jason Pell implemented many of the C++ k-mer filtering functions.
@@ -28,6 +29,6 @@ Eric McDonald thoroughly revised many aspects of the code base, made
much of the codebase thread safe, and otherwise improved performance
dramatically.

Michael R. Crusoe is the new maintainer of khmer.
Michael R. Crusoe took over maintainership June, 2013.

MRC 2014-05-07
Last updated by MRC on 2015-07-31
Contributor Author

Did we want to expand this section to include all other contributors? I assume not, but it might be worth distinguishing between contributors here and authors elsewhere.

Contributor

This is a TODO for @ctb

7 changes: 6 additions & 1 deletion doc/dev/coding-guidelines-and-review.rst
@@ -4,7 +4,12 @@ Coding guidelines and code review checklist
This document is for anyone who wants to contribute code to the khmer
project, and describes our coding standards and code review checklist.

----
C++ standards
-------------

Any feature in C++11 is fine to use. Specifically we support features found in
GCC 4.8.2. See https://github.com/dib-lab/khmer/issues/598 for an in-depth
discussion.

Coding standards
----------------
10 changes: 10 additions & 0 deletions doc/dev/getting-started.rst
@@ -153,6 +153,16 @@ One-time Preparation
sudo brew install cppcheck


#. ccache installation:

Debian and Ubuntu Linux distro users can install ``ccache`` to speed up
their compile times::

sudo apt-get install ccache
echo 'export PATH="/usr/lib/ccache:$PATH" # enable ccache' >> ~/.bashrc
export PATH="/usr/lib/ccache:$PATH"


Building khmer and running the tests
------------------------------------

36 changes: 23 additions & 13 deletions doc/index.rst
@@ -1,15 +1,26 @@
.. khmer documentation master file, created by
sphinx-quickstart on Wed Aug 4 10:20:23 2010.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
.. vim: set filetype=rst

#######################################
khmer -- k-mer counting & filtering FTW
=======================================
#######################################

:Authors: Michael R. Crusoe, ACharbonneau, James A. Stapleton, Sherine
Awad, Elmar Bucher, Adam Caldwell, Reed Cartwright, Bede Constantinides,
Peter Dave Hello, Kevin D. Murray, Greg Edvenson, Hussien F. Alameldin,
Scott Fay, Jacob Fenton, Thomas Fenzl, Jordan Fish, Leonor
Garcia-Gutierrez, Phillip Garland, Jonathan Gluck, Iván González, Sarah
Guermond, Jiarong Guo, Aditi Gupta, Andreas Härpfer, Adina Howe,
Alex Hyer, Luiz Irber, Alexander Johan Nederbragt, Rhys Kidd, David Lin,
Justin Lippi, Heather L. Wiencko, Tamer Mansour, Pamela McA'Nulty, Eric
McDonald, Jessica Mizzi, Kevin Murray, Kaben Nanlohy, Humberto
Ortiz-Zuazaga, Jeramia Ory, Jason Pell, Charles Pepe-Ranney, Rodney
Picett, Ryan R. Boyce, Michael R. Crusoe, Joshua R. Herr, Joshua R.
Nahum, Erich Schwarz, Camille Scott, Josiah Seaman, Scott Sievert, Jared
Simpson, James Spencer, Ramakrishnan Srinivasan, Daniel Standage, Joe
Stein, Susan Steinman, Benjamin Taylor, C. Titus Brown, Will Trimble,
Connor T. Skennerton, Michael Wright, Brian Wyss, Qingpeng Zhang, en
zyme, C. Titus Brown

Contributor

Fwoah, thanks!

:Authors: Michael R. Crusoe, Greg Edvenson, Jordan Fish, Adina Howe,
Luiz Irber, Eric McDonald, Joshua Nahum, Kaben Nanlohy, Humberto
Ortiz-Zuazaga, Jason Pell, Jared Simpson, Camille Scott, Ramakrishnan
Rajaram Srinivasan, Qingpeng Zhang, and C. Titus Brown

:Contact: khmer-project@idyll.org
:GitHub: https://github.com/dib-lab/khmer
@@ -18,7 +29,7 @@ khmer -- k-mer counting & filtering FTW


khmer is a library and suite of command line tools for working with
DNA sequence. It is primarily aimed at short-read sequencing data
DNA sequences. It is primarily aimed at short-read sequencing data
such as that produced by the Illumina platform. khmer takes a k-mer-centric
approach to sequence analysis, hence the name.

@@ -34,7 +45,8 @@ the following URLs:

* Announcements: http://lists.idyll.org/listinfo/khmer-announce

The archives for the khmer list are available at: http://lists.idyll.org/pipermail/khmer/
The archives for the khmer mailing list are available at:
http://lists.idyll.org/pipermail/khmer/

khmer development has largely been supported by AFRI Competitive Grant
no. `2010-65205-20361
@@ -44,8 +56,6 @@ Institute of the National Institutes of Health under Award Number
`R01HG007513 <http://ged.msu.edu/downloads/2012-bigdata-nsf.pdf>`__ through
May 2016, both to C. Titus Brown.

Contents:

.. toctree::
:maxdepth: 1

22 changes: 12 additions & 10 deletions doc/introduction.rst
@@ -1,8 +1,8 @@
.. vim: set filetype=rst

=====================
*********************
Introduction to khmer
=====================
*********************

Introduction
============
@@ -11,7 +11,7 @@ khmer is a library and toolkit for doing k-mer-based dataset analysis and
transformations. Our focus in developing it has been on scaling assembly of
metagenomes and mRNA.

khmer can be used for a number of transformations, include inexact
khmer can be used for a number of transformations, including inexact
transformations (abundance filtering and error trimming) and exact
transformations (graph-size filtering, to throw away disconnected reads; and
partitioning, to split reads into disjoint sets). Of these, only partitioning
@@ -34,16 +34,16 @@ will never incorrectly report a k-mer as being absent when it *is* present.
This one-sided error makes the Bloom filter very useful for certain kinds of
operations.
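
Reader's note on the context lines above: the one-sided error they describe can be illustrated with a very small Bloom filter. The following is a minimal sketch in Python and not khmer's implementation. The class name `TinyBloom`, the helper `kmers`, the table size, the number of hash functions, and the use of a salted `hashlib.blake2b` are all assumptions chosen for illustration.

```python
import hashlib


class TinyBloom:
    """Illustrative Bloom filter (hypothetical; not khmer's data structure)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per "bit" keeps the sketch simple; a real filter packs bits.
        self.bits = bytearray(num_bits)

    def _positions(self, kmer):
        # Derive num_hashes table positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(kmer.encode(), salt=bytes([i])).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, kmer):
        for pos in self._positions(kmer):
            self.bits[pos] = 1

    def __contains__(self, kmer):
        # True only if every position is set. Positions set by *other* k-mers
        # can make this True for a k-mer that was never added (false positive),
        # but an added k-mer always has all of its positions set.
        return all(self.bits[pos] for pos in self._positions(kmer))


def kmers(sequence, k):
    """Return all overlapping substrings of length k."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


bloom = TinyBloom()
read = "ATGGCGTACGTTAGC"
for kmer in kmers(read, 5):
    bloom.add(kmer)

# No false negatives: every k-mer that was added is reported as present.
assert all(kmer in bloom for kmer in kmers(read, 5))
```

A query for a k-mer that was never added may still return `True` if other k-mers happened to set the same positions. Shrinking `num_bits` in this sketch makes such false positives more likely, while the final assertion continues to hold.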

khmer is also independent of K, and currently works for K <= 32. We will be
integrating code for up to K=64 soon.
khmer is also independent of a specific k-size (K), and currently works for
K <= 32. We will be integrating code for K<=64 soon.

khmer is implemented in C++ with a Python wrapper, which is what all of the
scripts use.

Some important documentation for khmer is provided on the Web sites for
Documentation for khmer is provided on the Web sites for
`khmer-protocols <http://khmer-protocols.readthedocs.org>`__ and `khmer-recipes
<http://khmer-recipes.readthedocs.org>`__. khmer-protocols provides detailed
protocols for using khmer to analyze either a transcriptome or a metagenome;
protocols for using khmer to analyze either a transcriptome or a metagenome.
khmer-recipes provides individual recipes for using khmer in a variety of
sequence-oriented tasks such as extracting reads by coverage, estimating a
genome or metagenome size from unassembled reads, and error-trimming reads via
@@ -71,7 +71,7 @@ immediately useful for a few different operations, including:

- optimizing assemblies on various parameters;

- converting FASTA to FASTQ;
- converting FASTQ to FASTA;

and a few other random functions.

@@ -94,6 +94,8 @@ Copyright and license
=====================

Portions of khmer are Copyright California Institute of Technology,
where the exact counting code was first developed; the remainder is
Copyright Michigan State University. The code is freely available for
where the exact counting code was first developed. All other code developed
through 2014 is copyright Michigan State University. All developed code through
Contributor

portions are Copyright Michigan State University and Copyright Regents of the University of California.

2015 is copyright University of California Davis.
All the code is freely available for
use and re-use under the BSD License.
25 changes: 19 additions & 6 deletions doc/user/biblio.rst
@@ -3,15 +3,28 @@
An incomplete bibliography of papers using khmer
================================================


Biological uses outside of the group
------------------------------------

http://www.ncbi.nlm.nih.gov/sites/myncbi/1ruvipqAmaMkN/collections/48107393/public/

Tools building on khmer concepts
--------------------------------

http://www.ncbi.nlm.nih.gov/sites/myncbi/1ruvipqAmaMkN/collections/48101567/public/

Papers in collaboration with our group
--------------------------------------

http://www.ncbi.nlm.nih.gov/sites/myncbi/1ruvipqAmaMkN/collections/48107445/public/

Digital normalization
---------------------

Multiple Single-Cell Genomes Provide Insight into Functions of
Uncultured Deltaproteobacteria in the Human Oral Cavity. Campbell et
al., PLoS One, 2013, doi:10.1371/journal.pone.0059361. [ `paper link <http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0059361>`__ ]
al., PLoS One, 2013, doi:10.1371/journal.pone.0059361. [ `paper link
<http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0059361>`__ ]


Insights into archaeal evolution and symbiosis from the genomes of a
nanoarchaeon and its inferred crenarchaeal host from Obsidian Pool,
Yellowstone National Park. Podar et al., Biology Direct, 2013
doi:10.1186/1745-6150-8-9.
[ `paper link <http://www.biology-direct.com/content/8/1/9/abstract>`__ ]