Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add --max-memory-usage argument, and associated khmer_args refactoring. #1050

Merged
merged 36 commits into from
Jun 27, 2015

Conversation

ctb
Copy link
Member

@ctb ctb commented May 31, 2015

This pull request should make little practical difference in behavior, but does several major refactors:

  • first, we introduce -M/--max-memory-usage which is a single parameter to automatically set
    graph sizes;
  • second, as part of this, we change the way that hashtable sizes are calculated so that prime numbers
    below the specified number are chosen. This necessitated many small changes to tests that
    relied on specific behavior and numbers.
  • third, I made CountingHash take the same args as Hashbits, and put in the same new hack in
    Python-land to subclass the CPython class _CountingHash;
  • fourth, while we were changing so many things anyway, I went ahead and replaced
    new_counting_hash and new_hashbits with CountingHash and Hashbits.
  • fifth, since test_hashbits and test_hashbits_obj now contained basically the same tests, I merged them
    and then removed test_hashbits_obj.

and of course many minor adjustments throughout.

  • add -M/--max-memory-usage parameter in khmer_args.py (try 2 to address User experience upgrades for khmer #732 (comment); try 1 is max memory parameter specification  #744)
  • start salting in nodegraph and countgraph terminology at Python level, as convenient (ref Rename counting_hash and hashbits at Python layer #872)
  • fix up all of the scripts to use create_nodegraph/create_countgraph functions from khmer_args.py
  • add unit tests for create_nodegraph and create_countgraph
  • add tests for -M parameter in scripts/
  • change check_space_for_hashtable to use 'args' rather than 'min_tablesize'
  • then, run test_scripts after renaming the min_tablesize parameter to see what else we need to change
  • change new_counting_hash over to CountingHash, and new_hashbits over to Hashbits throughout
  • check basic behavior of sandbox/ scripts with recipes and/or manual tests
  • update docs
  • update whatsnew for 2.0
  • adjust -M behavior for special scripts like abundance-dist, which use 1.1x the normal memory.

@ctb
Copy link
Member Author

ctb commented May 31, 2015

Fixes #980 in passing.

@ctb
Copy link
Member Author

ctb commented May 31, 2015

This is one for @mr-c to gaze upon. So far, I am happy with the direction it is taking.

@ctb ctb force-pushed the update/memory-args2 branch from 98da767 to bce7b0a Compare May 31, 2015 15:07
@ctb ctb added this to the 2.0 milestone May 31, 2015
@ctb
Copy link
Member Author

ctb commented Jun 5, 2015

I think I will wait on @mr-c to take a look-see before continuing with fixing up the documentation etc. etc.

…args2

Conflicts:
	ChangeLog
	khmer/__init__.py
	khmer/khmer_args.py
	oxli/build_graph.py
	sandbox/collect-reads.py
	sandbox/saturate-by-median.py
	scripts/abundance-dist-single.py
	scripts/do-partition.py
	scripts/filter-abund-single.py
	scripts/load-into-counting.py
	scripts/normalize-by-median.py
	scripts/trim-low-abund.py
	tests/test_script_arguments.py
counting hash/hashbits creation with new khmer_args create* functions.
* oxli/build_graph.py,sandbox/{sweep-files.py,sweep-reads.py},
scripts/{count-overlap.py,do-partition.py,},khmer/khmer_args.py:
changed hashtype over to 'nodegraph' and 'bitgraph' in call to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/bitgraph/countgraph/

@ctb
Copy link
Member Author

ctb commented Jun 14, 2015

ARGH this one is going down the rabbit hole... fine, I'll just do it all, then.

* make CountingHash a Python wrapper around _CountingHash;
* replace new_counting_hash with CountingHash through codebase;
* replace new_hashbits with Hashbits throughout codebase;

Still some tests failing, though.
@mr-c
Copy link
Contributor

mr-c commented Jun 15, 2015

Well, all of these inter-related issues provide a lot of value and are a long time coming :-)

@ctb
Copy link
Member Author

ctb commented Jun 23, 2015 via email

@ctb
Copy link
Member Author

ctb commented Jun 24, 2015

test this please, jenkins.

@SensibleSalmon
Copy link
Contributor

retest this, please

@ctb
Copy link
Member Author

ctb commented Jun 26, 2015

Ready for review and merge @mr-c.

@ctb
Copy link
Member Author

ctb commented Jun 26, 2015

Note: we will probably need to redo things in doc/user/scripts.rst and choosing-hash-sizes a bit once #1106 lands; please evaluate current doc updates in this light :)

New parameter for tablesize/number of table parameters.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There is now a `-M/--max-memory-usage` parameter that sets the number
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should use the style

:option:`-M`/:option:`--max-memory-usage`

throughout this document.

@mr-c
Copy link
Contributor

mr-c commented Jun 27, 2015

Great! I'm going to fix the two small nits I found and merge for you.

@ctb
Copy link
Member Author

ctb commented Jun 27, 2015

fixed the doc one

On Fri, Jun 26, 2015 at 05:20:09PM -0700, Michael R. Crusoe wrote:

Great! I'm going to fix the two small nits I found and merge for you.


Reply to this email directly or view it on GitHub:

#1050 (comment)

C. Titus Brown, ctbrown@ucdavis.edu

@mr-c
Copy link
Contributor

mr-c commented Jun 27, 2015

sigh Thanks :-P

On Fri, Jun 26, 2015 at 5:21 PM C. Titus Brown notifications@github.com
wrote:

fixed the doc one

On Fri, Jun 26, 2015 at 05:20:09PM -0700, Michael R. Crusoe wrote:

Great! I'm going to fix the two small nits I found and merge for you.


Reply to this email directly or view it on GitHub:

#1050 (comment)

C. Titus Brown, ctbrown@ucdavis.edu


Reply to this email directly or view it on GitHub
#1050 (comment).

Michael R. Crusoe: Programmer & Bioinformatician crusoe@ucdavis.edu
The lab for Data Intensive Biology; University of California, Davis
https://impactstory.org/MichaelRCrusoe http://twitter.com/biocrusoe

mr-c added a commit that referenced this pull request Jun 27, 2015
Add --max-memory-usage argument, and associated khmer_args refactoring.
@mr-c mr-c merged commit b069826 into master Jun 27, 2015
@mr-c mr-c deleted the update/memory-args2 branch June 27, 2015 00:27
@ctb
Copy link
Member Author

ctb commented Jun 27, 2015

lol

On Fri, Jun 26, 2015 at 05:27:33PM -0700, Michael R. Crusoe wrote:

sigh Thanks :-P

On Fri, Jun 26, 2015 at 5:21 PM C. Titus Brown notifications@github.com
wrote:

fixed the doc one

On Fri, Jun 26, 2015 at 05:20:09PM -0700, Michael R. Crusoe wrote:

Great! I'm going to fix the two small nits I found and merge for you.


Reply to this email directly or view it on GitHub:

#1050 (comment)

C. Titus Brown, ctbrown@ucdavis.edu

???
Reply to this email directly or view it on GitHub
#1050 (comment).

Michael R. Crusoe: Programmer & Bioinformatician crusoe@ucdavis.edu
The lab for Data Intensive Biology; University of California, Davis
https://impactstory.org/MichaelRCrusoe http://twitter.com/biocrusoe


Reply to this email directly or view it on GitHub:

#1050 (comment)

C. Titus Brown, ctbrown@ucdavis.edu

@ctb
Copy link
Member Author

ctb commented Jun 27, 2015 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants