Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thread-safety for hashtable #876

Open
wants to merge 13 commits into
base: master
Choose a base branch
from
Open

Conversation

camillescott
Copy link
Member

The multithreading PR has gotten unwieldy, so this is part of the process to break it down into more manageable chunks. This set of commits adds an array of locks to Hashtable with are indexed by the hashvalue, allowing a thread to lock only a small block when writing. It also introduces a new function meant specifically for thread-safe counting, to avoid overhead under single-threaded cases and maintain backwards compatibility.

@camillescott
Copy link
Member Author

  • Is it mergeable?
  • Did it pass the tests?
  • If it introduces new functionality in scripts/ is it tested?
    Check for code coverage with make clean diff-cover
  • Is it well formatted? Look at make pep8, make diff_pylint_report,
    make cppcheck, and make doc output. Use make format and manual
    fixing as needed.
  • Did it change the command-line interface? Only additions are allowed
    without a major version increment. Changing file formats also requires a
    major version number increment.
  • Is it documented in the ChangeLog?
    http://en.wikipedia.org/wiki/Changelog#Format
  • Was a spellchecker run on the source code and documentation after
    changes were made?

@camillescott
Copy link
Member Author

Jenkins, test this please, or we'll be having another stern talk.

@camillescott
Copy link
Member Author

You ready to test this please, Jenkins?

@camillescott
Copy link
Member Author

@mr-c @ctb @luizirber thoughts? is the testing robust enough? does a separate function (count_ts) make sense?

note: i've also tested this by monkeypatching count_ts as count and running all tests; it works perfectly.

@camillescott
Copy link
Member Author

Have you been tippin' the bottle again? Jenkins, test this please!

@@ -1541,10 +1618,14 @@ static PyMethodDef khmer_counting_methods[] = {
{ "hashsizes", (PyCFunction)hash_get_hashsizes, METH_VARARGS, "" },
{ "set_use_bigcount", (PyCFunction)hash_set_use_bigcount, METH_VARARGS, "" },
{ "get_use_bigcount", (PyCFunction)hash_get_use_bigcount, METH_VARARGS, "" },
{ "init_threadsafe", (PyCFunction)hash_init_threadsafe, METH_VARARGS, "Initialize the table lock array" },
{ "n_lock_blocks", (PyCFunction)hash_n_lock_blocks, METH_VARARGS, "Number of blocks in the lock array" },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be _ prefixed to indicate that it is a private/testing function?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The number of lock blocks is passed when calling init_threadsafe. It has a default value that is hardcoded, but the user can override it; there isn't necessary anything private or testing-specific about it. I will make it const const though.

@mr-c
Copy link
Contributor

mr-c commented Mar 23, 2015

The current count() function https://github.com/ged-lab/khmer/blob/feature/thread_safety/lib/counting.hh#L175 looks to be half thread-safe.

Is there really that much overhead when single-threaded?

I'd rather see the extra thread safety added to count(). If there is significant slowdown when single threaded then we can put all the spinlocks (old and new) inside if statements.

@camillescott
Copy link
Member Author

@mr-c: I'm not sure on the overhead. Some recent reading I did suggests that CAS operations have significant overhead when called repeatedly (for example: http://joeduffyblog.com/2009/01/08/some-performance-implications-of-cas-operations/). However, I'll need to run some benchmarks to figure out the effects of this particular implementation.

@mr-c
Copy link
Contributor

mr-c commented Mar 23, 2015

(Also, thanks!)

…nge around variable names to be more sensible; change default beahvior for number of locks and minimum number of locks
@@ -94,6 +94,17 @@ public:
return _counts;
}

virtual void init_threadstuff(unsigned int n_table_locks=DEFAULT_TABLE_LOCKS) {
if(!_threadsafe) {
_n_table_locks = n_table_locks < 4 ? 4 : n_table_locks;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why '4'?

@ctb
Copy link
Member

ctb commented Mar 27, 2015

+1 on @mr-c's comment regarding adding thread-safety to count instead of building a new function. Other than that, and a few questions, looks good and straightforward. The testing is OK although I doubt it's going to hit any problematic cases even w/o thread safety (might be interesting to test this by disabling thread safety and running that test a few hundred times... does it ever break?)

I'm assuming that the real purpose of this addition is not to use the Python interface, right? The Python interface is not going to give you a massive speed improvement here, I don't think!

@camillescott
Copy link
Member Author

(Putting some of our Friday discussion here just to have it written down somewhere)

The testing is OK although I doubt it's going to hit any problematic cases even w/o thread safety (might be interesting to test this by disabling thread safety and running that test a few hundred times... does it ever break?)

It seems that it takes longer for the Python interpreter to load in the next bit of bytecode and execute it than it does for the count_ts function to be called -- I couldn't get this to trip from Python land.

I'm assuming that the real purpose of this addition is not to use the Python interface, right? The Python interface is not going to give you a massive speed improvement here, I don't think!

Correct -- I was trying to figure out a way to test it with our current suite. At this point, it looks like more thorough testing will have to wait for the full multithreading PR.

@mr-c mr-c added this to the 1.4+ milestone May 13, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants