-
Notifications
You must be signed in to change notification settings - Fork 295
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Thread-safety for hashtable #876
base: master
Are you sure you want to change the base?
Conversation
|
Jenkins, test this please, or we'll be having another stern talk. |
You ready to test this please, Jenkins? |
@mr-c @ctb @luizirber thoughts? is the testing robust enough? does a separate function ( note: i've also tested this by monkeypatching |
Have you been tippin' the bottle again? Jenkins, test this please! |
@@ -1541,10 +1618,14 @@ static PyMethodDef khmer_counting_methods[] = { | |||
{ "hashsizes", (PyCFunction)hash_get_hashsizes, METH_VARARGS, "" }, | |||
{ "set_use_bigcount", (PyCFunction)hash_set_use_bigcount, METH_VARARGS, "" }, | |||
{ "get_use_bigcount", (PyCFunction)hash_get_use_bigcount, METH_VARARGS, "" }, | |||
{ "init_threadsafe", (PyCFunction)hash_init_threadsafe, METH_VARARGS, "Initialize the table lock array" }, | |||
{ "n_lock_blocks", (PyCFunction)hash_n_lock_blocks, METH_VARARGS, "Number of blocks in the lock array" }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be _
prefixed to indicate that it is a private/testing function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The number of lock blocks is passed when calling init_threadsafe
. It has a default value that is hardcoded, but the user can override it; there isn't necessary anything private or testing-specific about it. I will make it const const though.
The current Is there really that much overhead when single-threaded? I'd rather see the extra thread safety added to |
@mr-c: I'm not sure on the overhead. Some recent reading I did suggests that CAS operations have significant overhead when called repeatedly (for example: http://joeduffyblog.com/2009/01/08/some-performance-implications-of-cas-operations/). However, I'll need to run some benchmarks to figure out the effects of this particular implementation. |
(Also, thanks!) |
…nge around variable names to be more sensible; change default beahvior for number of locks and minimum number of locks
@@ -94,6 +94,17 @@ public: | |||
return _counts; | |||
} | |||
|
|||
virtual void init_threadstuff(unsigned int n_table_locks=DEFAULT_TABLE_LOCKS) { | |||
if(!_threadsafe) { | |||
_n_table_locks = n_table_locks < 4 ? 4 : n_table_locks; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why '4'?
+1 on @mr-c's comment regarding adding thread-safety to count instead of building a new function. Other than that, and a few questions, looks good and straightforward. The testing is OK although I doubt it's going to hit any problematic cases even w/o thread safety (might be interesting to test this by disabling thread safety and running that test a few hundred times... does it ever break?) I'm assuming that the real purpose of this addition is not to use the Python interface, right? The Python interface is not going to give you a massive speed improvement here, I don't think! |
(Putting some of our Friday discussion here just to have it written down somewhere)
It seems that it takes longer for the Python interpreter to load in the next bit of bytecode and execute it than it does for the
Correct -- I was trying to figure out a way to test it with our current suite. At this point, it looks like more thorough testing will have to wait for the full multithreading PR. |
The multithreading PR has gotten unwieldy, so this is part of the process to break it down into more manageable chunks. This set of commits adds an array of locks to Hashtable with are indexed by the hashvalue, allowing a thread to lock only a small block when writing. It also introduces a new function meant specifically for thread-safe counting, to avoid overhead under single-threaded cases and maintain backwards compatibility.