-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch to optimized lock #58
base: master
Are you sure you want to change the base?
Conversation
Could you explain why this is optimized? Your implementation is just a spinlock. I don't see any need to change this code path since it is rarely called (only once per thread). |
And as a general rule, I am not a fan of spinlocks and try to avoid them as much as possible. They should only be used in an environment where the current thread will not be preempted (i.e. in a kernel with interrupts disabled, or small embedded code). The problem is that if a thread get scheduled out while holding a spinlock, other threads can spend a long time needlessly spinning instead of making the OS switch back to the thread owning the lock. |
Make insertion fully cold
Sorry for the long delay, my thinking process was that since the codepath is so rarely called, it would make sense to optimize exactly for that (that there aren't multiple threads in this code path at once) to save a few cpu cycles (without impacting any fast path) since you are apparently very opposed to spin locks i would be okay with removing it but keeping the other changes since they eliminate calls to the get_tls_addr function in the fast path which can be quite expensive. I am doing this by tricking the compiler into addressing into the |
/// (to be precise it can lead to multiple TLS address lookups) | ||
#[derive(Clone, Copy)] | ||
struct ThreadWrapper { | ||
self_ptr: *const Thread, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of self_ptr
? It seems to always point to the thread
field. Is this just a bool
that indicates whether the thread has been initialized? How is this better than just using an Option
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is what i was describing earlier, this is the hack that tricks the compiler into actually only ever generating one get_tls_addr
call and reusing the calculated address, if the compiler was smart enough, this self_ptr
wouldn't be necessary at all.
Based on #57
Please squash this, thanks!