-
Notifications
You must be signed in to change notification settings - Fork 225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optionally support parking_lot within global allocators. #305
Conversation
// For feature = "reserve", create the new table here, before we lock | ||
// all the buckets, in case the allocator needs to acquire a lock. | ||
#[cfg(feature = "reserve")] | ||
let new_table = HashTable::new(num_threads, table); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't sound right: if the allocator attempts to take a lock at this point, won't it just trigger hashtable growth again? This would result in infinite recursion instead of a deadlock, which isn't much better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the user calls reserve()
before all new threads, then grow_hashtable
will only ever run on threads which already have enough hashtable space for themselves, so the allocator can take locks here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except that there's an issue with calling reserve
before creating a thread: thread creation may fail, but there's no way to undo the reservation. Every time creating a thread fails, NUM_RESERVED
will still be increased.
If you instead call reserve
as the first action within a new thread then you run into the problem I described above where the allocator can cause infinite recursion.
I think the correct approach here is to do hashtable growth as the first thing in a thread, while using a thread-local bool
to detect recursion and skip the growth if it happens within the allocator. This is fine because the hash table here uses chaining so its capacity is not limited to the number of buckets.
This adds a feature to support using parking_lot locks in `#[global_allocator]` implementations. The main hazard is parking_lot calling into the global allocator, and then the global allocator calling back into parking_lot, at a point where parking_lot is not prepared to be re-entered on the same thread. There are currently two places where this comes up: - When a new thread is created, parking_lot needs to allocate space in its hashtable for the new thread. It uses the global allocator to allocate this space, and if the global allocator uses parking lot, it doesn't work because the hashtable is not ready for the new thread yet. - If the new hashtable is allocated while bucket locks are held, and the global allocator attempts to acquire a parking lot lock, it deadlocks because all the buckets are locked. This defines a new "global_allocator_compat" feature. When defined: The hashtable growing code checks whether it's being reentered, and does no growth in that case. And, `create_hashtable` will allocate the new `HashTable` before locking all the bucket locks, to avoid deadlock if allocating the `HashTable` attempts to take a lock. And, a new function, `init`, is defined, for allocating the initial hashtable. This must be done before performing any syncronization.
I've now implemented the thread-local I looked into statically initializing the hashtable, it's doable doable, though it looks awkward because |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is good enough as a hack for now, but I'd like to explore the possibility of a proper solution.
I'm somewhat concerned about what happens if an allocator ends up re-entering itself from unexpected places, which could cause a deadlock. Consider the case where an allocator needs to acquire 2 locks and triggers hashtable growth when trying to acquire the second one.
If global allocators have to explicitly support parking_lot, could we simply expose a critical section API which inhibits hashtable growth for the duration of a call?
For example:
fn alloc(...) {
parking_lot_core::inhibit_allocations(|| {
// actual allocator implementation
})
}
Interesting. I'll need to experiment with that. As another option, what would you think of exposing |
Another downside is that
I am certainly open to the idea! Perhaps it should be renamed to something else before being re-exported? After all the important part isn't that it fits in a word, it's that it doesn't require any dynamic allocations. |
cc @m-ou-se since solving this issue properly is an important step towards replacing the standard library locks with parking lot. |
One wrinkle with exporting And a concern with porting parking_lot's fairness code to |
I spent some time thinking about this. The only way I can see this working is for The actual allocator could be implemented in several ways:
Regarding |
// letting the outer call perform the grow. | ||
#[cfg(feature = "global_allocator_compat")] | ||
let grow = { | ||
thread_local!(static ACTIVE: Cell<bool> = Cell::new(false)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems unreliable, if the target platform uses pthread tls, there will be multiple memory operations inside std.
see https://github.com/rust-lang/rust/blob/master/library/std/src/thread/local.rs#L740 https://github.com/rust-lang/rust/blob/master/library/std/src/sys_common/thread_local_dtor.rs#L19
After rust-lang/rust#93740 and related developments, I myself no longer have a need for this PR. |
This adds a feature to support using parking_lot locks in
#[global_allocator]
implementations. The main hazard is parking_lot callinginto the global allocator, and then the global allocator calling back into
parking_lot, at a point where parking_lot is not prepared to be re-entered on
the same thread.
There are currently two places where this comes up:
When a new thread is created, parking_lot needs to allocate space in its
hashtable for the new thread. It uses the global allocator to allocate this
space, and if the global allocator uses parking lot, it doesn't work because
the hashtable is not ready for the new thread yet.
If the new hashtable is allocated while bucket locks are held, and the
global allocator attempts to acquire a parking lot lock, it deadlocks
because all the buckets are locked.
For the first, this adds a new
reserve
function, defined when the "reserve"feature is enabled. This allocates space in the hashtable, and if users call
it before each thread creation in the program, parking_lot won't ever need to
call
grow_hashtable
on new threads. This is admittedly awkward to guaranteein general, however it's not a problem in my own use case. And this function
might also be useful in cases where users want to create a pool of threads all
at once, to reduce temporary allocations, leakage, and rehashing. However, I'm
open to other ideas here.
For the second, when "reserve" is enabled,
create_hashtable
will allocate thenew
HashTable
before locking all the bucket locks, to avoid deadlock ifallocating the
HashTable
attempts to take a lock.