Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optionally support parking_lot within global allocators. #305

Closed
wants to merge 3 commits into from

Conversation

sunfishcode
Copy link

This adds a feature to support using parking_lot locks in
#[global_allocator] implementations. The main hazard is parking_lot calling
into the global allocator, and then the global allocator calling back into
parking_lot, at a point where parking_lot is not prepared to be re-entered on
the same thread.

There are currently two places where this comes up:

  • When a new thread is created, parking_lot needs to allocate space in its
    hashtable for the new thread. It uses the global allocator to allocate this
    space, and if the global allocator uses parking lot, it doesn't work because
    the hashtable is not ready for the new thread yet.

  • If the new hashtable is allocated while bucket locks are held, and the
    global allocator attempts to acquire a parking lot lock, it deadlocks
    because all the buckets are locked.

For the first, this adds a new reserve function, defined when the "reserve"
feature is enabled. This allocates space in the hashtable, and if users call
it before each thread creation in the program, parking_lot won't ever need to
call grow_hashtable on new threads. This is admittedly awkward to guarantee
in general, however it's not a problem in my own use case. And this function
might also be useful in cases where users want to create a pool of threads all
at once, to reduce temporary allocations, leakage, and rehashing. However, I'm
open to other ideas here.

For the second, when "reserve" is enabled, create_hashtable will allocate the
new HashTable before locking all the bucket locks, to avoid deadlock if
allocating the HashTable attempts to take a lock.

// For feature = "reserve", create the new table here, before we lock
// all the buckets, in case the allocator needs to acquire a lock.
#[cfg(feature = "reserve")]
let new_table = HashTable::new(num_threads, table);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't sound right: if the allocator attempts to take a lock at this point, won't it just trigger hashtable growth again? This would result in infinite recursion instead of a deadlock, which isn't much better.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the user calls reserve() before all new threads, then grow_hashtable will only ever run on threads which already have enough hashtable space for themselves, so the allocator can take locks here.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except that there's an issue with calling reserve before creating a thread: thread creation may fail, but there's no way to undo the reservation. Every time creating a thread fails, NUM_RESERVED will still be increased.

If you instead call reserve as the first action within a new thread then you run into the problem I described above where the allocator can cause infinite recursion.

I think the correct approach here is to do hashtable growth as the first thing in a thread, while using a thread-local bool to detect recursion and skip the growth if it happens within the allocator. This is fine because the hash table here uses chaining so its capacity is not limited to the number of buckets.

This adds a feature to support using parking_lot locks in
`#[global_allocator]` implementations. The main hazard is parking_lot calling
into the global allocator, and then the global allocator calling back into
parking_lot, at a point where parking_lot is not prepared to be re-entered on
the same thread.

There are currently two places where this comes up:

 - When a new thread is created, parking_lot needs to allocate space in its
   hashtable for the new thread. It uses the global allocator to allocate this
   space, and if the global allocator uses parking lot, it doesn't work because
   the hashtable is not ready for the new thread yet.

 - If the new hashtable is allocated while bucket locks are held, and the
   global allocator attempts to acquire a parking lot lock, it deadlocks
   because all the buckets are locked.

This defines a new "global_allocator_compat" feature. When defined:

The hashtable growing code checks whether it's being reentered, and does no
growth in that case.

And, `create_hashtable` will allocate the new `HashTable` before locking all
the bucket locks, to avoid deadlock if allocating the `HashTable` attempts to
take a lock.

And, a new function, `init`, is defined, for allocating the initial
hashtable. This must be done before performing any syncronization.
@sunfishcode
Copy link
Author

I've now implemented the thread-local bool recursion check, as you suggested. The main tricky part is that it doesn't work for creating the initial hash table. In that case we can't defer allocating, because the HASHTABLE pointer starts out null. I've fixed that for now by adding an init function, which allows the hash table to be initialized before any other locks are taken.

I looked into statically initializing the hashtable, it's doable doable, though it looks awkward because Bucket contains a Timestamp which is not statically initializable. Once something like Lazy is stablized, this might be less awkward to implement.

Copy link
Owner

@Amanieu Amanieu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good enough as a hack for now, but I'd like to explore the possibility of a proper solution.

I'm somewhat concerned about what happens if an allocator ends up re-entering itself from unexpected places, which could cause a deadlock. Consider the case where an allocator needs to acquire 2 locks and triggers hashtable growth when trying to acquire the second one.

If global allocators have to explicitly support parking_lot, could we simply expose a critical section API which inhibits hashtable growth for the duration of a call?

For example:

fn alloc(...) {
    parking_lot_core::inhibit_allocations(|| {
        // actual allocator implementation
    })
}

core/src/parking_lot.rs Outdated Show resolved Hide resolved
@sunfishcode
Copy link
Author

If global allocators have to explicitly support parking_lot, could we simply expose a critical section API which inhibits hashtable growth for the duration of a call?

For example:

fn alloc(...) {
    parking_lot_core::inhibit_allocations(|| {
        // actual allocator implementation
    })
}

Interesting. I'll need to experiment with that.

As another option, what would you think of exposing WordLock as a public API? I haven't studied the implementation here, but the webkit blog about it suggests its main downside is just the size of the lock type, and most allocators only use a small fixed number of locks. Then the solution could just be that allocators should use WordLock instead of RawMutex and friends.

@Amanieu
Copy link
Owner

Amanieu commented Dec 11, 2021

I haven't studied the implementation here, but the webkit blog about it suggests its main downside is just the size of the lock type, and most allocators only use a small fixed number of locks.

Another downside is that WordLock can't be used with a CondVar, but that's not going to be an issue in practice.

As another option, what would you think of exposing WordLock as a public API?

I am certainly open to the idea! Perhaps it should be renamed to something else before being re-exported? After all the important part isn't that it fits in a word, it's that it doesn't require any dynamic allocations.

@Amanieu
Copy link
Owner

Amanieu commented Dec 11, 2021

cc @m-ou-se since solving this issue properly is an important step towards replacing the standard library locks with parking lot.

@sunfishcode
Copy link
Author

One wrinkle with exporting WordLock is that it doesn't currently have any fairness mechanism. That may be ok for parking_lot's internal needs, but it's less ok for general-purpose use.

And a concern with porting parking_lot's fairness code to WorkLock is that the fairness code depends on std::time::Instant::now, which on some platforms needs to acquire a lock (to ensure monotonicity). If parking_lot is to replace the standard library locks, that would create another circular dependency.

@Amanieu
Copy link
Owner

Amanieu commented Jan 12, 2022

I spent some time thinking about this. The only way I can see this working is for parking_lot to use a custom allocator which is guaranteed to not call back int parking_lot. A function pointer to an allocator function could be provided via an API during early initialization.

The actual allocator could be implemented in several ways:

  • Without using any locks by directly calling mmap.
  • Using only libc locks such as pthread_mutex_t.
  • Using only WordLock.

Regarding std::time::Instant::now, the lock is going to be removed soon so we don't need to worry about it.

// letting the outer call perform the grow.
#[cfg(feature = "global_allocator_compat")]
let grow = {
thread_local!(static ACTIVE: Cell<bool> = Cell::new(false));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sunfishcode
Copy link
Author

After rust-lang/rust#93740 and related developments, I myself no longer have a need for this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants