Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize scheduler queue data structures to reduce cache misses and fragmentation #129

Open
benvanik opened this issue May 19, 2020 · 2 comments

Comments

@benvanik
Copy link

From the TODO here:

// TODO: Implement a queue that recycles elements to reduce number of
// heap allocations.
using TaskQueue = std::deque<Task>;
using FiberQueue = std::deque<Fiber*>;
using FiberSet = std::unordered_set<Fiber*>;

and this:

std::vector<Allocator::unique_ptr<Fiber>>

unordered_map and deque aren't great for memory locality and heap fragmentation, and currently these are not using the marl Allocator so they are coming right from the system allocator. It'd be good to evaluate some alternative data structures (adding to or reusing those already in https://github.com/google/marl/blob/master/include/marl/containers.h) and pooling.

@ben-clayton
Copy link
Contributor

This could be broken down into two issues:

  1. We're performing 'system' heap allocations outside of the Allocator interface.
  2. unordered_map and deque aren't great for memory locality and heap fragmentation

I definitely agree with 1, and could be quickly fixed by implementing a std::allocator that wraps the marl::Allocator. I've created #131 to track this. I'd like to get to the point that we can run all the unit tests and examples without a single heap allocation that wasn't performed by the marl::Allocator.

As for 2, I don't disagree, but I've been using various profilers and benchmarks for guidance here. The slow replacement of std containers with those in marl/containers.h have all been done when they appeared hot in profiles. I'd suspect that you'll find other, more significant optimisation opportunities before you find these to be the bottleneck. If you can show they are a major performance sponge, I'm more than happy to try replacing them.

@benvanik
Copy link
Author

For 2 I was mainly just going off the TODO in the code and have no empirical results yet to show why anything different would be worth it besides the "doing zero work and not thrashing cache is always cheaper than doing any work" :)

Part of this would probably be adding more test cases to scheduler_bench.cpp so that we can stress those structures a bit more (in terms of having a large number of waiters, etc).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants