Documentation: Suggest limiting core count on very-multicore machines to avoid kernel bug? #91
Comments
4.14.81 and 4.19.2 contain a lot of btrfs fixes which might help... but more likely won't. I'm OK with a doc change.
Does it make sense to have so many IO threads running? Should there be an upper limit? I wonder if there's any sense in having more than … But it probably still makes sense for the number-crunching threads like hashing.
The IO threads are mixed in with the hashing threads, so it's a bit of a mess at the moment. To really get IO and hashing usefully separated we'd need to rewrite most of the code, implement multiple distinct thread pools and a scheduler, and we'd need one scheduler for spinning disks and a different one for SSDs. And then it would all go to hell if we ever found a match for anything (suddenly we'd need locks on multiple filesystem objects and disks, and we'd probably just end up effectively single-threaded across the entire filesystem). Experimentally I've found bees goes a little faster if the worker thread count is higher than the disk count, but no faster (maybe even a little slower) if the worker thread count is higher than the CPU core count (at least for the first 8 cores). I've also found that things like limiting the number of threads executing dedupe … I can put in a soft limit, so …
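The heuristic described in that comment (more workers than disks helps, but going past the CPU core count doesn't) can be sketched as follows. This is an illustrative sketch only, not code from bees; the counts are made-up example numbers.

```shell
# Illustrative sketch of the tuning heuristic from the discussion above:
# pick a worker count a little above the disk count, but never above
# the CPU core count. The values of 'cores' and 'disks' are examples.
cores=8
disks=3
workers=$(( disks + 1 ))             # slightly more workers than disks
(( workers > cores )) && workers=$cores   # but never more than cores
echo "$workers"   # → 4
```

On a box with few disks the disk count dominates; on a box with few cores the core count caps the result.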
So could a number like …
I wouldn't try to guess without running a lot of performance experiments on specific hardware configurations. Even if we did that, changing the bees code could instantly invalidate all that data. There are huge gains still possible from relatively small code changes, and I have big code changes planned too. The number of workers is configurable, and the default (after adding a soft limit for people with huge multi-socket systems) works OK. Users who know better can change it or test assorted values.
#91 describes problems encountered when running bees on systems with many CPU cores. Limit the computed number of threads (using `--thread-factor` or the default) to a maximum of 8 (i.e. the number of logical cores in a modern laptop). Users can override the limit by using `--thread-count`. Signed-off-by: Zygo Blaxell <bees@furryterror.org>
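The soft limit described in the commit message above amounts to the following arithmetic. This is a hypothetical sketch of the computation, not the actual bees source; the `nproc` and `thread_factor` values are example numbers matching the 48-core system from this issue.

```shell
# Sketch of the soft limit from the commit message:
# threads = nproc * thread_factor, capped at 8 unless the user
# overrides the result with an explicit --thread-count.
nproc=48            # e.g. the 2-socket, 48-core system in this issue
thread_factor=1     # default factor (example value)
cap=8               # soft limit added by the commit
threads=$(( nproc * thread_factor ))
(( threads > cap )) && threads=$cap
echo "$threads"   # → 8
```

An explicit `--thread-count` would bypass the cap entirely, which is how users on large machines can still opt in to more threads.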
We have a core-count limit (the second option in the original issue). Can we close this?
I certainly consider it fully addressed.
On a 2-socket, 48-core system I've consistently had my I/O subsystem irrecoverably hang in less than 24 hours of operation; reproduced with both kernel 4.14.78 and 4.18.16. Using `--thread-count 4` makes this go away. Perhaps we should either:
- suggest a lower `--thread-count` on very-multicore hardware in the "kernel bugs" wiki page, or
- limit the thread count computed from `--thread-factor` by default.