Test suite breakage: Tests keep too many file descriptors open, breaks with concurrency #7772
Comments
That commit is not right anyway. I'm going to revert it.
Oh, I read it wrong. That commit is working as intended.
Our new scheduler is based on libuv, which is event-driven by waiting on fds, and the test suite now aggressively tests the new scheduler. It may be that we are using too many uv handles, and in turn fds, in some situations. The scheduler definitely allocates too many idle handles, but I'm not sure that those require an fd. We did have to raise the ulimit on the OS X bots to land it.
There are two overcommits involved in the test suite now that could be contributing to this problem: first, as you pointed out, the multithreaded scheduler tests create num_cpus * 2 scheduler threads each; second, the normal test runner creates num_cpus * 4 test tasks. Each scheduler, I believe, needs a minimum of 2 fds: one for the kqueue (I think), and a second for an async handle. So at any time while running stdtest you may need at least num_cpus * 2 * 4 * 2 = 64 fds, and undoubtedly there are even more that I'm not aware of.
What is the appropriate way to raise the fd ulimit on OS X? I tried doing this when I was looking into #7797, but the technique I used (from Stack Overflow) did not work.
Math was wrong. It's num_cpus * 2 * num_cpus * 4 * 2 = 256.
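The corrected estimate can be checked with a few lines of arithmetic. This is only a sketch of the formula from the comments above; the function name `min_scheduler_fds` is made up for illustration, and `num_cpus = 4` is inferred from the totals quoted in the thread (64 and 256), not stated there.

```python
def min_scheduler_fds(num_cpus, fds_per_scheduler=2):
    """Lower bound on fds held by schedulers, per the estimate in this thread."""
    # Each multithreaded scheduler test creates num_cpus * 2 scheduler threads.
    schedulers_per_test = num_cpus * 2
    # The test runner runs num_cpus * 4 test tasks at once.
    concurrent_tests = num_cpus * 4
    # Each scheduler is assumed to need 2 fds (kqueue + async handle).
    return schedulers_per_test * concurrent_tests * fds_per_scheduler


print(min_scheduler_fds(4))  # 256
print(min_scheduler_fds(8))  # 1024
```

Because the count grows with the square of the CPU count, the same formula gives 1024 on an 8-CPU machine, well past a 256 soft limit.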
I'm going to change the amount of overcommit here to try to get the fds back down.
@pnkfelix here's what I did on the bots:
Uses more fds than are available by default on OS X.
The default kern.maxopenfiles and kern.maxfilesperproc is 10240. You can use the getrlimit() and setrlimit() calls to raise the default fd ulimit from 256 up to this 10240. Curiously, in my test of this, getrlimit() returned ~(1<<63) as the max instead of 10240, but wouldn't let me set the soft limit higher than 10240. One potential workaround Rust could use is, during startup, to read the kern.maxfilesperproc setting with sysctl and call setrlimit() to raise the ulimit to that max. On another note, does every Rust task need a libuv event loop? The requirement of multiple fds per event loop means that tasks are even less lightweight than I had believed (significantly less so than in Go, where I can create goroutines up the wazoo).
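The getrlimit()/setrlimit() approach described above might look roughly like this, sketched with Python's standard `resource` module rather than the C calls or anything in the Rust runtime. The helper name `raise_fd_soft_limit` and the `cap` parameter are hypothetical; the default of 10240 is simply the kern.maxfilesperproc value quoted in the comment, and a real implementation would presumably read it via sysctl instead of hard-coding it.

```python
import resource


def raise_fd_soft_limit(cap=10240):
    """Try to raise the soft RLIMIT_NOFILE up to `cap`; return the soft limit in effect."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The hard limit may be reported as "infinity" even when the kernel
    # enforces a lower per-process maximum, so bound the request by `cap`.
    target = cap if hard == resource.RLIM_INFINITY else min(cap, hard)
    if soft != resource.RLIM_INFINITY and soft < target:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            # The kernel refused the new value; keep the old limit.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

The function only ever raises the soft limit and leaves the hard limit alone, so it does not need elevated privileges.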
@kballard Every scheduler has a libuv event loop, not every task. There's one scheduler per thread, and all tasks are multiplexed across the scheduler threads.
…-lang#7772 Too much overcommit here exhausts the low fd limit on OS X.
c4ff250fd raises the fd limit on OS X. With this commit, 49b72bd can be reverted.
@brson: Thoughts on the above comment? I can submit this as a PR if you think it makes sense.
This workaround was less than ideal. A better solution is to raise the fd limit. This reverts commit 49b72bd.
Although issue is marked closed, I still have the same problem on Linux with
@stepancheg: I just checked on my Ubuntu 12.04 machine, Alternatively, we could put an upper bound on the number of threads used, instead of just always picking
@kballard You can increase soft limit to hard limit.
On my host hard limit is also 1024. I ended up setting
@stepancheg Looks like on my machine the hard limit defaults to 4096. Any idea what the upper bound on
Actually, I found that
So I know that we have documented how to change the settings on a given Mac OS X machine to up the limits. But shouldn't it still be a bug that we cannot run our test driver on a vanilla system? (Maybe not a high priority bug, but still a bug nonetheless? Should I open a separate issue for this?)
When running the test suite (`make check-stage2-std` is sufficient to reproduce), in the middle of the tests the program aborts, often with a cute abort message. There's a few different kinds of aborts, such as `fatal runtime error: assertion failed: void_sched.is_not_null()` or `error opening /dev/urandom: Too many open files`, but they all seem to be caused by running out of file descriptors. The default limit on my machine (OS X) is 256, and if I catch the abort (with `lldb`) I can see that they're all in use.

This problem seems to have been triggered by 8afec77, which was introduced into master by PR #7265. This commit changes the default number of concurrent test threads from `4` to `rust_get_num_cpus() * 2`. Experimentally, anything above `6` causes the test failure, and my machine reports 8 CPUs so the test suite is attempting to use 16 threads.

I don't know what the root cause here is; either we're keeping fds open much longer than we should, or we have a bunch of tests that require a lot of fds, or maybe something completely different. Interestingly, `lsof` reports that most of the fds in use are PIPEs. What do we use pipes for?
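For anyone wanting to reproduce the fd count from inside a process instead of through `lsof` or `lldb`, the following is a rough sketch in Python. It assumes `/dev/fd` is available (true on OS X and typical Linux systems), and the helper name `count_open_fds` is made up for illustration; it is not part of the Rust test suite.

```python
import os
import stat


def count_open_fds():
    """Return (total, pipes) for the descriptors open in this process."""
    total = pipes = 0
    for name in os.listdir("/dev/fd"):
        try:
            mode = os.fstat(int(name)).st_mode
        except (OSError, ValueError):
            # Skip entries that are no longer open, such as the temporary
            # descriptor used to read the directory itself.
            continue
        total += 1
        if stat.S_ISFIFO(mode):
            pipes += 1
    return total, pipes


if __name__ == "__main__":
    total, pipes = count_open_fds()
    print(f"{total} fds open, {pipes} of them pipes")
```

Comparing the total against the soft limit shows how close a process is to exhausting its descriptors, and the pipe count gives the same breakdown that `lsof` reported here.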