Test suite breakage: Tests keep too many file descriptors open, breaks with concurrency #7772

Closed
lilyball opened this issue Jul 13, 2013 · 18 comments

@lilyball (Contributor)

When running the test suite (`make check-stage2-std` is sufficient to reproduce), in the middle of the tests the program aborts, often with a cute abort message. There are a few different kinds of aborts, such as `fatal runtime error: assertion failed: void_sched.is_not_null()` or `error opening /dev/urandom: Too many open files`, but they all seem to be caused by running out of file descriptors. The default limit on my machine (OS X) is 256, and if I catch the abort (with lldb) I can see that they're all in use.

This problem seems to have been triggered by 8afec77, which was introduced into master by PR #7265. That commit changes the default number of concurrent test threads from 4 to `rust_get_num_cpus() * 2`. Experimentally, anything above 6 threads causes the test failure, and my machine reports 8 CPUs, so the test suite is attempting to use 16 threads.

I don't know what the root cause is; either we're keeping fds open much longer than we should, or we have a bunch of tests that each require a lot of fds, or maybe it's something completely different. Interestingly, `lsof` reports that most of the fds in use are pipes. What do we use pipes for?

@brson (Contributor) commented Jul 17, 2013

That commit is not right anyway. I'm going to revert it.

@brson (Contributor) commented Jul 17, 2013

Oh, I read it wrong. That commit is working as intended.

@brson (Contributor) commented Jul 17, 2013

Our new scheduler is based on libuv, which is event-driven (it waits on fds), and the test suite now aggressively tests the new scheduler. It may be that we are using too many uv handles, and in turn fds, in some situations. The scheduler definitely allocates too many idle handles, but I'm not sure those require an fd.

We did have to raise the ulimit on the OS X bots to land it.

@brson (Contributor) commented Jul 17, 2013

There are two overcommits involved in the test suite now that could be contributing to this problem: first, as you pointed out, the multithreaded scheduler tests each create `num_cpus * 2` scheduler threads; second, the normal test runner creates `num_cpus * 4` test tasks.

Each scheduler, I believe, needs a minimum of 2 fds: one for the kqueue (I think) and a second for an async handle. So at any time while running stdtest you may need at least num_cpus * 2 * 4 * 2 = 64 fds, and undoubtedly there are even more that I'm not aware of.

@pnkfelix (Member)

What is the appropriate way to raise the fd ulimit on OS X? I tried doing this when I was looking into #7797, but whatever technique I used (from Stack Overflow) did not work.

@brson (Contributor) commented Jul 17, 2013

Math was wrong. It's `num_cpus * 2 * num_cpus * 4 * 2 = 256`.
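
In concrete numbers (a sketch, not the test runner's actual accounting; the 2 fds per scheduler is the estimate from the earlier comment, and the 256 figure corresponds to `num_cpus = 4`):

```rust
// Worst-case fd estimate from this thread:
//   (num_cpus * 2 scheduler threads per multithreaded test)
// * (num_cpus * 4 concurrent test tasks)
// * (2 fds per scheduler: one kqueue, one async handle)
fn worst_case_fds(num_cpus: usize) -> usize {
    (num_cpus * 2) * (num_cpus * 4) * 2
}

fn main() {
    assert_eq!(worst_case_fds(4), 256);  // the figure quoted above
    assert_eq!(worst_case_fds(8), 1024); // an 8-CPU machine blows far past OS X's default 256
}
```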

@brson (Contributor) commented Jul 17, 2013

I'm going to change the amount of overcommit here to try to get the fds back down.

@brson (Contributor) commented Jul 17, 2013

@pnkfelix here's what I did on the bots:

  $ echo -e 'limit maxfiles 8192 20480\nlimit maxproc 1000 2000' | sudo tee -a /etc/launchd.conf
  $ echo 'ulimit -n 4096' | sudo tee -a /etc/profile

Reboot afterwards.

brson added a commit to brson/rust that referenced this issue Jul 17, 2013
Uses more fds than are available by default on OS X.
@lilyball (Contributor, Author)

The default for both kern.maxopenfiles and kern.maxfilesperproc is 10240. You can use the getrlimit() and setrlimit() calls to raise the default fd ulimit from 256 up to that 10240.

Curiously, in my test of this, `getrlimit()` returned `~(1<<63)` as the max instead of 10240, but it wouldn't let me set the soft limit higher than 10240.

One potential workaround: during startup, Rust could use sysctl to read the kern.maxfilesperproc setting and call setrlimit() to raise the fd ulimit to that max.
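
A minimal sketch of that workaround (macOS-specific; assumes the `libc` crate, and the function name just mirrors the `raise_fd_limit()` mentioned later in this thread):

```rust
use std::io;
use std::mem;

// Read kern.maxfilesperproc via sysctlbyname, then raise the soft
// RLIMIT_NOFILE up to that value (never past the hard limit).
fn raise_fd_limit() -> io::Result<()> {
    unsafe {
        let mut maxfiles: libc::c_int = 0;
        let mut size = mem::size_of::<libc::c_int>() as libc::size_t;
        if libc::sysctlbyname(
            b"kern.maxfilesperproc\0".as_ptr() as *const libc::c_char,
            &mut maxfiles as *mut _ as *mut libc::c_void,
            &mut size,
            std::ptr::null_mut(),
            0,
        ) != 0
        {
            return Err(io::Error::last_os_error());
        }

        let mut rlim: libc::rlimit = mem::zeroed();
        if libc::getrlimit(libc::RLIMIT_NOFILE, &mut rlim) != 0 {
            return Err(io::Error::last_os_error());
        }
        // Soft limit may not exceed either the sysctl value or the hard limit.
        rlim.rlim_cur = (maxfiles as libc::rlim_t).min(rlim.rlim_max);
        if libc::setrlimit(libc::RLIMIT_NOFILE, &rlim) != 0 {
            return Err(io::Error::last_os_error());
        }
        Ok(())
    }
}
```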

On another note, does every Rust task need a libuv event loop? The requirement of multiple fds per event loop means that tasks are even less lightweight than I had believed (significantly less so than in Go, where I can create goroutines up the wazoo).

@metajack (Contributor)

@kballard Every scheduler has a libuv event loop, not every task. There's one scheduler per thread, and all tasks are multiplexed across the scheduler threads.

brson added a commit to brson/rust that referenced this issue Jul 19, 2013 (…-lang#7772):

Too much overcommit here exhausts the low fd limit on OS X.
@lilyball (Contributor, Author) commented Aug 2, 2013

c4ff250fd raises the fd limit on OS X. With this commit, 49b72bd can be reverted.

@lilyball (Contributor, Author) commented Aug 2, 2013

@brson: Thoughts on the above comment? I can submit this as a PR if you think it makes sense.

bors closed this as completed in 2001cc0 on Aug 3, 2013
stepancheg referenced this issue in brson/rust Aug 4, 2013:

This workaround was less than ideal. A better solution is to raise the fd limit.

This reverts commit 49b72bd.
@stepancheg (Contributor)

Although the issue is marked closed, I still have the same problem on Linux with num_cpus() == 24. If I ran git bisect run make check correctly, the problem arose in 70d2be0.

ulimit -n is 1024. (I cannot raise it to check whether this is a race or a genuine limit overrun, because I have no root access on that server; with RUST_RT_TEST_THREADS=4 it works fine.)

@lilyball (Contributor, Author) commented Aug 4, 2013

@stepancheg: I just checked on my Ubuntu 12.04 machine; ulimit -n is indeed 1024, but I can raise it to, say, 2048 without root access. I'm not particularly familiar with Linux, so I don't know how to find the equivalent of sysctl kern.maxfilesperproc (which doesn't exist on Linux), but if someone does know, maybe we could do the equivalent of raise_fd_limit() for Linux.

Alternatively, we could put an upper bound on the number of threads used, instead of just always picking num_cpus * 2.
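
For what it's worth, a minimal sketch of what a Linux-side raise_fd_limit() could do (again assuming the `libc` crate): there's no direct kern.maxfilesperproc equivalent, but an unprivileged process can at least raise its soft RLIMIT_NOFILE to the hard limit:

```rust
use std::io;
use std::mem;

// Bump the soft RLIMIT_NOFILE up to the hard limit; no root needed.
// (Raising the hard limit itself would require CAP_SYS_RESOURCE.)
fn raise_fd_limit() -> io::Result<()> {
    unsafe {
        let mut rlim: libc::rlimit = mem::zeroed();
        if libc::getrlimit(libc::RLIMIT_NOFILE, &mut rlim) != 0 {
            return Err(io::Error::last_os_error());
        }
        rlim.rlim_cur = rlim.rlim_max;
        if libc::setrlimit(libc::RLIMIT_NOFILE, &rlim) != 0 {
            return Err(io::Error::last_os_error());
        }
        Ok(())
    }
}
```

That wouldn't help on a host where the hard limit is itself 1024, so capping the thread count would still be needed there.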

@stepancheg (Contributor)

@kballard You can increase the soft limit up to the hard limit:

  % ulimit -n 100
  % ulimit -n 1024
  % ulimit -n 1025
  ulimit: value exceeds hard limit
  zsh: exit 1
  % ulimit -H -n 1025
  ulimit: can't raise hard limits
  zsh: exit 1

On my host the hard limit is also 1024.

I ended up setting RUST_RT_TEST_THREADS.

@lilyball (Contributor, Author) commented Aug 4, 2013

@stepancheg Looks like on my machine the hard limit defaults to 4096.

Any idea what the upper bound on RUST_RT_TEST_THREADS is that works reliably with 1024?

@lilyball (Contributor, Author) commented Aug 4, 2013

Actually, I found that 6 threads was the highest I could go with the 256 limit, so, scaling linearly (1024 is 4 × 256), 24 is probably the cap for 1024.

@pnkfelix (Member)

So I know that we have documented how to change the settings on a given Mac OS X machine to raise the limits. But shouldn't it still be a bug that we cannot run our test driver on a vanilla system? (Maybe not a high-priority bug, but a bug nonetheless? Should I open a separate issue for this?)
