Test suite breakage: Tests keep too many file descriptors open, breaks with concurrency #7772

Closed
lilyball opened this issue Jul 13, 2013 · 18 comments

@lilyball (Contributor)

When running the test suite (`make check-stage2-std` is sufficient to reproduce), in the middle of the tests the program aborts, often with a cute abort message. There are a few different kinds of aborts, such as `fatal runtime error: assertion failed: void_sched.is_not_null()` or `error opening /dev/urandom: Too many open files`, but they all seem to be caused by running out of file descriptors. The default limit on my machine (OS X) is 256, and if I catch the abort (with lldb) I can see that they're all in use.

This problem seems to have been triggered by 8afec77, which was introduced into master by PR #7265. That commit changes the default number of concurrent test threads from 4 to `rust_get_num_cpus() * 2`. Experimentally, anything above 6 threads causes the test failure, and my machine reports 8 CPUs, so the test suite is attempting to use 16 threads.

I don't know what the root cause is; either we're keeping fds open much longer than we should, or we have a bunch of tests that each require a lot of fds, or maybe it's something completely different. Interestingly, `lsof` reports that most of the fds in use are pipes. What do we use pipes for?

@brson (Contributor) commented Jul 17, 2013

That commit is not right anyway. I'm going to revert it.

@brson (Contributor) commented Jul 17, 2013

Oh, I read it wrong. That commit is working as intended.

@brson (Contributor) commented Jul 17, 2013

Our new scheduler is based on libuv, which is event-driven (it waits on fds), and the test suite now aggressively tests the new scheduler. It may be that we are using too many uv handles, and in turn fds, in some situations. The scheduler definitely allocates too many idle handles, but I'm not sure those require an fd.

We did have to raise the ulimit on the OS X bots to land it.

@brson (Contributor) commented Jul 17, 2013

There are two overcommits involved in the test suite now that could be contributing to this problem: first, as you pointed out, the multithreaded scheduler tests each create `num_cpus * 2` scheduler threads; second, the normal test runner creates `num_cpus * 4` test tasks.

Each scheduler, I believe, needs a minimum of 2 fds: one for the kqueue (I think) and a second for an async handle. So at any time while running stdtest you may need at least num_cpus * 2 * 4 * 2 = 64 fds, and undoubtedly there are even more that I'm not aware of.

@pnkfelix (Member)

What is the appropriate way to raise the fd ulimit on OS X? I tried doing this when I was looking into #7797, but whatever technique I used (from Stack Overflow) did not work.

@brson (Contributor) commented Jul 17, 2013

Math was wrong. It's `num_cpus * 2 * num_cpus * 4 * 2 = 256`.
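
In concrete numbers (a sketch, not the test runner's actual accounting; the 2 fds per scheduler is the estimate from the earlier comment, and the 256 figure corresponds to `num_cpus = 4`):

```rust
// Worst-case fd estimate from this thread:
//   (num_cpus * 2 scheduler threads per multithreaded test)
// * (num_cpus * 4 concurrent test tasks)
// * (2 fds per scheduler: one kqueue, one async handle)
fn worst_case_fds(num_cpus: usize) -> usize {
    (num_cpus * 2) * (num_cpus * 4) * 2
}

fn main() {
    assert_eq!(worst_case_fds(4), 256);  // the figure quoted above
    assert_eq!(worst_case_fds(8), 1024); // an 8-CPU machine blows far past OS X's default 256
}
```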

@brson (Contributor) commented Jul 17, 2013

I'm going to change the amount of overcommit here to try to get the fds back down.

@brson (Contributor) commented Jul 17, 2013

@pnkfelix here's what I did on the bots:

  $ echo -e 'limit maxfiles 8192 20480\nlimit maxproc 1000 2000' | sudo tee -a /etc/launchd.conf
  $ echo 'ulimit -n 4096' | sudo tee -a /etc/profile

Reboot afterwards.

brson added a commit to brson/rust that referenced this issue Jul 17, 2013
Uses more fds than are available by default on OS X.
@lilyball (Contributor, Author)

The default for both kern.maxopenfiles and kern.maxfilesperproc is 10240. You can use the getrlimit() and setrlimit() calls to raise the default fd ulimit from 256 up to that 10240.

Curiously, in my test of this, `getrlimit()` returned `~(1<<63)` as the max instead of 10240, but it wouldn't let me set the soft limit higher than 10240.

One potential workaround: during startup, Rust could use sysctl to read the kern.maxfilesperproc setting and call setrlimit() to raise the fd ulimit to that max.
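
A minimal sketch of that workaround (macOS-specific; assumes the `libc` crate, and the function name just mirrors the `raise_fd_limit()` mentioned later in this thread):

```rust
use std::io;
use std::mem;

// Read kern.maxfilesperproc via sysctlbyname, then raise the soft
// RLIMIT_NOFILE up to that value (never past the hard limit).
fn raise_fd_limit() -> io::Result<()> {
    unsafe {
        let mut maxfiles: libc::c_int = 0;
        let mut size = mem::size_of::<libc::c_int>() as libc::size_t;
        if libc::sysctlbyname(
            b"kern.maxfilesperproc\0".as_ptr() as *const libc::c_char,
            &mut maxfiles as *mut _ as *mut libc::c_void,
            &mut size,
            std::ptr::null_mut(),
            0,
        ) != 0
        {
            return Err(io::Error::last_os_error());
        }

        let mut rlim: libc::rlimit = mem::zeroed();
        if libc::getrlimit(libc::RLIMIT_NOFILE, &mut rlim) != 0 {
            return Err(io::Error::last_os_error());
        }
        // Soft limit may not exceed either the sysctl value or the hard limit.
        rlim.rlim_cur = (maxfiles as libc::rlim_t).min(rlim.rlim_max);
        if libc::setrlimit(libc::RLIMIT_NOFILE, &rlim) != 0 {
            return Err(io::Error::last_os_error());
        }
        Ok(())
    }
}
```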

On another note, does every Rust task need a libuv event loop? The requirement of multiple fds per event loop means that tasks are even less lightweight than I had believed (significantly less so than in Go, where I can create goroutines up the wazoo).

@metajack (Contributor)

@kballard Every scheduler has a libuv event loop, not every task. There's one scheduler per thread, and all tasks are multiplexed across the scheduler threads.

brson added a commit to brson/rust that referenced this issue Jul 19, 2013 (…-lang#7772):

Too much overcommit here exhausts the low fd limit on OS X.
@lilyball (Contributor, Author) commented Aug 2, 2013

c4ff250fd raises the fd limit on OS X. With this commit, 49b72bd can be reverted.

@lilyball (Contributor, Author) commented Aug 2, 2013

@brson: Thoughts on the above comment? I can submit this as a PR if you think it makes sense.

bors closed this as completed in 2001cc0 on Aug 3, 2013
stepancheg referenced this issue in brson/rust Aug 4, 2013:

This workaround was less than ideal. A better solution is to raise the fd limit.

This reverts commit 49b72bd.
@stepancheg (Contributor)

Although the issue is marked closed, I still have the same problem on Linux with num_cpus() == 24. If I ran git bisect run make check correctly, the problem arose in 70d2be0.

ulimit -n is 1024. (I cannot raise it to check whether this is a race or a genuine limit overrun, because I have no root access on that server; with RUST_RT_TEST_THREADS=4 it works fine.)

@lilyball (Contributor, Author) commented Aug 4, 2013

@stepancheg: I just checked on my Ubuntu 12.04 machine; ulimit -n is indeed 1024, but I can raise it to, say, 2048 without root access. I'm not particularly familiar with Linux, so I don't know how to find the equivalent of sysctl kern.maxfilesperproc (which doesn't exist on Linux), but if someone does know, maybe we could do the equivalent of raise_fd_limit() for Linux.

Alternatively, we could put an upper bound on the number of threads used, instead of just always picking num_cpus * 2.
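
For what it's worth, a minimal sketch of what a Linux-side raise_fd_limit() could do (again assuming the `libc` crate): there's no direct kern.maxfilesperproc equivalent, but an unprivileged process can at least raise its soft RLIMIT_NOFILE to the hard limit:

```rust
use std::io;
use std::mem;

// Bump the soft RLIMIT_NOFILE up to the hard limit; no root needed.
// (Raising the hard limit itself would require CAP_SYS_RESOURCE.)
fn raise_fd_limit() -> io::Result<()> {
    unsafe {
        let mut rlim: libc::rlimit = mem::zeroed();
        if libc::getrlimit(libc::RLIMIT_NOFILE, &mut rlim) != 0 {
            return Err(io::Error::last_os_error());
        }
        rlim.rlim_cur = rlim.rlim_max;
        if libc::setrlimit(libc::RLIMIT_NOFILE, &rlim) != 0 {
            return Err(io::Error::last_os_error());
        }
        Ok(())
    }
}
```

That wouldn't help on a host where the hard limit is itself 1024, so capping the thread count would still be needed there.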

@stepancheg (Contributor)

@kballard You can increase the soft limit up to the hard limit:

  % ulimit -n 100
  % ulimit -n 1024
  % ulimit -n 1025
  ulimit: value exceeds hard limit
  zsh: exit 1
  % ulimit -H -n 1025
  ulimit: can't raise hard limits
  zsh: exit 1

On my host the hard limit is also 1024.

I ended up setting RUST_RT_TEST_THREADS.

@lilyball (Contributor, Author) commented Aug 4, 2013

@stepancheg Looks like on my machine the hard limit defaults to 4096.

Any idea what the upper bound on RUST_RT_TEST_THREADS is that works reliably with 1024?

@lilyball (Contributor, Author) commented Aug 4, 2013

Actually, I found that 6 threads was the highest I could go with the 256 limit, so, scaling linearly (1024 is 4 × 256), 24 is probably the cap for 1024.

@pnkfelix (Member)

So I know that we have documented how to change the settings on a given Mac OS X machine to raise the limits. But shouldn't it still be a bug that we cannot run our test driver on a vanilla system? (Maybe not a high-priority bug, but a bug nonetheless? Should I open a separate issue for this?)
