thread::Builder::spawn returns WouldBlock for EAGAIN #46345

Open
jethrogb opened this issue Nov 28, 2017 · 11 comments
Labels
C-bug Category: This is a bug. O-linux Operating system: Linux P-low Low priority T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@jethrogb
Contributor

jethrogb commented Nov 28, 2017

When trying to launch a thread and the thread limit is reached or there is not enough virtual address space available for another thread, thread::Builder::spawn returns an io::Error of kind WouldBlock.

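```rust
extern crate libc;

fn main() {
    // Forbid this user from creating any further processes/threads.
    // (RLIMIT_NPROC is not enforced for root or CAP_SYS_RESOURCE/CAP_SYS_ADMIN.)
    unsafe {
        libc::setrlimit(libc::RLIMIT_NPROC, &libc::rlimit { rlim_cur: 0, rlim_max: 0 });
    }

    // Spawning must now fail, so the closure never runs.
    let error = std::thread::Builder::new().spawn(|| unreachable!()).unwrap_err();

    println!("I/O error kind {:?}: {:?}", error.kind(), error);
}
```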

This prints (on Linux):

```
I/O error kind WouldBlock: Error { repr: Os { code: 11, message: "Resource temporarily unavailable" } }
```

WouldBlock means:

> The operation needs to block to complete, but the blocking operation was requested to not occur.

This doesn't make a lot of sense in the context of thread creation. Yes, if the create call were to block until the thread/virtual address space limit is no longer reached, this error interpretation would be correct, but I know of no threading API (Windows or Linux) with these semantics.

The source of the problem is that the POSIX errors EAGAIN and EWOULDBLOCK may be defined as the same error value, and Rust chose to always interpret that as EWOULDBLOCK. I'm not sure what course of action I'd suggest to clear up the confusion.
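A minimal sketch of this mapping, assuming Linux, where EAGAIN and EWOULDBLOCK share the value 11 (the constant `EAGAIN_LINUX` is our own, not a libc item). `io::Error::from_raw_os_error` builds an error of the same shape as the one printed above (`Os { code: 11, .. }`): its `kind()` reports WouldBlock, while `raw_os_error()` still exposes the original errno.

```rust
use std::io::{Error, ErrorKind};

// Assumption: Linux, where EAGAIN and EWOULDBLOCK are both errno 11.
const EAGAIN_LINUX: i32 = 11;

/// Returns the ErrorKind that std assigns to a raw EAGAIN error code.
fn kind_for_eagain() -> ErrorKind {
    Error::from_raw_os_error(EAGAIN_LINUX).kind()
}

fn main() {
    let err = Error::from_raw_os_error(EAGAIN_LINUX);
    // On Linux the kind is WouldBlock, but the raw errno (11) is still kept.
    println!("kind = {:?}, raw = {:?}", kind_for_eagain(), err.raw_os_error());
}
```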

(NB. On Windows, AFAICT there is no way to limit the number of threads, but when running out of virtual address space, CreateThread returns ERROR_NOT_ENOUGH_MEMORY, which gets decoded as kind Other)

@pietroalbini pietroalbini added C-enhancement Category: An issue proposing an enhancement or a PR with one. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Jan 23, 2018
Centril added a commit to Centril/rust that referenced this issue Jan 28, 2019
…alexcrichton

Print a slightly clearer message when failing to launch a thread

As discussed in rust-lang#46345, the `io::Error` you get when a thread fails to launch is of type `io::ErrorKind::WouldBlock`. This is super uninformative when an arbitrary `thread::spawn` fails somewhere in your code:

```
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 11,
kind: WouldBlock, message: "operation would block" }', src/libcore/result.rs:997:5
```

This PR improves the situation a little bit by using `expect` instead of `unwrap`. I don't consider this a complete fix for rust-lang#46345 though.
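The change above only affects the message inside `thread::spawn` itself. In your own code, a comparable approach is to call `thread::Builder::spawn` directly and attach context to the error before it propagates. This is a sketch; `spawn_named` is a hypothetical helper, not a std API. Note that rewrapping with `io::Error::new` keeps the kind but drops the raw OS error code.

```rust
use std::io;
use std::thread::{self, JoinHandle};

/// Hypothetical helper: spawn a named thread and include the thread name
/// in any spawn error instead of unwrapping it.
fn spawn_named<F, T>(name: &str, f: F) -> io::Result<JoinHandle<T>>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .name(name.to_string())
        .spawn(f)
        // Keep the ErrorKind, add a human-readable description.
        .map_err(|e| io::Error::new(e.kind(), format!("failed to spawn thread {:?}: {}", name, e)))
}

fn main() {
    let handle = spawn_named("worker", || 2 + 2).expect("failed to spawn thread");
    println!("{}", handle.join().unwrap());
}
```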
@rocallahan

In rr we automatically retry Linux clone() syscalls when we see EAGAIN. If we don't do that, tests fail under load; when we do do it, those tests pass. See rr-debugger/rr@68bd393
So I think it's a good idea to automatically retry clone() on EAGAIN.

@jonas-schievink jonas-schievink added C-bug Category: This is a bug. O-linux Operating system: Linux and removed C-enhancement Category: An issue proposing an enhancement or a PR with one. labels Feb 13, 2020
@nagisa
Member

nagisa commented Feb 14, 2020

> So I think it's a good idea to automatically retry clone() on EAGAIN.

The only error we currently retry on widely is EINTR, AFAIK, because the implications of doing so are well understood. I’m not sure the same is true for clone's EAGAIN.
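For comparison, the well-understood EINTR pattern looks roughly like the following sketch (`retry_on_eintr` is a hypothetical helper, not std's internal code): retry only on `ErrorKind::Interrupted`, because an interrupted call can simply be reissued, and hand every other result, including WouldBlock, back to the caller unchanged.

```rust
use std::io::{self, ErrorKind};

/// Retries `f` only while it fails with ErrorKind::Interrupted (EINTR).
/// Any other result, success or error, is returned immediately.
fn retry_on_eintr<T, F>(mut f: F) -> io::Result<T>
where
    F: FnMut() -> io::Result<T>,
{
    loop {
        match f() {
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            other => return other,
        }
    }
}

fn main() {
    // Simulate a call that is interrupted twice before succeeding.
    let mut attempts = 0;
    let result = retry_on_eintr(|| {
        attempts += 1;
        if attempts < 3 {
            Err(io::Error::from(ErrorKind::Interrupted))
        } else {
            Ok(attempts)
        }
    });
    println!("{:?}", result);
}
```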

@sfackler
Member

If the process has reached its limit on how many threads it is allowed to have, it does not seem wise to just hot-spin trying to make a new one forever.

@rgrig

rgrig commented Feb 14, 2020

I observed this error when I was much below the limit of threads, but I was creating them very quickly.

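```rust
fn go() {
    // Keep each thread alive for 10 seconds so that threads accumulate.
    std::thread::sleep(std::time::Duration::from_secs(10));
}

fn main() {
    let mut cnt = 0;
    // Spawn threads as fast as possible until spawning fails.
    loop {
        match std::thread::Builder::new().spawn(go) {
            Ok(_) => cnt += 1,
            Err(e) => {
                println!("error: {:?} {:?}", e.kind(), e);
                println!("cnt {}", cnt);
                return;
            }
        }
    }
}
```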

This results in:

```
rg@rg-2018:temp$ ./a 
error: WouldBlock Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }
cnt 9919
rg@rg-2018:temp$ cat /proc/sys/kernel/threads-max
126588
```

@rocallahan

> If the process has reached its limit on how many threads it is allowed to have, it does not seem wise to just hot-spin trying to make a new one forever.

True. Backing off for a second would probably be fine.
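A sketch of such a bounded back-off, assuming a caller-chosen attempt limit and delay (`spawn_with_backoff` and its parameters are hypothetical, not part of std). It retries only on WouldBlock and gives up after `max_attempts`, so a process that is genuinely at its thread limit gets an error instead of spinning forever. Because `Builder::spawn` takes the closure by value and drops it on failure, the closure has to be `Clone`.

```rust
use std::io::{self, ErrorKind};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical helper: spawn a thread, retrying a bounded number of times
/// with a fixed delay when spawning fails with WouldBlock (EAGAIN on Linux).
fn spawn_with_backoff<F, T>(
    f: F,
    max_attempts: u32,
    delay: Duration,
) -> io::Result<JoinHandle<T>>
where
    // Clone is needed because a failed spawn consumes the closure.
    F: FnOnce() -> T + Send + Clone + 'static,
    T: Send + 'static,
{
    let mut attempt = 1;
    loop {
        match thread::Builder::new().spawn(f.clone()) {
            Err(e) if e.kind() == ErrorKind::WouldBlock && attempt < max_attempts => {
                // Sleep instead of hot-spinning, then try again.
                thread::sleep(delay);
                attempt += 1;
            }
            // Success, a different error, or out of attempts.
            other => return other,
        }
    }
}

fn main() {
    let handle = spawn_with_backoff(|| 40 + 2, 5, Duration::from_secs(1))
        .expect("could not spawn thread after retrying");
    println!("{}", handle.join().unwrap());
}
```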

@alecmocatta
Contributor

I've only tried under a VM but fwiw I'm not able to reproduce thread::spawn/clone transiently failing on Linux.

On my system, the program below reliably reaches the thread limit without transient failures. A transient failure would cause "wouldblock after" and "progress after" to be printed repeatedly, but I have not seen that.

```rust
use std::{
    io::ErrorKind,
    mem::forget,
    thread::{sleep, Builder},
    time::Duration,
};

// Each spawned thread just sleeps forever.
fn idle() {
    loop {
        sleep(Duration::from_secs(60));
    }
}

fn main() {
    let mut threads = 0;
    let mut stuck = false;
    loop {
        // Tiny stack request (std rounds it up to the platform minimum).
        match Builder::new().stack_size(4096).spawn(idle) {
            Ok(handle) => {
                threads += 1;
                // Success right after a failure would indicate a transient error.
                if stuck {
                    println!("progress after {}", threads);
                }
                stuck = false;
                forget(handle);
            }
            Err(ref err) if err.kind() == ErrorKind::WouldBlock => {
                if !stuck {
                    println!("wouldblock after {}", threads);
                }
                stuck = true;
                // Back off briefly before retrying.
                sleep(Duration::from_millis(100));
            }
            Err(e) => {
                panic!("{}: {:?}", threads, e.kind());
            }
        }
    }
}
```

It's possible that thread limits are lower than people expect. In particular, if systemd is installed, /proc/sys/kernel/threads-max is likely not the effective maximum (in that case, see systemctl status user-$UID.slice).
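To see which limits actually apply, a small Linux-only sketch can read the kernel-wide `/proc/sys/kernel/threads-max` and the per-user `RLIMIT_NPROC` ("Max processes" in `/proc/self/limits`, which on Linux counts threads). The parser `max_processes` is a hypothetical helper written for this example. A systemd task limit (`TasksMax=`) is enforced through cgroups and appears in neither file, which is why the `systemctl status` check above is still worth doing.

```rust
use std::fs;

/// Extracts the soft and hard "Max processes" limits (RLIMIT_NPROC) from the
/// contents of /proc/self/limits. Values are kept as strings because they
/// may be "unlimited".
fn max_processes(limits: &str) -> Option<(String, String)> {
    let rest = limits.lines().find_map(|l| l.strip_prefix("Max processes"))?;
    let mut fields = rest.split_whitespace();
    let soft = fields.next()?.to_string();
    let hard = fields.next()?.to_string();
    Some((soft, hard))
}

fn main() {
    // Linux-only: these files do not exist on other platforms.
    if let Ok(max) = fs::read_to_string("/proc/sys/kernel/threads-max") {
        println!("kernel threads-max: {}", max.trim());
    }
    if let Ok(limits) = fs::read_to_string("/proc/self/limits") {
        println!("RLIMIT_NPROC (soft, hard): {:?}", max_processes(&limits));
    }
}
```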

@ghost

ghost commented Aug 2, 2020

My GitHub Actions workflow often fails because libtest failed to spawn new threads.
I believe the failures are unrelated to the code I tested. I have to use cargo test -- --test-threads 1 now. 😭

Log
2020-08-02T05:58:52.1016977Z running 14 tests
2020-08-02T05:58:52.1017997Z test freestanding::tests::assume ... ok
2020-08-02T05:58:52.1019489Z test freestanding::tests::compressing_stringify ... ok
2020-08-02T05:58:52.1021633Z test freestanding::tests::compressing_include_str ... ok
2020-08-02T05:58:52.1022991Z test freestanding::tests::global_ctor ... ok
2020-08-02T05:58:52.1023776Z test freestanding::tests::const_default ... ok
2020-08-02T05:58:52.1026345Z test freestanding::tests::result_swap ... ok
2020-08-02T05:58:52.1027920Z test freestanding::tests::guard ... ok
2020-08-02T05:58:52.1029857Z test freestanding::tests::utf16 ... ok
2020-08-02T05:58:52.1031776Z test freestanding::tests::scope_exit ... ok
2020-08-02T05:58:52.1034426Z test full_featured_os::argh::tests::from_custom_env_as_result ... ok
2020-08-02T05:58:52.1058549Z thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }', src/libtest/lib.rs:473:32
2020-08-02T05:58:52.1059340Z stack backtrace:
2020-08-02T05:58:52.1259107Z    0:     0x55ee97a8b2b5 - backtrace::backtrace::libunwind::trace::h75aedf5f78e5147f
2020-08-02T05:58:52.1260563Z                                at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/libunwind.rs:86
2020-08-02T05:58:52.1261725Z    1:     0x55ee97a8b2b5 - backtrace::backtrace::trace_unsynchronized::h18fb73c9ac9ae753
2020-08-02T05:58:52.1262530Z                                at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/mod.rs:66
2020-08-02T05:58:52.1263603Z    2:     0x55ee97a8b2b5 - std::sys_common::backtrace::_print_fmt::h65f97470ff13ec84
2020-08-02T05:58:52.1264093Z                                at src/libstd/sys_common/backtrace.rs:78
2020-08-02T05:58:52.1265013Z    3:     0x55ee97a8b2b5 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hee061c54ddc9f024
2020-08-02T05:58:52.1265475Z                                at src/libstd/sys_common/backtrace.rs:59
2020-08-02T05:58:52.1368834Z    4:     0x55ee97ab2e2c - core::fmt::write::hfbd2baad61ed21a8
2020-08-02T05:58:52.1369441Z                                at src/libcore/fmt/mod.rs:1117
2020-08-02T05:58:52.1370425Z    5:     0x55ee97a87b92 - std::io::Write::write_fmt::h72f9bd227f40dc62
2020-08-02T05:58:52.1371096Z                                at src/libstd/io/mod.rs:1508
2020-08-02T05:58:52.1372016Z    6:     0x55ee97a8da10 - std::sys_common::backtrace::_print::h2d2cd8fe02feb5fa
2020-08-02T05:58:52.1372504Z                                at src/libstd/sys_common/backtrace.rs:62
2020-08-02T05:58:52.1373628Z    7:     0x55ee97a8da10 - std::sys_common::backtrace::print::h801b12991252ba7c
2020-08-02T05:58:52.1374488Z                                at src/libstd/sys_common/backtrace.rs:49
2020-08-02T05:58:52.1375698Z    8:     0x55ee97a8da10 - std::panicking::default_hook::{{closure}}::h25fc1fbf3b63b5c8
2020-08-02T05:58:52.1376078Z                                at src/libstd/panicking.rs:198
2020-08-02T05:58:52.1376842Z    9:     0x55ee97a8d75c - std::panicking::default_hook::h62c897957a5e0f26
2020-08-02T05:58:52.1377185Z                                at src/libstd/panicking.rs:217
2020-08-02T05:58:52.1378268Z   10:     0x55ee97a8e053 - std::panicking::rust_panic_with_hook::hb8a276f163c59810
2020-08-02T05:58:52.1387914Z                                at src/libstd/panicking.rs:526
2020-08-02T05:58:52.1389070Z   11:     0x55ee97a8dc4b - rust_begin_unwind
2020-08-02T05:58:52.1389536Z                                at src/libstd/panicking.rs:437
2020-08-02T05:58:52.1390265Z   12:     0x55ee97ab17c1 - core::panicking::panic_fmt::h9cc57011b345cfad
2020-08-02T05:58:52.1390591Z                                at src/libcore/panicking.rs:85
2020-08-02T05:58:52.1391254Z   13:     0x55ee97ab15e3 - core::option::expect_none_failed::h10abb5a6aef32df8
2020-08-02T05:58:52.1391557Z                                at src/libcore/option.rs:1273
2020-08-02T05:58:52.1492464Z   14:     0x55ee97a66a1e - core::result::Result<T,E>::unwrap::hf701633582f46c23
2020-08-02T05:58:52.1494631Z                                at /rustc/d6953df14657f5932270ad2b33bccafe6f39fad4/src/libcore/result.rs:1005
2020-08-02T05:58:52.1495684Z   15:     0x55ee97a66a1e - test::run_test::run_test_inner::hb4e0cf3cefb35bf3
2020-08-02T05:58:52.1496425Z                                at src/libtest/lib.rs:473
2020-08-02T05:58:52.1497186Z   16:     0x55ee97a6480e - test::run_test::h51caf8e89b554a56
2020-08-02T05:58:52.1497621Z                                at src/libtest/lib.rs:505
2020-08-02T05:58:52.1498386Z   17:     0x55ee97a526da - test::run_tests::h81fcb787d3f44144
2020-08-02T05:58:52.1499208Z                                at src/libtest/lib.rs:299
2020-08-02T05:58:52.1500080Z   18:     0x55ee97a526da - test::console::run_tests_console::h93fbaddef781791c
2020-08-02T05:58:52.1500718Z                                at src/libtest/console.rs:280
2020-08-02T05:58:52.1501881Z   19:     0x55ee97a60ada - test::test_main::heb8bd877a723c55d
2020-08-02T05:58:52.1515862Z                                at src/libtest/lib.rs:120
2020-08-02T05:58:52.1517005Z   20:     0x55ee97a6200d - test::test_main_static::h05dbfcdf8166a1c5
2020-08-02T05:58:52.1517464Z                                at src/libtest/lib.rs:139
2020-08-02T05:58:52.1518205Z   21:     0x55ee97a2bd26 - hut::main::hd669fce343160431
2020-08-02T05:58:52.1519005Z   22:     0x55ee97a2598b - std::rt::lang_start::{{closure}}::h6e6e62a5244ee3dd
2020-08-02T05:58:52.1519440Z                                at /rustc/d6953df14657f5932270ad2b33bccafe6f39fad4/src/libstd/rt.rs:67
2020-08-02T05:58:52.1520218Z   23:     0x55ee97a8e423 - std::rt::lang_start_internal::{{closure}}::hbd178e645b70b347
2020-08-02T05:58:52.1520858Z                                at src/libstd/rt.rs:52
2020-08-02T05:58:52.1521694Z   24:     0x55ee97a8e423 - std::panicking::try::do_call::hd9e76f93421bce23
2020-08-02T05:58:52.1522165Z                                at src/libstd/panicking.rs:348
2020-08-02T05:58:52.1522995Z   25:     0x55ee97a8e423 - std::panicking::try::h6776eea046a81bd7
2020-08-02T05:58:52.1523461Z                                at src/libstd/panicking.rs:325
2020-08-02T05:58:52.1524447Z   26:     0x55ee97a8e423 - std::panic::catch_unwind::hd9dfb4dd4c6fb7d1
2020-08-02T05:58:52.1524884Z                                at src/libstd/panic.rs:394
2020-08-02T05:58:52.1526010Z   27:     0x55ee97a8e423 - std::rt::lang_start_internal::h47278b515c002423
2020-08-02T05:58:52.1526433Z                                at src/libstd/rt.rs:51
2020-08-02T05:58:52.1527167Z   28:     0x55ee97a25967 - std::rt::lang_start::h0f0f6db08b6fd5bc
2020-08-02T05:58:52.1527623Z                                at /rustc/d6953df14657f5932270ad2b33bccafe6f39fad4/src/libstd/rt.rs:67
2020-08-02T05:58:52.1528408Z   29:     0x55ee97a2bd5a - main
2020-08-02T05:58:52.1529222Z   30:     0x7fd5af538b97 - __libc_start_main
2020-08-02T05:58:52.1530062Z   31:     0x55ee97a1954a - _start
2020-08-02T05:58:52.1530667Z   32:                0x0 - <unknown>
2020-08-02T05:58:52.1532853Z error: test failed, to rerun pass '--lib'
2020-08-02T05:58:52.1557798Z ##[error]The process '/usr/share/rust/.cargo/bin/cargo' failed with exit code 101
2020-08-02T05:58:52.1633132Z Post job cleanup.
2020-08-02T05:58:52.2761660Z [command]/usr/bin/git version
2020-08-02T05:58:52.2825424Z git version 2.27.0
2020-08-02T05:58:52.2868383Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2020-08-02T05:58:52.2905666Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :
2020-08-02T05:58:52.3185655Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2020-08-02T05:58:52.3218314Z http.https://github.com/.extraheader
2020-08-02T05:58:52.3228570Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader
2020-08-02T05:58:52.3268023Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :
2020-08-02T05:58:52.3586561Z Cleaning up orphan processes

@camelid camelid added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Oct 20, 2020
@jyn514
Member

jyn514 commented Oct 20, 2020

There are three separate issues mentioned here:

  1. People write programs that don't handle running out of threads. The Rust project can't do anything about this; you have to decide what the right thing to do on errors is.
  2. libtest panics when it runs out of threads. This is tracked by #72482 ("rustc panics when it can't spawn a thread"), and I'd ask that further discussion of it move there.
  3. The error returned by spawn is shown as 'WouldBlock' in Debug output, instead of EAGAIN.

I would like to have this issue only track the third problem.
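For point 1, one possible policy (a sketch, not something std does; `run_on_thread_or_inline` is a hypothetical helper) is to fall back to running the work on the current thread when a new thread cannot be spawned. The closure is parked in an `Arc<Mutex<Option<F>>>` so it is still available after a failed spawn has dropped the thread's copy of the `Arc`. It joins immediately only to keep the example short.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical policy: run `f` on a new thread if one can be spawned,
/// otherwise run it on the current thread. Returns the result and whether
/// a new thread was used.
fn run_on_thread_or_inline<F, T>(f: F) -> (T, bool)
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    // Share the closure so it survives a failed spawn.
    let slot = Arc::new(Mutex::new(Some(f)));
    let thread_slot = Arc::clone(&slot);
    let spawned = thread::Builder::new().spawn(move || {
        let f = thread_slot.lock().unwrap().take().expect("closure already taken");
        f()
    });
    match spawned {
        Ok(handle) => (handle.join().expect("worker thread panicked"), true),
        Err(e) => {
            eprintln!("could not spawn thread ({}); running inline", e);
            let f = slot.lock().unwrap().take().expect("closure already taken");
            (f(), false)
        }
    }
}

fn main() {
    let (value, on_thread) = run_on_thread_or_inline(|| 6 * 7);
    println!("{} (new thread: {})", value, on_thread);
}
```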

@camelid camelid changed the title thread::Builder::spawn returns WouldBlock thread::Builder::spawn returns WouldBlock for EAGAIN Oct 20, 2020
@camelid
Member

camelid commented Oct 20, 2020

Renamed issue to reflect (3).

@apiraino apiraino added P-low Low priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Oct 21, 2020
@apiraino
Contributor

Assigning P-low as discussed as part of the Prioritization Working Group procedure and removing I-prioritize.

@workingjubilee
Member

workingjubilee commented Feb 20, 2022

According to our libc crate, EAGAIN and EWOULDBLOCK differ only on Windows, Redox, and VxWorks. We could add an io::ErrorKind::TryAgain variant and make sure spawn returns that, but is it worth changing the return value and potentially confusing users?
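Without a new variant, a caller who wants to tell resource-limit failures apart can inspect `raw_os_error()` instead of `kind()`. A sketch, assuming Linux's EAGAIN value of 11; `SpawnFailure` and `classify` are hypothetical names, and `TryAgain` here is only a variant of this example enum, not an `io::ErrorKind`.

```rust
use std::io;

/// Hypothetical caller-side classification of thread spawn failures.
#[derive(Debug, PartialEq)]
enum SpawnFailure {
    /// EAGAIN on Linux: a thread/process limit was hit or resources ran short.
    TryAgain,
    /// Any other error; the std ErrorKind is preserved.
    Other(io::ErrorKind),
}

// Assumption: Linux errno value of EAGAIN (same value as EWOULDBLOCK).
const EAGAIN: i32 = 11;

fn classify(err: &io::Error) -> SpawnFailure {
    match err.raw_os_error() {
        Some(EAGAIN) if cfg!(target_os = "linux") => SpawnFailure::TryAgain,
        _ => SpawnFailure::Other(err.kind()),
    }
}

fn main() {
    match std::thread::Builder::new().spawn(|| ()) {
        Ok(handle) => handle.join().unwrap(),
        Err(e) => println!("spawn failed: {:?}", classify(&e)),
    }
}
```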
