tokio::task::spawn_blocking panics when exceeding the thread limit #2309

Closed
Nemo157 opened this issue Mar 9, 2020 · 7 comments · Fixed by #4485
Labels
A-tokio Area: The main tokio crate C-bug Category: This is a bug. E-help-wanted Call for participation: Help is requested to fix this issue. I-crash Problems and improvements related to program crashes/panics. M-task Module: tokio/task

Comments

@Nemo157
Contributor

Nemo157 commented Mar 9, 2020

If a Tokio application runs in an environment with a limited number of processes/threads (e.g. the playground, which uses cgroup limiting to a max of 512 processes), it will panic if this limit is below the maximum number of blocking tasks.

Repro: run this example on the playground or within a shell with ulimit -u 512 applied:

#[tokio::main]
async fn main() {
    for i in 0..1024 {
        eprintln!("Running {}", i);
        tokio::task::spawn_blocking(|| std::thread::sleep(std::time::Duration::from_secs(1)));
    }
}

logs

Running 1
[...]
Running 508
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }', src/libcore/result.rs:1188:5
backtrace
stack backtrace:
   0: backtrace::backtrace::libunwind::trace
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/libunwind.rs:88
   1: backtrace::backtrace::trace_unsynchronized
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/mod.rs:66
   2: std::sys_common::backtrace::_print_fmt
             at src/libstd/sys_common/backtrace.rs:84
   3: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
             at src/libstd/sys_common/backtrace.rs:61
   4: core::fmt::write
             at src/libcore/fmt/mod.rs:1025
   5: std::io::Write::write_fmt
             at src/libstd/io/mod.rs:1426
   6: std::sys_common::backtrace::_print
             at src/libstd/sys_common/backtrace.rs:65
   7: std::sys_common::backtrace::print
             at src/libstd/sys_common/backtrace.rs:50
   8: std::panicking::default_hook::{{closure}}
             at src/libstd/panicking.rs:193
   9: std::panicking::default_hook
             at src/libstd/panicking.rs:210
  10: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:471
  11: rust_begin_unwind
             at src/libstd/panicking.rs:375
  12: core::panicking::panic_fmt
             at src/libcore/panicking.rs:84
  13: core::result::unwrap_failed
             at src/libcore/result.rs:1188
  14: core::result::Result<T,E>::unwrap
             at /rustc/f3e1a954d2ead4e2fc197c7da7d71e6c61bad196/src/libcore/result.rs:956
  15: tokio::runtime::blocking::pool::Spawner::spawn_thread
             at ./.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/runtime/blocking/pool.rs:205
  16: tokio::runtime::blocking::pool::Spawner::spawn
             at ./.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/runtime/blocking/pool.rs:190
  17: tokio::runtime::blocking::pool::spawn_blocking
             at ./.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/runtime/blocking/pool.rs:68
  18: tokio::task::blocking::spawn_blocking
             at ./.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/task/blocking.rs:69
  19: playground::main::{{closure}}
             at src/main.rs:5
  20: <std::future::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/f3e1a954d2ead4e2fc197c7da7d71e6c61bad196/src/libstd/future.rs:43
  21: tokio::runtime::enter::Enter::block_on
             at ./.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/runtime/enter.rs:101
  22: tokio::runtime::thread_pool::ThreadPool::block_on
             at ./.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/runtime/thread_pool/mod.rs:93
  23: tokio::runtime::Runtime::block_on::{{closure}}
             at ./.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/runtime/mod.rs:415
  24: tokio::runtime::context::enter
             at ./.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/runtime/context.rs:72
  25: tokio::runtime::handle::Handle::enter
             at ./.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/runtime/handle.rs:34
  26: tokio::runtime::Runtime::block_on
             at ./.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/runtime/mod.rs:410
  27: playground::main
             at src/main.rs:1
  28: std::rt::lang_start::{{closure}}
             at /rustc/f3e1a954d2ead4e2fc197c7da7d71e6c61bad196/src/libstd/rt.rs:67
  29: std::rt::lang_start_internal::{{closure}}
             at src/libstd/rt.rs:52
  30: std::panicking::try::do_call
             at src/libstd/panicking.rs:292
  31: __rust_maybe_catch_panic
             at src/libpanic_unwind/lib.rs:78
  32: std::panicking::try
             at src/libstd/panicking.rs:270
  33: std::panic::catch_unwind
             at src/libstd/panic.rs:394
  34: std::rt::lang_start_internal
             at src/libstd/rt.rs:51
  35: std::rt::lang_start
             at /rustc/f3e1a954d2ead4e2fc197c7da7d71e6c61bad196/src/libstd/rt.rs:67
  36: main
  37: __libc_start_main
  38: _start

Originally posted by @Nemo157 in #2143 (comment)

@Matthias247
Contributor

If the OS limit is readable we could set the max threadpool size to min(OS_LIMIT, THREADPOOL_SIZE). That would queue the task. However that doesn't account for the fact that you likely can change the OS limit during runtime of a process.
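The capping idea above could be sketched roughly as follows. This is an illustration only, not Tokio code: `read_threads_max` and `effective_pool_size` are hypothetical names, and the sketch assumes a Linux system where `/proc/sys/kernel/threads-max` (one of the limits named in the pthread_create man page quoted later in this thread) is readable. It ignores `RLIMIT_NPROC` and cgroup limits, and it samples the limit only once, so it does not address limits that change while the process runs.

```rust
use std::fs;

/// Hypothetical helper: reads the system-wide thread limit on Linux.
/// Returns `None` when the file is missing or cannot be parsed
/// (for example on non-Linux systems).
fn read_threads_max() -> Option<usize> {
    fs::read_to_string("/proc/sys/kernel/threads-max")
        .ok()?
        .trim()
        .parse()
        .ok()
}

/// Hypothetical helper: caps the configured pool size by the OS limit
/// when that limit is known, and never returns less than one thread.
fn effective_pool_size(os_limit: Option<usize>, configured: usize) -> usize {
    match os_limit {
        Some(limit) => configured.min(limit).max(1),
        None => configured,
    }
}

fn main() {
    // 512 mirrors the default maximum mentioned in this thread.
    let cap = effective_pool_size(read_threads_max(), 512);
    println!("blocking pool cap: {}", cap);
}
```

With a cap like this, tasks beyond the cap would wait in the queue instead of triggering a thread spawn that the OS is likely to reject.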

Returning an error is rather ugly. I would rather keep the current behavior and ask users to increase their limits. I think users might already get other panics when opening objects beyond limits (e.g. files).

@Darksonn Darksonn added A-tokio Area: The main tokio crate C-bug Category: This is a bug. E-help-wanted Call for participation: Help is requested to fix this issue. I-crash Problems and improvements related to program crashes/panics. M-task Module: tokio/task labels Apr 20, 2020
@Darksonn
Contributor

What is the status on this?

@kayru

kayru commented Jun 22, 2021

I've hit this issue recently in a production environment (application running in a docker container, based on very vanilla alpine:latest). The OS thread limit seemingly was quite high. We also limit the number of blocking threads to a pretty low number (12 in this specific case). The unwrap panic here is very unfortunate.

Executing the task on an already-existing blocking thread is a potential solution sometimes, but I can definitely see how it might lead to deadlocks in pathological cases. In my particular application, I am actually able to handle a failure gracefully, so a guaranteed non-panicking variant of the blocking task spawn API would be welcome.

A file handle limit example mentioned in this thread isn't quite the same as this panic, since nothing unwraps the file open result.

What makes this issue particularly annoying is that it can happen quite sporadically, and it's hard to catch it in the act to really diagnose things. One potential workaround that I have considered is to force-spawn all the blocking worker threads on startup and turn off the timeout. This may be a reasonable thing to do for the 12-thread case, but not so much for the default max of 512 :)
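The "spawn everything at startup" workaround can be illustrated without Tokio. The following is a minimal standard-library sketch, assuming a hypothetical `EagerPool` type (it is not part of Tokio or of the application described above): all workers are created in the constructor, so a thread-creation failure is reported once, at startup, and later submissions only push onto a queue.

```rust
use std::io;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical fixed-size pool whose workers are all created up front.
struct EagerPool {
    sender: Option<Sender<Job>>,
    workers: Vec<JoinHandle<()>>,
}

impl EagerPool {
    /// Spawns `size` workers immediately. A thread-creation failure is
    /// returned from here instead of surfacing on a later submission.
    fn new(size: usize) -> io::Result<EagerPool> {
        if size == 0 {
            return Err(io::Error::new(
                io::ErrorKind::InvalidInput,
                "pool size must be non-zero",
            ));
        }
        let (sender, receiver) = mpsc::channel::<Job>();
        let receiver = Arc::new(Mutex::new(receiver));
        let mut workers = Vec::with_capacity(size);
        for i in 0..size {
            let receiver = Arc::clone(&receiver);
            let handle = thread::Builder::new()
                .name(format!("eager-blocking-{}", i))
                .spawn(move || loop {
                    // The lock is held only while waiting for the next job.
                    let job = receiver.lock().unwrap().recv();
                    match job {
                        Ok(job) => job(),
                        Err(_) => break, // queue closed: worker exits
                    }
                })?;
            workers.push(handle);
        }
        Ok(EagerPool { sender: Some(sender), workers })
    }

    /// Queues a job. No thread is created here.
    fn execute<F: FnOnce() + Send + 'static>(&self, f: F) {
        self.sender.as_ref().unwrap().send(Box::new(f)).unwrap();
    }

    /// Closes the queue and waits for the workers to drain it.
    fn shutdown(mut self) {
        drop(self.sender.take());
        for worker in self.workers.drain(..) {
            worker.join().unwrap();
        }
    }
}

fn main() -> io::Result<()> {
    let pool = EagerPool::new(4)?;
    let counter = Arc::new(AtomicUsize::new(0));
    for _ in 0..16 {
        let counter = Arc::clone(&counter);
        pool.execute(move || {
            counter.fetch_add(1, Ordering::SeqCst);
        });
    }
    pool.shutdown();
    println!("jobs completed: {}", counter.load(Ordering::SeqCst));
    Ok(())
}
```

The trade-off is the one noted above: every worker exists for the lifetime of the pool, which is cheap for a dozen threads and wasteful for several hundred. The sketch also does not recover a worker whose job panics.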

@rapiz1

rapiz1 commented Jan 25, 2022

I've received feedback from users hitting this issue. The thread limit is high and unlikely to be reached. However, the panic still occurs.

@rapiz1

rapiz1 commented Jan 25, 2022

After some digging, this boils down to pthread_create on Unix. https://github.com/rust-lang/rust/blob/e7825f2b690c9a0d21b6f6d84c404bb53b151b38/library/std/src/sys/unix/thread.rs#L87

From pthread_create's man page:

ERRORS
       EAGAIN Insufficient resources to create another thread.

       EAGAIN A  system-imposed  limit on the number of threads was encountered.  There are a number
              of limits that may trigger this error: the RLIMIT_NPROC soft resource limit  (set  via
              setrlimit(2)),  which  limits  the number of processes and threads for a real user ID,
              was reached; the kernel's system-wide limit on the number of  processes  and  threads,
              /proc/sys/kernel/threads-max,  was  reached  (see  proc(5));  or the maximum number of
              PIDs, /proc/sys/kernel/pid_max, was reached (see proc(5)).

So there's something wrong with the system resources and it's not tokio's fault.

Maybe one way forward is to create a spawn_blocking version that returns a Result and doesn't panic.
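As a rough illustration of what a non-panicking variant could look like, here is a standard-library sketch. `try_spawn_blocking` and `is_temporary` are hypothetical names and not Tokio APIs. The sketch relies on `std::thread::Builder::spawn`, which returns an `io::Result` instead of panicking, and on the fact visible in the panic message above that the EAGAIN failure surfaces as `ErrorKind::WouldBlock`.

```rust
use std::io;
use std::thread::{self, JoinHandle};

/// Hypothetical non-panicking spawn: the thread-creation error is
/// handed back to the caller instead of being unwrapped.
fn try_spawn_blocking<F, T>(f: F) -> io::Result<JoinHandle<T>>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    // `Builder::spawn` reports OS-level failures as `io::Error`.
    thread::Builder::new().spawn(f)
}

/// Hypothetical helper: treats an EAGAIN-style failure (reported as
/// `WouldBlock` in the panic message above) as temporary.
fn is_temporary(err: &io::Error) -> bool {
    err.kind() == io::ErrorKind::WouldBlock
}

fn main() {
    match try_spawn_blocking(|| 2 + 2) {
        Ok(handle) => println!("result: {}", handle.join().unwrap()),
        Err(e) if is_temporary(&e) => eprintln!("thread limit reached, retry later: {}", e),
        Err(e) => eprintln!("could not spawn: {}", e),
    }
}
```

A caller that can degrade gracefully, as described earlier in the thread, would handle the `Err` branch by retrying, shedding load, or running the work elsewhere.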

@Darksonn
Contributor

Well, spawn_blocking uses a thread pool. If it is unable to spawn more threads, it could just keep using the ones it has.

Of course, if it's unable to spawn even one spawn_blocking thread, then there's a problem.
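The rule described in this comment can be written down as a small decision function. This is a sketch of the idea only, not the code in Tokio's blocking pool: `Outcome` and `handle_spawn_result` are hypothetical names, and "temporary" is assumed to mean `ErrorKind::WouldBlock`, matching the error in the original panic.

```rust
use std::io;

/// Hypothetical result of trying to add a worker thread for a new task.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// A new worker thread was created for the task.
    Spawned,
    /// No thread could be created, but existing workers will pick the
    /// task up from the queue.
    LeftForExistingWorkers,
}

/// Decides what to do with the result of a thread spawn attempt.
/// `live_workers` is the number of pool threads already running.
fn handle_spawn_result(result: io::Result<()>, live_workers: usize) -> io::Result<Outcome> {
    match result {
        Ok(()) => Ok(Outcome::Spawned),
        // Temporary failure with at least one live worker: the queued
        // task can still make progress, so this is not an error.
        Err(e) if e.kind() == io::ErrorKind::WouldBlock && live_workers > 0 => {
            Ok(Outcome::LeftForExistingWorkers)
        }
        // No worker at all, or a non-temporary failure: report it.
        Err(e) => Err(e),
    }
}

fn main() {
    let eagain = || io::Error::from(io::ErrorKind::WouldBlock);
    println!("{:?}", handle_spawn_result(Ok(()), 0));
    println!("{:?}", handle_spawn_result(Err(eagain()), 3));
    println!("{:?}", handle_spawn_result(Err(eagain()), 0));
}
```

Only the last case, a failed spawn with no worker available to drain the queue, remains a hard error.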

@gwik
Contributor

gwik commented Feb 9, 2022

Well, spawn_blocking uses a thread pool. If it is unable to spawn more threads, it could just keep using the ones it has.

I started playing around with this idea.

gwik added a commit to gwik/tokio that referenced this issue Feb 9, 2022
Avoid panicking when the OS reaches the limit of the number of
threads / processes and the error is temporary.

Spawning a new thread is not mandatory to make progress as
long as there is at least one thread in the pool already processing
the task queue.

Fixes: tokio-rs#2309

6 participants