Crash when syncing on macOS due to open file limit #1415

Closed
rex4539 opened this issue Dec 1, 2020 · 5 comments · Fixed by #1426
Labels
A-rust Area: Updates to Rust code C-bug Category: This is a bug

Comments

@rex4539
Contributor

rex4539 commented Dec 1, 2020

Error

Verified checkpoints must be committed transactionally: Error { message: "IO error: While open a file for random read: /Users/rex/Library/Caches/zebra/state/v4/mainnet/000682.sst: Too many open files" }

Metadata

key value
version 3.0.0-alpha.0
location /Users/rex/zebra/zebra-consensus/src/checkpoint.rs:889:18

Backtrace

Backtrace:
   0: backtrace::backtrace::libunwind::trace
             at .cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.54/src/backtrace/libunwind.rs:90:5
      backtrace::backtrace::trace_unsynchronized
             at .cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.54/src/backtrace/mod.rs:66:5
   1: backtrace::backtrace::trace
             at .cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.54/src/backtrace/mod.rs:53:14
   2: backtrace::capture::Backtrace::create
             at .cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.54/src/capture.rs:176:9
   3: backtrace::capture::Backtrace::new
             at .cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.54/src/capture.rs:140:22
   4: color_eyre::config::PanicHook::panic_report
             at .cargo/registry/src/github.com-1ecc6299db9ec823/color-eyre-0.5.8/src/config.rs:773:18
   5: color_eyre::config::PanicHook::into_panic_hook::{{closure}}
             at .cargo/registry/src/github.com-1ecc6299db9ec823/color-eyre-0.5.8/src/config.rs:752:29
   6: std::panicking::rust_panic_with_hook
             at /rustc/1c389ffeff814726dec325f0f2b0c99107df2673/library/std/src/panicking.rs:595:17
   7: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/1c389ffeff814726dec325f0f2b0c99107df2673/library/std/src/panicking.rs:497:13
   8: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/1c389ffeff814726dec325f0f2b0c99107df2673/library/std/src/sys_common/backtrace.rs:141:18
   9: rust_begin_unwind
             at /rustc/1c389ffeff814726dec325f0f2b0c99107df2673/library/std/src/panicking.rs:493:5
  10: core::panicking::panic_fmt
             at /rustc/1c389ffeff814726dec325f0f2b0c99107df2673/library/core/src/panicking.rs:92:14
  11: core::option::expect_none_failed
             at /rustc/1c389ffeff814726dec325f0f2b0c99107df2673/library/core/src/option.rs:1268:5
  12: core::result::Result<T,E>::expect
             at .rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/result.rs:933:23

@rex4539
Contributor Author

rex4539 commented Dec 1, 2020

/Users/rex  rex@MacBook-Pro-2018% ulimit -a
-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          unlimited
-s: stack size (kbytes)             8192
-c: core file size (blocks)         0
-v: address space (kbytes)          unlimited
-l: locked-in-memory size (kbytes)  unlimited
-u: processes                       3750
-n: file descriptors                2560

@teor2345 teor2345 added A-rust Area: Updates to Rust code C-bug Category: This is a bug S-needs-triage Status: A bug report needs triage labels Dec 1, 2020
@teor2345 teor2345 added this to the First Alpha Release milestone Dec 1, 2020
@teor2345
Contributor

teor2345 commented Dec 1, 2020

@rex4539 thanks for this bug report!

File limits are per-user, not per-process, so this crash might depend on the other processes you're running. Can you reproduce in a dedicated user account?

Edit: file limits are per process:
https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/getrlimit.2.html
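
Because the limit is per process and child processes inherit it, one rough way to check what a process is actually allowed is to ask a child shell. The following is a minimal sketch using only the Rust standard library; the function names are hypothetical, and it assumes a POSIX `sh` is available. A production implementation would more likely call `getrlimit` directly, for example through the `libc` crate.

```rust
use std::process::Command;

/// Parses the output of `ulimit -n`: a decimal count, or the word "unlimited".
/// Returns `None` for "unlimited" or anything else that is not a number.
fn parse_ulimit_output(output: &str) -> Option<u64> {
    output.trim().parse().ok()
}

/// Asks a child shell for the soft open-file limit it inherited from this
/// process. Returns `None` if the shell could not be run, or if the limit is
/// unlimited or unparsable.
fn inherited_open_file_limit() -> Option<u64> {
    let output = Command::new("sh").args(["-c", "ulimit -n"]).output().ok()?;
    parse_ulimit_output(&String::from_utf8_lossy(&output.stdout))
}
```

The value reported is the soft limit, which is the one that produces "Too many open files" when it is exceeded.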

@teor2345 teor2345 changed the title Crash when syncing on macOS Crash when syncing on macOS due to open file limit Dec 1, 2020
@teor2345
Contributor

teor2345 commented Dec 1, 2020

We probably want to set max_open_files in the RocksDB config - or provide a way for users to set it.
https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide#general-options

The default value is -1, which means "unlimited":
https://docs.rs/rocksdb/0.15.0/rocksdb/struct.Options.html#method.set_max_open_files

Default per-user limits on Unix-based OSes range from 128 to 1024:
Linux 1024: https://serverfault.com/questions/356962/where-are-the-default-ulimit-values-set-linux-centos/485277#485277
macOS 256: https://ss64.com/osx/ulimit.html
OpenBSD 128: https://measureofchaos.wordpress.com/2011/07/27/openbsd-file-descriptor-limits/

We might want to set the limit dynamically for OSes with smaller limits, up to a predefined maximum.
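
One way such a dynamic limit might be computed is sketched below. This is an illustration only, not the change that was eventually merged: the function name, the reserve of 48 descriptors, and the cap of 1024 are all hypothetical. It takes the process's soft descriptor limit (`None` meaning unlimited) and returns a value small enough to leave some descriptors free for sockets and logs.

```rust
/// Descriptors held back for sockets, logs, and other non-database files.
/// Hypothetical value chosen for illustration.
const RESERVED_FILE_COUNT: u64 = 48;

/// Upper bound on the files the database may keep open, however large the
/// OS limit is. Hypothetical value chosen for illustration.
const MAX_DB_OPEN_FILES: u64 = 1024;

/// Computes a value for RocksDB's `max_open_files` option from the process's
/// soft open-file limit. `None` means the OS limit is unlimited.
fn rocksdb_open_file_limit(soft_limit: Option<u64>) -> i32 {
    let available = match soft_limit {
        // Leave room for the descriptors the rest of the process needs.
        Some(limit) => limit.saturating_sub(RESERVED_FILE_COUNT),
        // No OS limit: fall back to the predefined maximum.
        None => MAX_DB_OPEN_FILES,
    };
    // Never return 0, and never exceed the predefined maximum.
    // The result is at most 1024, so the cast to i32 cannot overflow.
    available.max(1).min(MAX_DB_OPEN_FILES) as i32
}
```

With the `rocksdb` crate, the result would be passed to `Options::set_max_open_files`, the method linked above. Under these assumed constants, the 2560-descriptor limit from the report yields 1024, and a 256-descriptor limit yields 208.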

@teor2345
Contributor

teor2345 commented Dec 2, 2020

@rex4539 we think we've come up with an automatic solution to this issue that should work on all Unix-based OSes (including macOS).

Feel free to test or review PR #1426, or wait until we've reviewed it and merged it to main over the next day or two.

Edit: we've merged #1426

@teor2345
Contributor

teor2345 commented Dec 2, 2020

We've merged #1426, please pull the latest main branch, and let us know how you go.

Feel free to re-open this issue if it happens again, and we'll tweak the limits.

@mpguerra mpguerra removed the S-needs-triage Status: A bug report needs triage label Feb 18, 2021