issue(smp): tokio-rs/sled deadlock (probable) on cpu count > 1 #1930
Comments
I couldn't reproduce this issue: OpenObserve v0.5.2 (downloaded from https://github.com/openobserve/openobserve/releases) runs fine with
I also tried with the nightly kernel build ( |
FYI, I just made a package of OpenObserve last week (https://repo.ops.city/v2/packages/eyberg/openobserve/0.5.1/x86_64/show). It seems to boot fine for me on the latest release and also on nightly. Can you post a '--trace'?
Here is the trace of one run with clean storage that deadlocks.
Starting with clean storage sometimes deadlocks and sometimes doesn't, while restarting/rebooting with already existing storage deadlocks every time (at least on my side). I'm starting to think that it might not be related to tokio per se but to sled, since I have other issues in some other tests/apps involving sled and tokio (I have yet to investigate and gather definitive results, but it fails/deadlocks on some stuff related to sled pagecache and |
On a side note how does nanos handle |
We don't support that today, as it typically implies multiple processes, which we won't support. There have been a few cases, though, where this gets used in a single-process environment for other reasons, and after a quick grep that looks like what is going on here. It looks like it is configurable (just by setting a path?) in https://github.com/spacejam/sled/blob/005c023ca94d424d8e630125e4c21320ed160031/src/config.rs#L414
Yes, it's configurable when needed, but in this case it's not being used at all. That path is for the temporary in-memory sled default db (https://github.com/spacejam/sled/blob/005c023ca94d424d8e630125e4c21320ed160031/src/config.rs#L253), and that's not the case here.
This is mostly related to sled's behavior under nanos. While the sled version used here is old (although it is the latest stable crate released) and known to have some issues with async and thread pools under certain conditions, the idea here was to check that nanos is behaving correctly (not openobserve or sled). I'll close this issue since I'm unable to provide reproducible code that points to a nanos issue rather than a program issue.
As an example, OpenObserve can't boot properly on cpu count > 1 (it keeps all vCPU cores/threads at 100%).
I also tested some internal programs that use a similar tech stack (tokio, sled, ..) and the behavior is the same.
Will investigate further and provide more info when/if available.
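The thread does not include a reproducer. As a rough first check that plain thread synchronization completes on a multi-vCPU instance, independent of tokio and sled, a std-only sketch like the one below could be used. The function name `run_with_watchdog`, the worker count, the iteration count, and the timeout are illustrative assumptions, not anything taken from openobserve, sled, or nanos.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical helper: spawns `workers` threads that each increment a shared
/// counter `iters` times. Returns the final count, or `None` if any worker's
/// completion signal does not arrive within `timeout` of the previous one.
fn run_with_watchdog(workers: usize, iters: u64, timeout: Duration) -> Option<u64> {
    let counter = Arc::new(Mutex::new(0u64));
    let (done_tx, done_rx) = mpsc::channel::<()>();

    for _ in 0..workers {
        let counter = Arc::clone(&counter);
        let done_tx = done_tx.clone();
        thread::spawn(move || {
            for _ in 0..iters {
                *counter.lock().unwrap() += 1;
            }
            // Signal completion; ignore the error if the receiver gave up.
            let _ = done_tx.send(());
        });
    }
    // Drop the original sender so only the worker clones remain.
    drop(done_tx);

    // Wait for one completion signal per worker instead of joining,
    // so a hung worker shows up as a timeout rather than blocking forever.
    for _ in 0..workers {
        if done_rx.recv_timeout(timeout).is_err() {
            return None;
        }
    }

    let total = *counter.lock().unwrap();
    Some(total)
}

fn main() {
    // Assumption: available_parallelism reflects the vCPU count of the instance.
    let cpus = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    match run_with_watchdog(cpus.max(2), 100_000, Duration::from_secs(10)) {
        Some(total) => println!("ok: {} increments, {} cpus reported", total, cpus),
        None => println!("timeout: workers did not finish, possible hang"),
    }
}
```

If this passes with cpu count > 1 while the real application still hangs, that would point toward the tokio/sled layer rather than basic mutex and channel handling, though it would not rule out a kernel issue in other code paths.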