Ability to shutdown neard cleanly #3266
Comments
We need to use this new feature: rust-rocksdb/rust-rocksdb#459
In #4229 we implemented a ctrl+c handler for the test infrastructure. Although it clearly did not shut down RocksDB gracefully, it might still be useful to take some inspiration from there.
Stop the system once a SIGINT is received. This should allow for graceful termination since System::stop will stop all the arbiters and that in turn will stop all the actors (leading them through stopping and stopped states thus allowing all the necessary cleanups). Fixes: near#3266
Per RocksDB FAQ:

> Q: Is it safe to close RocksDB while another thread is issuing read, write or manual compaction requests?
> A: No. The users of RocksDB need to make sure all functions have finished before they close RocksDB. You can speed up the waiting by calling CancelAllBackgroundWork().

Better be safe than sorry, so add the call before the rocksdb::DB object is dropped.

Issue: near#3266
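For illustration, here is a minimal sketch of the "cancel before drop" ordering described above. It deliberately avoids the real rocksdb crate: `BackgroundWork`, `Store` and `FakeDb` are hypothetical names, and the method name simply mirrors the FAQ's CancelAllBackgroundWork(). Whether the real binding has this exact signature is an assumption.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical abstraction over a database handle that can cancel its
/// background jobs (a stand-in for the real RocksDB binding).
trait BackgroundWork {
    fn cancel_all_background_work(&self, wait: bool);
}

/// Wrapper that owns the database handle.
struct Store<D: BackgroundWork> {
    db: D,
}

impl<D: BackgroundWork> Drop for Store<D> {
    fn drop(&mut self) {
        // `Store::drop` runs before the `db` field itself is dropped, so
        // background work is cancelled before the handle is closed.
        self.db.cancel_all_background_work(true);
    }
}

/// Fake handle that records the order of events instead of doing real work.
struct FakeDb {
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl BackgroundWork for FakeDb {
    fn cancel_all_background_work(&self, _wait: bool) {
        self.log.borrow_mut().push("cancel");
    }
}

impl Drop for FakeDb {
    fn drop(&mut self) {
        self.log.borrow_mut().push("close");
    }
}

/// Creates and drops a store, returning the recorded event order.
fn shutdown_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _store = Store { db: FakeDb { log: Rc::clone(&log) } };
    } // `_store` goes out of scope here.
    let events = log.borrow().clone();
    events
}

fn main() {
    println!("{:?}", shutdown_order());
}
```

Putting the call in `Drop` of the owning wrapper means every exit path that drops the store gets the same ordering. It does not help if the process is killed before destructors run.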
The issue here is that wasmer 0.17.1 catches SIGINT, which interferes with our attempt to catch it as well. This was fixed upstream, but our fork still does that. All in all, this is now blocked on near/wasmer#38, which fixes this in our fork as well.
cc @matklad
This is the same issue I encountered on #4229. I think, and I may be wrong, that the issue is more likely caused by long blocking tasks on the active thread, so that it only occasionally catches the signal. The workaround was to take a more direct approach: isolate the listener on a dedicated thread and alert the tasks that depend on it once the signal is caught. https://github.com/near/nearcore/pull/4229/files#diff-c4d3d011b8925d7128b2a5779e866237fa4627a9655c60ff3ce01d7f37d8bdae
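A rough sketch of the dedicated-listener idea, using only the standard library. This is not the code from #4229: the function name `run_until_signalled` is hypothetical, and an `mpsc` channel stands in for the real signal source, since the standard library cannot catch SIGINT by itself.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Runs a worker until something arrives on `signal_source`, then returns
/// the number of work iterations completed before the clean stop.
fn run_until_signalled(signal_source: mpsc::Receiver<()>) -> u64 {
    let shutdown = Arc::new(AtomicBool::new(false));

    // Dedicated listener thread: it does nothing but wait for the signal,
    // so long blocking work elsewhere cannot delay noticing it.
    let listener = {
        let shutdown = Arc::clone(&shutdown);
        thread::spawn(move || {
            // Returns on a message or when the sender is dropped.
            let _ = signal_source.recv();
            shutdown.store(true, Ordering::SeqCst);
        })
    };

    // Worker thread: checks the flag between units of blocking work.
    let worker = {
        let shutdown = Arc::clone(&shutdown);
        thread::spawn(move || {
            let mut iterations = 0u64;
            while !shutdown.load(Ordering::SeqCst) {
                thread::sleep(Duration::from_millis(5)); // stand-in for real work
                iterations += 1;
            }
            iterations
        })
    };

    listener.join().unwrap();
    worker.join().unwrap()
}

fn main() {
    let (tx, rx) = mpsc::channel();
    // Simulate a ^C arriving after 50 ms.
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(50));
        tx.send(()).unwrap();
    });
    let iterations = run_until_signalled(rx);
    println!("worker stopped cleanly after {iterations} iterations");
}
```

The worker only notices the flag between iterations, so shutdown latency is bounded by the longest single unit of work.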
Stop the system once a SIGINT is received. This should allow for graceful termination since System::stop will stop all the arbiters and that in turn will stop all the actors (leading them through stopping and stopped states thus allowing all the necessary cleanups). To achieve this, also update the wasmer-runtime-core dependency to 0.17.4. Among other things, the new version no longer catches the INT signal, making it available for tokio to handle. Issue: near#3266
The way `near_actix_test_utils::run_actix_until` was written, the expect_panic flag didn’t actually matter:

    SET_PANIC_HOOK.call_once(|| {
        let default_hook = std::panic::take_hook();
        std::panic::set_hook(Box::new(move |info| {
            if !expect_panic {
                default_hook(info);
            }
            // ...
        }));
    });

Since `SET_PANIC_HOOK.call_once` invokes the closure only once, the value of expect_panic when that call happens is the only one that matters. In other words, the first run of the `run_actix_until` function decides what the value of `expect_panic` in the panic handler is.

Fortunately this didn’t actually matter. The only test which set the flag to true – `chunks_recovered_from_full_timeout_too_short` – was marked `#[should_panic]` and running the default panic hook didn’t negatively influence the test.

As such, get rid of `run_actix_until_panic` and rename `run_actix_until_stop` to simply be `run_actix`.

Issue: near#3266
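The `call_once` behaviour described above can be reproduced in isolation. The sketch below is not the nearcore code: `INIT`, `CAPTURED` and `hook_flag` are hypothetical names, and an atomic flag stands in for the value captured by the panic hook.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Once;

static INIT: Once = Once::new();
static CAPTURED: AtomicBool = AtomicBool::new(false);

/// Mimics the shape of the old helper: the flag is consumed inside
/// `call_once`, so only the first caller's value is ever recorded.
fn hook_flag(expect_panic: bool) -> bool {
    INIT.call_once(|| {
        // Executed once per process; later calls skip this closure entirely.
        CAPTURED.store(expect_panic, Ordering::SeqCst);
    });
    CAPTURED.load(Ordering::SeqCst)
}

fn main() {
    let first = hook_flag(true);
    // The second call passes `false`, but the stored value does not change.
    let second = hook_flag(false);
    println!("first = {first}, second = {second}");
}
```

Both calls report the value passed by the first one, which is why a per-call `expect_panic` argument could not have had the intended effect.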
The default signal the ‘kill’ command sends is SIGTERM, so catch it in addition to SIGINT when running on a Unix-like system. Issue: near#3266
The process will now gracefully handle SIGINT (i.e. ^C) and SIGTERM (i.e. default signal used by
…4429)

Per RocksDB FAQ:

> Q: Is it safe to close RocksDB while another thread is issuing read, write or manual compaction requests?
> A: No. The users of RocksDB need to make sure all functions have finished before they close RocksDB. You can speed up the waiting by calling CancelAllBackgroundWork().

Better be safe than sorry, so add the call before the rocksdb::DB object is dropped.

Fixes: #3266
@mina86 looks like this is still not fixed. Today I shut down a node that is running 3216f6e, tried to load it from state-viewer, and got
This issue has been automatically marked as stale because it has not had recent activity in the last 2 months.
I’m going to close this in favour of #5340
Neard should shut down cleanly. More specifically,