Interrupt handler does not work when a blocking task is running #1351

Closed
Tracked by #3322 ...
hdevalence opened this issue Nov 22, 2020 · 2 comments
Labels
A-rust (Area: Updates to Rust code), C-bug (Category: This is a bug), I-hang (A Zebra component stops responding to requests), I-slow (Problems with performance or responsiveness), NU-5 (Network Upgrade: NU5 specific tasks)

Comments

hdevalence commented Nov 22, 2020

Scheduling

This usability issue is acceptable for the first stable release, but we should review for lightwalletd.

Version

main

Description

The interrupt handler does not work when the application is busy, which is exactly when it is most likely to be needed. The interrupt handler shares priority with the main application future, and futures are scheduled cooperatively, so a future that blocks without yielding prevents the shutdown future from being polled:

tokio::select! {
    result = fut => result,
    _ = shutdown() => Ok(()),
}

Tasks

  1. Wait on signals in a separate thread:
  • the main application future should be spawned using spawn, and
  • the interrupt handler future should also be spawned using spawn.
  2. Then, the select statement should wait on the JoinHandles of the spawned tasks.

We use spawn for the interrupt future so it can be scheduled on any thread. This ensures that the shutdown task will run, even if there are long-running or blocking tasks in other futures.
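The idea behind these tasks can be sketched with the standard library alone. This is not Zebra's code and not tokio code: it is a std-only analogue that uses OS threads and a channel in place of tokio tasks and JoinHandles. The names `run_until_shutdown` and `Outcome` are hypothetical, and the sleep in the shutdown closure is only a stand-in for waiting on a real signal.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Which side reported first (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Outcome {
    AppFinished,
    Shutdown,
}

/// Runs `app` and `wait_for_shutdown` on separate threads and returns
/// whichever one reports first. A blocking `app` cannot delay the
/// shutdown waiter, because each closure has its own thread.
fn run_until_shutdown<A, S>(app: A, wait_for_shutdown: S) -> Outcome
where
    A: FnOnce() + Send + 'static,
    S: FnOnce() + Send + 'static,
{
    let (tx, rx) = mpsc::channel();

    // Main application work, analogous to spawning the application future.
    let app_tx = tx.clone();
    thread::spawn(move || {
        app();
        // The receiver may already be gone if shutdown won; ignore that.
        let _ = app_tx.send(Outcome::AppFinished);
    });

    // Shutdown waiter, analogous to spawning the interrupt handler future.
    thread::spawn(move || {
        wait_for_shutdown();
        let _ = tx.send(Outcome::Shutdown);
    });

    // Analogous to selecting over both handles: take the first result.
    rx.recv().expect("both threads panicked before reporting")
}

fn main() {
    let outcome = run_until_shutdown(
        // A blocking application that never yields while it works.
        || thread::sleep(Duration::from_millis(500)),
        // Stand-in for waiting on SIGINT.
        || thread::sleep(Duration::from_millis(20)),
    );
    println!("{:?}", outcome);
}
```

In this sketch the application thread is left detached when shutdown wins, and it ends when the process exits. A real implementation would also need to decide how to cancel or drain the application work.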

Alternatives

Get signals as a stream: https://docs.rs/signal-hook-tokio/0.3.0/signal_hook_tokio/

Related Issues

Design and implement graceful shutdown for Zebra #1678

@hdevalence hdevalence added C-bug Category: This is a bug S-needs-triage Status: A bug report needs triage labels Nov 22, 2020
@mpguerra mpguerra added this to the v1.0.0-alpha.1 milestone Dec 9, 2020
@mpguerra mpguerra removed this from the v1.0.0-alpha.1 milestone Jan 11, 2021
@teor2345 teor2345 changed the title Interrupt handler does not work Interrupt handler does not work when a blocking task is running Jan 24, 2021
@teor2345 teor2345 added A-rust Area: Updates to Rust code P-Medium and removed S-needs-triage Status: A bug report needs triage labels Jan 29, 2021
teor2345 commented Jan 29, 2021

I've observed a complete hang on Ctrl-C (SIGINT) on Linux, when the sync service was waiting to restart the sync:

Jan 29 13:10:41.867  INFO {zebrad="df399108" net="Test"}:sync: zebrad::components::sync: waiting to restart sync timeout=61s
Jan 29 13:10:43.787  INFO {zebrad="df399108" net="Test"}:peer{addr=127.0.0.1:38233}:msg_as_req{msg=getblocks}:state: zebra_state::service: responding to peer GetBlocks or GetHeaders final_height=Height(1263147) response_len=0 chain_tip_height=Height(1263147) stop_height=None intersection_height=Some(Height(1263147))
Jan 29 13:11:42.868  INFO {zebrad="df399108" net="Test"}:sync: zebrad::components::sync: starting sync, obtaining new tips
Jan 29 13:11:42.868  INFO {zebrad="df399108" net="Test"}:sync:obtain_tips:state: zebra_state::util: created block locator tip_height=Height(1263147) min_locator_height=1263048 locators=[Height(1263147), Height(1263146), Height(1263145), Height(1263143), Height(1263139), Height(1263131), Height(1263115), Height(1263083), Height(1263048)]
Jan 29 13:11:42.868  INFO {zebrad="df399108" net="Test"}:sync:obtain_tips: zebra_network::peer_set::set: network request with no ready peers: finding more peers, waiting for 1 peers to answer requests
Jan 29 13:11:42.869  INFO {zebrad="df399108" net="Test"}:sync: zebrad::components::sync: exhausted prospective tip set
Jan 29 13:11:42.869  INFO {zebrad="df399108" net="Test"}:sync: zebrad::components::sync: waiting to restart sync timeout=61s
Jan 29 13:11:44.788  INFO {zebrad="df399108" net="Test"}:peer{addr=127.0.0.1:38233}:msg_as_req{msg=getblocks}:state: zebra_state::service: responding to peer GetBlocks or GetHeaders final_height=Height(1263147) response_len=0 chain_tip_height=Height(1263147) stop_height=None intersection_height=Some(Height(1263147))
^C^C^C

zebrad did not respond within 5 minutes, even though the sync timer was 61 seconds. Perhaps it's blocked on a network task (#1633 or similar).

Backgrounding the task using Ctrl-Z and killing it using kill %1 made it terminate.

Marking this usability issue as medium, because we've seen it in practice, but it seems to be rare.

teor2345 commented Jun 2, 2022

We've improved this a lot, and some of the blocking is meant to happen (like database cleanup or ephemeral database deletion).

@teor2345 teor2345 closed this as not planned Jun 2, 2022