-
Notifications
You must be signed in to change notification settings - Fork 204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refact: nydusd's state machine #439
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -245,6 +245,23 @@ impl EndpointHandler for MountHandler { | |
} | ||
} | ||
|
||
pub struct StartHandler {} | ||
impl EndpointHandler for StartHandler { | ||
fn handle_request( | ||
&self, | ||
req: &Request, | ||
kicker: &dyn Fn(ApiRequest) -> ApiResponse, | ||
) -> HttpResult { | ||
match (req.method(), req.body.as_ref()) { | ||
(Method::Put, None) => { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
It's expected that daemon in READY is trigger to RUNNING. Just like takeover endpoint with PUT method, it's required for daemon in certain status. |
||
let r = kicker(ApiRequest::Start); | ||
Ok(convert_to_response(r, HttpError::Upgrade)) | ||
} | ||
_ => Err(HttpError::BadRequest), | ||
} | ||
} | ||
} | ||
|
||
pub struct MetricsInflightHandler {} | ||
impl EndpointHandler for MetricsInflightHandler { | ||
fn handle_request( | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -35,10 +35,9 @@ use crate::upgrade::UpgradeMgrError; | |
pub enum DaemonState { | ||
INIT = 1, | ||
RUNNING = 2, | ||
UPGRADING = 3, | ||
INTERRUPTED = 4, | ||
STOPPED = 5, | ||
UNKNOWN = 6, | ||
READY = 3, | ||
STOPPED = 4, | ||
UNKNOWN = 5, | ||
} | ||
|
||
impl Display for DaemonState { | ||
|
@@ -52,9 +51,8 @@ impl From<i32> for DaemonState { | |
match i { | ||
1 => DaemonState::INIT, | ||
2 => DaemonState::RUNNING, | ||
3 => DaemonState::UPGRADING, | ||
4 => DaemonState::INTERRUPTED, | ||
5 => DaemonState::STOPPED, | ||
3 => DaemonState::READY, | ||
4 => DaemonState::STOPPED, | ||
_ => DaemonState::UNKNOWN, | ||
} | ||
} | ||
|
@@ -196,23 +194,49 @@ pub trait NydusDaemon: DaemonStateMachineSubscriber + Send + Sync { | |
fn interrupt(&self) {} | ||
fn stop(&self) -> DaemonResult<()> { | ||
let s = self.get_state(); | ||
if s != DaemonState::INTERRUPTED && s != DaemonState::STOPPED { | ||
return self.on_event(DaemonStateMachineInput::Stop); | ||
|
||
if s == DaemonState::STOPPED { | ||
return Ok(()); | ||
} | ||
Ok(()) | ||
|
||
if s == DaemonState::RUNNING { | ||
self.on_event(DaemonStateMachineInput::Stop)?; | ||
} | ||
|
||
self.on_event(DaemonStateMachineInput::Stop) | ||
} | ||
fn wait(&self) -> DaemonResult<()>; | ||
fn wait_service(&self) -> DaemonResult<()> { | ||
Ok(()) | ||
} | ||
fn wait_state_machine(&self) -> DaemonResult<()> { | ||
Ok(()) | ||
} | ||
fn trigger_exit(&self) -> DaemonResult<()> { | ||
let s = self.get_state(); | ||
|
||
if s == DaemonState::STOPPED { | ||
return Ok(()); | ||
} | ||
|
||
if s == DaemonState::INIT { | ||
return self.on_event(DaemonStateMachineInput::Stop); | ||
} | ||
|
||
if s == DaemonState::RUNNING { | ||
self.on_event(DaemonStateMachineInput::Stop)?; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why not merge the two events handling if blocks. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The |
||
} | ||
|
||
self.on_event(DaemonStateMachineInput::Exit) | ||
} | ||
|
||
fn supervisor(&self) -> Option<String>; | ||
fn save(&self) -> DaemonResult<()>; | ||
fn restore(&self) -> DaemonResult<()>; | ||
fn trigger_takeover(&self) -> DaemonResult<()> { | ||
self.on_event(DaemonStateMachineInput::Takeover)?; | ||
self.on_event(DaemonStateMachineInput::Successful)?; | ||
Ok(()) | ||
self.on_event(DaemonStateMachineInput::Takeover) | ||
} | ||
fn trigger_start(&self) -> DaemonResult<()> { | ||
self.on_event(DaemonStateMachineInput::Start) | ||
} | ||
|
||
// For backward compatibility. | ||
|
@@ -225,35 +249,28 @@ pub trait NydusDaemon: DaemonStateMachineSubscriber + Send + Sync { | |
// - `Init` means nydusd is just started and potentially configured well but not | ||
// yet negotiate with kernel the capabilities of both sides. It even does not try | ||
// to set up fuse session by mounting `/fuse/dev`(in case of `fusedev` backend). | ||
// - `Ready` means nydusd is ready for start or die. Fuse session is created. | ||
// - `Running` means nydusd has successfully prepared all the stuff needed to work as a | ||
// user-space fuse filesystem, however, the essential capabilities negotiation might not be | ||
// done yet. It relies on `fuse-rs` to tell if capability negotiation is done. | ||
// - `Upgrading` state means the nydus daemon is being live-upgraded. There's no need | ||
// to do kernel mount again to set up a session but try to reuse a fuse fd from somewhere else. | ||
// In this state, we try to push `Successful` event to state machine to trigger state transition. | ||
// - `Interrupted` state means nydusd has shutdown fuse server, which means no more message will | ||
// be read from kernel and handled and no pending and in-flight fuse message exists. But the | ||
// nydusd daemon should be alive and wait for coming events. | ||
// - `Die` state means the whole nydusd process is going to die. | ||
state_machine! { | ||
derive(Debug, Clone) | ||
pub DaemonStateMachine(Init) | ||
|
||
// FIXME: It's possible that failover does not succeed or resource is not capable to | ||
// be passed. To handle event `Stop` when being `Init`. | ||
Init => { | ||
Start => Running [StartService], | ||
Takeover => Upgrading [Restore], | ||
Exit => Die[StopStateMachine], | ||
Mount => Ready, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Seems we no longer need There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
In fact, we shall do |
||
Takeover => Ready[Restore], | ||
Stop => Die[StopStateMachine], | ||
}, | ||
Ready => { | ||
Start => Running[StartService], | ||
Stop => Die[Umount], | ||
Exit => Die[StopStateMachine], | ||
}, | ||
Running => { | ||
Exit => Interrupted [TerminateService], | ||
Stop => Die[Umount], | ||
Stop => Ready [TerminateService], | ||
}, | ||
Upgrading(Successful) => Running [StartService], | ||
// Quit from daemon but not disconnect from fuse front-end. | ||
Interrupted(Stop) => Die[StopStateMachine], | ||
} | ||
|
||
/// Implementation of the state machine defined by `DaemonStateMachine`. | ||
|
@@ -322,20 +339,30 @@ impl DaemonStateMachineContext { | |
}), | ||
TerminateService => { | ||
d.interrupt(); | ||
d.set_state(DaemonState::INTERRUPTED); | ||
Ok(()) | ||
let res = d.wait_service(); | ||
if res.is_ok() { | ||
d.set_state(DaemonState::READY); | ||
} | ||
|
||
res | ||
} | ||
Umount => d.disconnect().map(|r| { | ||
// Always interrupt fuse service loop after shutdown connection to kernel. | ||
// In case that kernel does not really shutdown the session due to some reasons | ||
// causing service loop keep waiting of `/dev/fuse`. | ||
d.interrupt(); | ||
d.wait_service() | ||
.unwrap_or_else(|e| error!("failed to wait service {}", e)); | ||
// at least all fuse thread stopped, no matter what error each thread got | ||
d.set_state(DaemonState::STOPPED); | ||
r | ||
}), | ||
Restore => { | ||
d.set_state(DaemonState::UPGRADING); | ||
d.restore() | ||
let res = d.restore(); | ||
if res.is_ok() { | ||
d.set_state(DaemonState::READY); | ||
} | ||
res | ||
} | ||
StopStateMachine => { | ||
d.set_state(DaemonState::STOPPED); | ||
|
@@ -348,9 +375,7 @@ impl DaemonStateMachineContext { | |
// Safe to unwrap because channel is never closed | ||
self.result_sender.send(r).unwrap(); | ||
// Quit state machine thread if interrupted or stopped | ||
if d.get_state() == DaemonState::INTERRUPTED | ||
|| d.get_state() == DaemonState::STOPPED | ||
{ | ||
if d.get_state() == DaemonState::STOPPED { | ||
break; | ||
} | ||
} | ||
|
@@ -380,10 +405,19 @@ mod tests { | |
let stat = DaemonState::from(1); | ||
assert_eq!(stat, DaemonState::INIT); | ||
|
||
let stat = DaemonState::from(6); | ||
let stat = DaemonState::from(2); | ||
assert_eq!(stat, DaemonState::RUNNING); | ||
|
||
let stat = DaemonState::from(3); | ||
assert_eq!(stat, DaemonState::READY); | ||
|
||
let stat = DaemonState::from(4); | ||
assert_eq!(stat, DaemonState::STOPPED); | ||
|
||
let stat = DaemonState::from(5); | ||
assert_eq!(stat, DaemonState::UNKNOWN); | ||
|
||
let stat = DaemonState::from(7); | ||
let stat = DaemonState::from(8); | ||
assert_eq!(stat, DaemonState::UNKNOWN); | ||
} | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -246,7 +246,8 @@ pub struct FusedevDaemon { | |
state: AtomicI32, | ||
supervisor: Option<String>, | ||
threads_cnt: u32, | ||
threads: Mutex<Vec<JoinHandle<Result<()>>>>, | ||
state_machine_thread: Mutex<Option<JoinHandle<Result<()>>>>, | ||
fuse_service_threads: Mutex<Vec<JoinHandle<Result<()>>>>, | ||
} | ||
|
||
impl FusedevDaemon { | ||
|
@@ -256,14 +257,19 @@ impl FusedevDaemon { | |
let thread = thread::Builder::new() | ||
.name("fuse_server".to_string()) | ||
.spawn(move || { | ||
let _ = s.svc_loop(&inflight_op); | ||
if let Err(err) = s.svc_loop(&inflight_op) { | ||
warn!("fuse server exits with err: {:?}, exiting daemon", err); | ||
if let Err(err) = waker.wake() { | ||
error!("fail to exit daemon, error: {:?}", err); | ||
} | ||
} | ||
// Notify the daemon controller that one working thread has exited. | ||
let _ = waker.wake(); | ||
|
||
Ok(()) | ||
}) | ||
.map_err(DaemonError::ThreadSpawn)?; | ||
|
||
self.threads.lock().unwrap().push(thread); | ||
self.fuse_service_threads.lock().unwrap().push(thread); | ||
|
||
Ok(()) | ||
} | ||
|
@@ -338,8 +344,32 @@ impl NydusDaemon for FusedevDaemon { | |
} | ||
|
||
fn wait(&self) -> DaemonResult<()> { | ||
self.wait_state_machine()?; | ||
self.wait_service() | ||
} | ||
|
||
fn wait_state_machine(&self) -> DaemonResult<()> { | ||
let mut guard = self.state_machine_thread.lock().unwrap(); | ||
if guard.is_some() { | ||
guard | ||
.take() | ||
.unwrap() | ||
.join() | ||
.map_err(|e| { | ||
DaemonError::WaitDaemon( | ||
*e.downcast::<Error>() | ||
.unwrap_or_else(|e| Box::new(eother!(e))), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Um. Hard error conversion path to |
||
) | ||
})? | ||
.map_err(DaemonError::WaitDaemon)?; | ||
} | ||
|
||
Ok(()) | ||
} | ||
|
||
fn wait_service(&self) -> DaemonResult<()> { | ||
loop { | ||
let handle = self.threads.lock().unwrap().pop(); | ||
let handle = self.fuse_service_threads.lock().unwrap().pop(); | ||
if let Some(handle) = handle { | ||
handle | ||
.join() | ||
|
@@ -508,11 +538,12 @@ pub fn create_fuse_daemon( | |
result_receiver: Mutex::new(result_receiver), | ||
request_sender: Arc::new(Mutex::new(trigger)), | ||
service: Arc::new(service), | ||
threads: Mutex::new(Vec::new()), | ||
state_machine_thread: Mutex::new(None), | ||
fuse_service_threads: Mutex::new(Vec::new()), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why is reaping fuse threads important? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yep, seeking for waiting state machine and fuse service thread termination separately. Because daemon status is only updated after successful action. For example, |
||
}); | ||
let machine = DaemonStateMachineContext::new(daemon.clone(), events_rx, result_sender); | ||
let machine_thread = machine.kick_state_machine()?; | ||
daemon.threads.lock().unwrap().push(machine_thread); | ||
*daemon.state_machine_thread.lock().unwrap() = Some(machine_thread); | ||
|
||
// Without api socket, nydusd can't do neither live-upgrade nor failover, so the helper | ||
// finding a victim is not necessary. | ||
|
@@ -523,6 +554,9 @@ pub fn create_fuse_daemon( | |
daemon.service.mount(cmd)?; | ||
} | ||
daemon.service.session.lock().unwrap().mount()?; | ||
daemon | ||
.on_event(DaemonStateMachineInput::Mount) | ||
.map_err(|e| eother!(e))?; | ||
daemon | ||
.on_event(DaemonStateMachineInput::Start) | ||
.map_err(|e| eother!(e))?; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we relocate the API path to
/daemon/fuse/start
since the handler of endpoint/daemon/start
just starts fuse server's service loop. For nydusd itself, it is already started, otherwise how it can handle API requests?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
start
endpoint works with statusReady
andRUNNING
, which are designed for only fuse but also for other implements. Mayberun
is more comfortable?