Testnet 21 not exiting on SIGINT - API engine suspected #3759
Comments
Hi @Ben-PH, we are lacking examples/documentation for this. The issue is that the Massa node and the Public API never stop, even if I call stop():

2023-04-03T22:41:45.388469Z INFO massa_node: Node version : SAND.20.0
2023-04-03T22:41:45.400245Z INFO massa_network_worker: The node_id of this node is: N1DoT6WwDHSKPijrNYVrMDN2LR5xgGntoEPgJkfuM9ofAdwCJdH
2023-04-03T22:41:45.449219Z INFO massa_consensus_worker::worker::init: Started node at time 2023-04-03T22:41:45Z, cycle 0, period 0, thread 0
2023-04-03T22:41:45.449250Z INFO massa_consensus_worker::worker::init: 0 days, 0 hours, 0 minutes, 9 seconds remaining to genesis
2023-04-03T22:41:45.450848Z WARN massa_bootstrap::server::white_black_list: error on load whitelist/blacklist file : base_config/bootstrap_blacklist.json | No such file or directory (os error 2)
2023-04-03T22:41:45.452210Z INFO massa_node: API | EXPERIMENTAL JsonRPC | listening on: 0.0.0.0:33036
2023-04-03T22:41:45.453636Z INFO massa_node: API | gRPC | listening on: 0.0.0.0:33037
2023-04-03T22:41:45.453723Z INFO massa_node: API | PRIVATE JsonRPC | listening on: 127.0.0.1:33034
2023-04-03T22:41:45.453809Z INFO massa_node: API | PUBLIC JsonRPC | listening on: 0.0.0.0:33035
2023-04-03T22:41:45.453814Z WARN massa_node: deadlocks detector will run every 10s
2023-04-03T22:41:54.171413Z INFO massa_node: stop command received from private API
2023-04-03T22:41:54.171742Z INFO massa_bootstrap::server: ip white/blacklist updater exited cleanly
2023-04-03T22:41:54.173736Z INFO massa_node: Start stopping API's
2023-04-03T22:41:54.174004Z INFO massa_grpc::server: gRPC API finished cleanly
2023-04-03T22:41:54.174039Z INFO massa_api: API finished cleanly
2023-04-03T22:41:54.174844Z INFO massa_node: API | PRIVATE JsonRPC | stopped
2023-04-03T22:41:54.174878Z INFO massa_api: API finished cleanly
2023-04-03T22:41:55.403466Z INFO massa_consensus_worker::worker::main_loop: Massa network has started ! 🎉
2023-04-03T22:41:55.403595Z INFO massa_consensus_worker::worker::main_loop: Started cycle 0

It's the same generic code for the PUBLIC, PRIVATE and EXPERIMENTAL APIs, but only the PRIVATE one is able to shut down. I changed the code, and the new gRPC API works as expected too. cc @modship, if you have an idea about this nightmare, feel free to help us 👍🏻
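For reference, a minimal sketch of the shared serve/stop sequence (hypothetical code assuming jsonrpsee 0.16's ServerBuilder/ServerHandle API and tokio; the address, port and structure are illustrative, not Massa's actual implementation):

use std::net::SocketAddr;
use jsonrpsee::server::{RpcModule, ServerBuilder};

#[tokio::main]
async fn main() {
    // Hypothetical bind address, for illustration only.
    let addr: SocketAddr = "127.0.0.1:33099".parse().unwrap();
    let module = RpcModule::new(());

    let server = ServerBuilder::default()
        .build(addr)
        .await
        .expect("failed to bind API");
    let handle = server.start(module).expect("failed to start API");

    // ... later, on shutdown (same sequence for PUBLIC, PRIVATE and EXPERIMENTAL):
    if handle.stop().is_ok() {
        // stop() only signals shutdown; stopped() waits for the server task to end.
        handle.stopped().await; // reportedly this is where PUBLIC/EXPERIMENTAL hang
    }
    println!("API stopped");
}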
I also have a problem when stopping my node; let's merge all the last PRs and fix the node stop afterwards.
I have a strange behaviour.
We did a deep dive into the jsonrpsee server code:

/// Start responding to connections requests.
///
/// This will run on the tokio runtime until the server is stopped or the `ServerHandle` is dropped.
pub fn start(mut self, methods: impl Into<Methods>) -> Result<ServerHandle, Error> {
let methods = methods.into().initialize_resources(&self.resources)?;
let (stop_tx, stop_rx) = watch::channel(());
let stop_handle = StopHandle::new(stop_rx);
match self.cfg.tokio_runtime.take() {
Some(rt) => rt.spawn(self.start_inner(methods, stop_handle)),
None => tokio::spawn(self.start_inner(methods, stop_handle)),
};
Ok(ServerHandle::new(stop_tx))
}
async fn start_inner(self, methods: Methods, stop_handle: StopHandle) {
let max_request_body_size = self.cfg.max_request_body_size;
let max_response_body_size = self.cfg.max_response_body_size;
let max_log_length = self.cfg.max_log_length;
let allow_hosts = self.cfg.allow_hosts;
let resources = self.resources;
let logger = self.logger;
let batch_requests_supported = self.cfg.batch_requests_supported;
let id_provider = self.id_provider;
let max_subscriptions_per_connection = self.cfg.max_subscriptions_per_connection;
let mut id: u32 = 0;
let connection_guard = ConnectionGuard::new(self.cfg.max_connections as usize);
let mut connections = FutureDriver::default();
let mut incoming = Monitored::new(Incoming(self.listener), &stop_handle);
loop {
match connections.select_with(&mut incoming).await {
Ok((socket, remote_addr)) => {
let data = ProcessConnection {
remote_addr,
methods: methods.clone(),
allow_hosts: allow_hosts.clone(),
resources: resources.clone(),
max_request_body_size,
max_response_body_size,
max_log_length,
batch_requests_supported,
id_provider: id_provider.clone(),
ping_interval: self.cfg.ping_interval,
stop_handle: stop_handle.clone(),
max_subscriptions_per_connection,
conn_id: id,
logger: logger.clone(),
max_connections: self.cfg.max_connections,
enable_http: self.cfg.enable_http,
enable_ws: self.cfg.enable_ws,
};
process_connection(&self.service_builder, &connection_guard, data, socket, &mut connections);
id = id.wrapping_add(1);
}
Err(MonitoredError::Selector(err)) => {
tracing::error!("Error while awaiting a new connection: {:?}", err);
}
Err(MonitoredError::Shutdown) => break,
}
}
connections.await;
}
}

GG to @modship for his efforts on debugging this.
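One reading of start_inner above: the accept loop breaks on MonitoredError::Shutdown, but the final connections.await only resolves once every in-flight connection future has finished, so the server task (and therefore stopped()) can outlive the stop request if a connection never terminates. A minimal sketch of that pattern (hypothetical code using plain tokio, not Massa's or jsonrpsee's actual implementation):

use std::time::Duration;
use tokio::{sync::watch, task::JoinSet, time::sleep};

#[tokio::main]
async fn main() {
    let (stop_tx, stop_rx) = watch::channel(());

    let server = tokio::spawn(async move {
        let mut connections: JoinSet<()> = JoinSet::new();
        let mut stop = stop_rx.clone();

        // Simulate one long-lived connection that also observes the stop signal.
        let mut conn_stop = stop_rx.clone();
        connections.spawn(async move {
            tokio::select! {
                _ = sleep(Duration::from_secs(3600)) => {} // effectively "forever"
                _ = conn_stop.changed() => {}              // exits promptly on stop
            }
        });

        // Accept loop: breaks when the stop signal fires (stand-in for MonitoredError::Shutdown).
        loop {
            tokio::select! {
                _ = sleep(Duration::from_millis(100)) => { /* pretend to accept sockets */ }
                _ = stop.changed() => break,
            }
        }

        // Equivalent of `connections.await`: only resolves when all connection tasks finish.
        // If a connection task ignored the stop signal, this point would never be reached.
        while connections.join_next().await.is_some() {}
        println!("server stopped cleanly");
    });

    sleep(Duration::from_millis(300)).await;
    let _ = stop_tx.send(()); // like ServerHandle::stop()
    server.await.unwrap();    // like ServerHandle::stopped().await
}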
Nice finding. So do we add a stop command in the consensus as a workaround for now, until we provide a fix to jsonrpsee?
Is this happening in the bootstrap server?
It's the jsonrpsee lib here, so maybe "identical" but not correlated.
This works with the same versions as Massa (tokio = 1.23, jsonrpsee = 0.16.2):

// Assumed imports for this snippet (jsonrpsee 0.16-style paths).
use std::{net::SocketAddr, time::Duration};
use jsonrpsee::server::{RpcModule, ServerBuilder};

fn main() {
tokio::runtime::Builder::new_current_thread()
.enable_all()
.build()
.unwrap()
.block_on(async {
let server = ServerBuilder::default()
.http_only()
.ping_interval(Duration::from_secs(1))
.build("0.0.0.0:8080".parse::<SocketAddr>().unwrap())
.await
.unwrap();
let mut module = RpcModule::new(());
module
.register_method("say_hello", |_, _| Ok("lo"))
.unwrap();
println!("server start");
let handle = server.start(module).unwrap();
assert!(!handle.is_stopped());
let res = handle.stop();
dbg!(res);
handle.stopped().await;
println!("server stop");
});
}

In Massa, this works:

let api_public_handle = api_public
.serve(&SETTINGS.api.bind_public, &api_config)
.await
.expect("failed to start PUBLIC API");
info!(
"API | PUBLIC JsonRPC | listening on: {}",
api_config.bind_public
);
api_public_handle.stop().await;
dbg!("stopped"); with stop() = pub async fn stop(self) {
match self.server_handler.stop() {
Ok(_) => {
info!("API finished cleanly");
}
Err(err) => warn!("API thread panicked: {:?}", err),
}
self.server_handler.stopped().await;
}
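The sequence above shows the intended flow: stop() only signals shutdown on the server's internal channel, and stopped() waits for the server task to actually finish. As a debugging aid, one could bound that wait with a timeout so a hung API cannot block node shutdown forever (a hypothetical helper assuming jsonrpsee 0.16's ServerHandle, tokio and tracing; stop_with_timeout is not part of Massa's code):

use std::time::Duration;
use jsonrpsee::server::ServerHandle;

// Hypothetical helper: ask the server to stop, but give up waiting after `wait`.
async fn stop_with_timeout(handle: ServerHandle, wait: Duration) {
    if let Err(err) = handle.stop() {
        tracing::warn!("API thread panicked: {:?}", err);
        return;
    }
    // stop() only signals shutdown; stopped() waits for the server task to finish.
    match tokio::time::timeout(wait, handle.stopped()).await {
        Ok(()) => tracing::info!("API finished cleanly"),
        Err(_) => tracing::warn!("API did not stop within {:?}", wait),
    }
}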
Describe the bug
On testnet 21, SIGINT (ctrl-c) handling causes a deadlock/blockage while stopping the various processes
To Reproduce
deploy labnet on testnet 21 (haven't checked with sandbox)
run a node that bootstraps off a labnet server
once the client is up and running, provide a SIGINT (ctrl-c works well)
observe that "Stopping consensus worker..." is logged, with no follow-up
Expected behavior
The consensus worker exits correctly, followed by clean exit out of the program shortly after
Additional context
With these implementations on ConsensusControllerImpl in massa-consensus-worker/src/controller.rs, as well as changes such as these in massa-node/src/main.rs, a cleaned-up version of the logs will look like this:
Note: there are 5 clones of the consensus controller, so we would expect 6 drops (one for each clone, plus the original), BUT I count only 4 drops. It looks like the instances that are provided to the public API and Massa API are not dropped.
This is important, because the consensus manager stops after joining on the thread that's spawned to host the ConsensusWorker::run() execution, whose main loop exits by detecting that the channel it receives on has disconnected. The disconnection will only occur once all the consensus controllers that are cloned from that same base controller are dropped, as each clone also clones the sender side of that channel.
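To illustrate the mechanism described above, here is a minimal, hypothetical sketch (plain std channels, not Massa's actual controller code): the worker loop only ends when every clone of the sender has been dropped, so a single controller clone kept alive by an API is enough to block the join.

use std::sync::mpsc;
use std::thread;

#[derive(Clone)]
struct Controller {
    // Each clone of the controller also clones the sender side of the channel.
    tx: mpsc::Sender<&'static str>,
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let controller = Controller { tx };

    // Stand-in for ConsensusWorker::run(): the loop ends only when recv() errors,
    // i.e. when *all* Sender clones (all controller clones) have been dropped.
    let worker = thread::spawn(move || {
        while let Ok(cmd) = rx.recv() {
            println!("worker got command: {cmd}");
        }
        println!("all controllers dropped, worker exiting");
    });

    // Hand clones to other modules (public API, gRPC, ...). If one of these clones
    // is never dropped, recv() never fails and the join below blocks forever.
    let api_clone = controller.clone();
    api_clone.tx.send("register block").unwrap();

    drop(api_clone);
    drop(controller);

    // Stand-in for the consensus manager joining the worker thread.
    worker.join().unwrap();
}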