Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improvement suggestion to prevent returning stopping workers #3703

Closed
5 tasks
beer-1 opened this issue Nov 19, 2023 · 0 comments · Fixed by #3725
Closed
5 tasks

improvement suggestion to prevent returning stopping workers #3703

beer-1 opened this issue Nov 19, 2023 · 0 comments · Fixed by #3725

Comments

@beer-1
Copy link
Contributor

beer-1 commented Nov 19, 2023

Proposal

#3370 introduces idle worker cleanup improvement, and the main idle worker is packet cmd worker. The packer worker is supposed to be stopped after 5 clear interval like below.

if idle_worker_timer > packet_cmd_worker_idle_timeout {
warn!("packet worker has been idle for more than {packet_cmd_worker_idle_timeout} blocks, aborting");
return Ok(Next::Abort);
}

The worker cleanup is happening 30s interval in supervisor level.

pub fn spawn_cleanup_worker(workers: Arc<RwLock<WorkerMap>>) -> TaskHandle {
spawn_background_task(
error_span!("cleanup_worker"),
Some(Duration::from_secs(30)),
move || -> Result<Next, TaskError<Infallible>> {
workers.acquire_write().clean_stopped_workers();
Ok(Next::Continue)
},
)
}

Problem

When a IBC event is triggered between idle timeout <> worker cleanup, then the function get_or_spawn will return stopping worker. And then the ibc event batch will be ignored due to the stopping packet cmd worker does not receive and process the event batch anymore. In this case, the user, who triggered the ignored IBC event, has to wait until other IBC events are triggered.

let worker = workers.get_or_spawn(object, src, dst, config);

pub fn get_or_spawn<Chain: ChainHandle>(
&mut self,
object: Object,
src: Chain,
dst: Chain,
config: &Config,
) -> &WorkerHandle {
if self.workers.contains_key(&object) {
&self.workers[&object]
} else {

Acceptance Criteria

I personally recommend to run cleanup before returning workers like below.

pub fn get_or_spawn<Chain: ChainHandle>(...) {

        // Clear workers, to prevent returning stopping workers.
        self.clean_stopped_workers();

...
}

Or, you can change packet cmd worker to halt only clearing operation without shutting down whole worker process like
this.


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate milestone (priority) applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: ✅ Done
1 participant