Skip to content

Conversation

@xiaguan
Copy link
Collaborator

@xiaguan xiaguan commented Sep 15, 2025

Previous signal handler could potentially deadlock.

This PR uses a thread to process the signal, avoiding this issue.

@xiaguan
Copy link
Collaborator Author

xiaguan commented Sep 15, 2025

Problem

Our current signal handling logic directly calls cleanupAllResources() inside the signal handler.

This causes a deadlock risk:

[Thread T2 running normally]
          |
          v
 ┌─────────────────────────────┐
 │ Executing: lock(mutex_) ... │
 │             ...             │
 └─────────────────────────────┘
          ^
          |
(SIGINT arrives, kernel interrupts T2)
          |
          v
 ┌──────────────────────────────┐
 │ Run signalHandler(SIGINT)    │
 │   -> calls cleanupAllResources│
 │   -> tries to lock(mutex_)   │
 └──────────────────────────────┘

If the signal interrupts a thread that already holds the mutex, the handler will try to lock the same mutex again → self-deadlock.

@xiaguan
Copy link
Collaborator Author

xiaguan commented Sep 15, 2025

About the lock and raw ptr, let me think if there's a more elegant solution.

But the basic idea is still to let one thread specifically handle signals. I'll get back to you shortly for review.

@xiaguan xiaguan marked this pull request as draft September 15, 2025 12:40
@xiaguan xiaguan marked this pull request as ready for review September 16, 2025 03:48
@xiaguan
Copy link
Collaborator Author

xiaguan commented Sep 16, 2025

@ykwd You could review this when you're free.Tks

Copy link
Collaborator

@ykwd ykwd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After thorough offline discussions, we have concluded that the current approach is acceptable.

sigemptyset(&sa.sa_mask);
sa.sa_flags = 0;
sigaction(sig, &sa, nullptr);
raise(sig);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once it finishes cleanup, it'll re-raise the signal to terminate the whole process.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. Thx for clarifying

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a really small gap between them. But I think heartbeat may solve this.

@xiaguan xiaguan merged commit 785e939 into kvcache-ai:main Sep 16, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants