More accurate harakiri counting #33

Merged
merged 2 commits into from
Oct 26, 2012

prymitive
Contributor

I've noticed big harakiri spikes in my carbon stats and after checking logs it turns out it's just one worker pid getting many SIGKILLs:

Oct 25 08:33:55 localhost app: *** HARAKIRI ON WORKER 4 (pid: 10469) ***
Oct 25 08:33:55 localhost app: HARAKIRI: -- wchan> wait_answer_interruptible
Oct 25 08:33:57 localhost app: *** HARAKIRI ON WORKER 4 (pid: 10469) ***
Oct 25 08:33:57 localhost app: HARAKIRI: -- wchan> request_wait_answer
Oct 25 08:33:59 localhost app: *** HARAKIRI ON WORKER 4 (pid: 10469) ***
Oct 25 08:33:59 localhost app: HARAKIRI: -- wchan> request_wait_answer
Oct 25 08:34:01 localhost app: *** HARAKIRI ON WORKER 4 (pid: 10469) ***
Oct 25 08:34:01 localhost app: HARAKIRI: -- wchan> request_wait_answer
Oct 25 08:34:03 localhost app: *** HARAKIRI ON WORKER 4 (pid: 10469) ***
Oct 25 08:34:03 localhost app: HARAKIRI: -- wchan> request_wait_answer
Oct 25 08:34:05 localhost app: *** HARAKIRI ON WORKER 4 (pid: 10469) ***
Oct 25 08:34:05 localhost app: HARAKIRI: -- wchan> request_wait_answer
Oct 25 08:34:07 localhost app: *** HARAKIRI ON WORKER 4 (pid: 10469) ***
Oct 25 08:34:07 localhost app: HARAKIRI: -- wchan> request_wait_answer

The problem is that when a process hangs and harakiri tries to kill it, the process does not have to die instantly; AFAIK it can keep running for a while if it is waiting on I/O. So I've added worker[nr].pending_harakiri: a non-zero value means the worker is already being killed. worker[nr].harakiri_count is only incremented if pending_harakiri == 0.

Since worker[nr].pending_harakiri stores the number of SIGKILLs the worker has received, it is also logged for debugging:

*** HARAKIRI ON WORKER 1 (pid: 99, try: 2) ***
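
The counting rule described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the code from the merged commits: `struct worker` is a simplified stand-in, and the helper names `harakiri_attempt`, `format_harakiri_log` and `worker_respawned` are hypothetical. Only the `harakiri_count` and `pending_harakiri` fields and the log format come from this pull request.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, simplified per-worker state. Only the two counters
 * named in this pull request are modeled. */
struct worker {
    int id;
    int pid;
    unsigned long harakiri_count;  /* distinct harakiri events */
    unsigned int pending_harakiri; /* SIGKILLs sent to the current pid so far */
};

/* Called each time the master decides the worker has to be killed.
 * Returns the attempt number (1 for the first SIGKILL). */
unsigned int harakiri_attempt(struct worker *w) {
    assert(w != NULL);
    /* Count the event only once, however many SIGKILLs it takes. */
    if (w->pending_harakiri == 0) {
        w->harakiri_count++;
    }
    w->pending_harakiri++;
    /* Real code would send SIGKILL to w->pid here; omitted in this sketch. */
    return w->pending_harakiri;
}

/* Writes the log line in the format shown above; returns snprintf's result. */
int format_harakiri_log(const struct worker *w, char *buf, size_t len) {
    return snprintf(buf, len, "*** HARAKIRI ON WORKER %d (pid: %d, try: %u) ***",
                    w->id, w->pid, w->pending_harakiri);
}

/* Assumption: once the old process is gone and a new one takes the slot,
 * the pending counter starts again from zero. */
void worker_respawned(struct worker *w, int new_pid) {
    assert(w != NULL);
    w->pid = new_pid;
    w->pending_harakiri = 0;
}
```

With this rule, the seven SIGKILLs sent to pid 10469 in the log above would raise `harakiri_count` by one instead of seven, while `pending_harakiri` would reach 7.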

@unbit
Owner

unbit commented Oct 26, 2012

I think it is good. I did not take into account that SIGKILL could take ages if there is a blocking condition at the kernel level.

unbit added a commit that referenced this pull request Oct 26, 2012
More accurate harakiri counting
unbit merged commit 7967e8a into unbit:master on Oct 26, 2012