Invalid cheaped workers status (non-zero pid for cheaped worker) #122
Comments
I can add checks in those loops to look for
Stats from another app on the same server:
Full stats output: https://gist.github.com/dcd46a04dd9b4f0d32bc
It turns out that all those workers are running but they are
It also seems that many (if not all) apps on all of my boxes are affected. That looks weird; maybe I had some network or other global issue that triggered this error? Maybe something bad happened when those processes were stopped or reloaded due to max requests, and uWSGI did not notice that there was an error?
It looks like when uWSGI wants to cheap a worker, it marks it as "cheaped = 1" and then just sends the SIGWINCH signal. In the lengthy master_loop, uWSGI checks whether any worker has died and sets its pid to 0, so this normally sets pid=0 for cheaped workers once they die. But if a worker does not die for any reason, it continues to run while uWSGI has it marked as cheaped with pid > 0. uWSGI only verifies that workers have actually died while the whole uWSGI instance is stopping or reloading; we have reload mercy there. Maybe we should also add checks in master_loop() to verify that all workers that have cheaped=1 also have pid=0? Think of it as reload mercy, but for individual workers.
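To make the "reload mercy for individual workers" idea concrete, here is a minimal sketch of the check I have in mind. It is not the actual patch and not real uWSGI code: the struct is a simplified stand-in, and `cheaped_at`, `mercy` and `find_stuck_cheaped()` are hypothetical names made up for illustration. Only `cheaped` and `pid` come from the worker state discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>

/* Simplified, hypothetical stand-in for a worker slot. */
struct worker_slot {
    pid_t pid;          /* 0 once the master has seen the worker die */
    int cheaped;        /* 1 once the master asked the worker to stop */
    time_t cheaped_at;  /* hypothetical: when the stop request was sent */
};

/*
 * Collect the indexes of workers that are marked as cheaped, still have a
 * non-zero pid, and have been in that state for more than `mercy` seconds.
 * At most `stuck_cap` indexes are written to `stuck`; the number written
 * is returned.
 */
static int find_stuck_cheaped(const struct worker_slot *workers, int n,
                              time_t now, int mercy,
                              int *stuck, int stuck_cap)
{
    int found = 0;

    assert(workers != NULL && stuck != NULL && mercy >= 0);

    for (int i = 0; i < n; i++) {
        const struct worker_slot *w = &workers[i];

        if (!w->cheaped || w->pid <= 0)
            continue;   /* active worker, or cheaped and already gone */
        if (now - w->cheaped_at <= mercy)
            continue;   /* still within its mercy period */
        if (found < stuck_cap)
            stuck[found++] = i;
    }
    return found;
}
```

The master could run a check like this on each loop iteration, send SIGKILL to every pid it reports, and leave resetting pid to 0 to the usual dead-worker handling.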
I wrote a patch for it to verify whether this works properly. I've installed the patched uWSGI on one server and will monitor it to check whether it kills those defunct cheaped workers.
It seems to be working just fine:
I should only get rid of this Strace from such worker:
It looks like some thread is keeping it in a defunct state, probably an offload one. EDIT: yes, it didn't take long to confirm: my workers are not shutting down correctly because offload threads are enabled. This seems to happen every time.
BTW, I didn't notice this issue before, so it is possible that it was introduced somewhere along the 1.4 bugfix releases, probably between 1.4.2 and 1.4.4. I upgraded from 1.4.2 to 1.4.4 a week ago and I think this has been happening all week; I just didn't notice it until yesterday.
I've reverted the only change that was made to offload.c after 1.4.2, but it still hangs on worker stop if offload threads are enabled.
The uwsgi_ignition() function in core/uwsgi.c exits with pthread_exit(NULL)
replacing It's easy to test it, just run:
If you have a defunct worker after:
and there is no |
OK, applied it to the 1.4 tree (using end_me(0))
It works fine. I've retested with the latest master and the issue is gone.
I've noticed a case when the cheaper busyness plugin was running only a few workers, far fewer than the limit I've set with --workers, despite the load and a lot of queued requests. After restarting my app everything started to work fine. I investigated the logs and I'm quite sure that this is a bug with counting cheaped workers.
Number of cheaped workers is always counted in uWSGI (not only in cheaper_busyness plugin) using this loop:
I'm 100% sure that the "cheap" status taken from uwsgi.workers[i].cheaped was right when I had this issue, so it must be the uwsgi.workers[i].pid that was non-zero for a cheaped worker. Is there any reason why we should check both in those loops? The non-zero pid for a cheaped worker will probably take some digging to track down; maybe we should not care about the pid and trust the worker status value alone?
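To illustrate why the double check matters, here is a small model of the two counting strategies. This is only a sketch under assumptions: the struct is a simplified stand-in for the real worker array, both function names are hypothetical, and the exact condition used in the real loops is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Simplified, hypothetical stand-in for a worker slot. */
struct worker_slot {
    pid_t pid;    /* expected to be 0 for a cheaped worker */
    int cheaped;  /* 1 once the worker has been cheaped */
};

/* Count a worker as cheaped only if the flag is set AND its pid is 0. */
static int count_cheaped_strict(const struct worker_slot *w, int n)
{
    int count = 0;

    assert(w != NULL && n >= 0);
    for (int i = 0; i < n; i++) {
        if (w[i].cheaped && w[i].pid == 0)
            count++;
    }
    return count;
}

/* Count a worker as cheaped based on the flag alone, ignoring the pid. */
static int count_cheaped_by_flag(const struct worker_slot *w, int n)
{
    int count = 0;

    assert(w != NULL && n >= 0);
    for (int i = 0; i < n; i++) {
        if (w[i].cheaped)
            count++;
    }
    return count;
}
```

With one worker stuck in the "cheaped = 1, pid > 0" state, the two counts differ by one. In this model a worker in that state is not counted as cheaped by the strict version, so the number of workers believed to be running is higher than the number actually serving requests.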