
timer decorator and cheaped workers #58

Open
prymitive opened this issue Nov 22, 2012 · 39 comments

@prymitive
Contributor

If I use the timer decorator and all my workers get cheaped, then the timer also stops working, since it is executed in the first worker. The docs say that a timer can be executed using the spooler instead of a worker, but they don't say anything about cheaping workers with the --idle option.
This should be either:

  1. documented - so that users are warned (provided that they read the docs), and maybe print a big fat warning that --idle and the timer signal (and possibly others) do not mix
  2. disable --idle when signals are used - since the user wants to handle some signals and we need a worker to do so, it's better to disable one minor feature than to break another that is potentially important
  3. fix it in some other way
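For context, the workaround the docs hint at - giving the handler a home that --idle never cheaps - would look roughly like the following config sketch. The spooler path, the mule count, and the pairing with the decorator targets are my assumptions, not verified behaviour.

```ini
[uwsgi]
; sketch: run signal handlers in a spooler or mule instead of a worker,
; so that --idle can cheap the workers without killing the timer
spooler = /tmp/uwsgi-spool   ; handlers registered with target='spooler' run here
mules = 1                    ; or route handlers to a mule via target='mule'
idle = 5
```

The matching registration would then be something like `@timer(1, target='spooler')` from uwsgidecorators.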

timerapp.ini:

[uwsgi]
autoload = true
plugins-dir = /usr/lib/uwsgi/plugins
plugins = python
socket = :0
master = true
uid = nobody
gid = nogroup
processes = 1
wsgi-file = timerapp.py
idle = 5

timerapp.py:

import uwsgi
from uwsgidecorators import rbtimer


@rbtimer(1)
def timer(sig):
    print('Ping!')


def application(env, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    # WSGI expects an iterable of bytestrings, not a bare string
    return ['uWSGI is alive!']

app log:

[uWSGI] getting INI configuration from timerapp.ini
*** Starting uWSGI 1.4.1 (64bit) on [Thu Nov 22 22:23:06 2012] ***
compiled with version: 4.4.3 on 13 November 2012 09:26:57
os: Linux-3.0.0-27-virtual #44~lucid1-Ubuntu SMP Fri Oct 19 16:05:20 UTC 2012
nodename: sudoku-dev-backend1
machine: x86_64
clock source: unix
pcre jit disabled
detected number of CPU cores: 4
current working directory: /home/app
detected binary path: /usr/bin/uwsgi
your memory page size is 4096 bytes
detected max file descriptor number: 1024
lock engine: pthread robust mutexes
uwsgi socket 0 bound to TCP address :26411 (port auto-assigned) fd 3
Python version: 2.6.5 (r265:79063, Oct  1 2012, 22:16:31)  [GCC 4.4.3]
*** Python threads support is disabled. You can enable it with --enable-threads ***
Python main interpreter initialized at 0x1b6f5d0
your server socket listen backlog is limited to 100 connections
mapped 144784 bytes (141 KB) for 1 cores
*** Operational MODE: single process ***
[uwsgi-signal] signum 0 registered (wid: 0 modifier1: 0 target: default, any worker)
WSGI app 0 (mountpoint='') ready in 0 seconds on interpreter 0x1b6f5d0 pid: 9480 (default app)
*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI master process (pid: 9480)
spawned uWSGI worker 1 (pid: 9481, cores: 1)
Ping!
Ping!
Ping!
Ping!
Ping!
Ping!
workers have been inactive for more than 5 seconds (1353619393-1353619387)
cheap mode enabled: waiting for socket connection...
@prymitive
Contributor Author

After a while logs are filled with

*** SIGNAL QUEUE IS FULL: buffer size 229376 bytes (you can tune it with --signal-bufsize) ***
could not deliver signal 0 to workers pool
*** SIGNAL QUEUE IS FULL: buffer size 229376 bytes (you can tune it with --signal-bufsize) ***

This is probably because signals are being sent all the time (probably by the master), but there is no worker to handle them.
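The overflow mechanics described above can be sketched with a toy model (pure Python, not uWSGI code; the buffer size is illustrative of --signal-bufsize):

```python
from collections import deque

class SignalQueue:
    """Toy model of uWSGI's fixed-size signal buffer (--signal-bufsize)."""

    def __init__(self, bufsize):
        self.bufsize = bufsize
        self.buf = deque()

    def push(self, signum):
        """Master delivers a signal; delivery fails once the buffer is full."""
        if len(self.buf) >= self.bufsize:
            return False  # "SIGNAL QUEUE IS FULL"
        self.buf.append(signum)
        return True

# The master keeps firing the timer, but every worker is cheaped,
# so nothing ever drains the queue.
q = SignalQueue(bufsize=4)
results = [q.push(0) for _ in range(6)]
print(results)  # first 4 queued, then delivery starts failing
```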

@unbit
Owner

unbit commented Nov 25, 2012

The quick solution would be to use a mule for that. The other one would be waking up a worker when a signal has to be routed. But that is the kind of behaviour users have to choose, so having an "invasive" default is not a good idea.

Another approach would be accounting for handled signals in addition to requests, so the cheap/cheaper/idle modes would be smarter about that.

@prymitive
Contributor Author

There is another issue I've spotted: signals and cheaper don't mix too well:

[uwsgi-signal] you have registered this signal in worker 2 memory area, only that process will be able to run it
Tue Nov 27 00:18:57 2012 - error managing signal 1 on worker 5
  1. uWSGI starts and the first worker registers my timer
  2. I load the box and cheaper spawns new workers
  3. after a while the load is gone and cheaper stops a few workers; it picks pretty much any random worker, so it might be worker nr 1, which leads to the error above

I think that with dynamic worker process management there will always be issues like this, so IMHO it would be better to just spawn a mule for signal handling when we have none spawned and the user registers a new timer. Moving signal handling to a dedicated process will free us from the need to handle a growing number of corner cases while uWSGI picks up new tricks for worker process management.
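The proposal above - implicitly spawning a mule the first time a timer is registered - can be sketched as a toy model (pure Python, all names hypothetical):

```python
class Server:
    """Toy model of the proposal: registering a timer implicitly spawns a
    mule (once), so handlers never live in a cheapable worker."""

    def __init__(self):
        self.mules = 0
        self.handlers = {}

    def register_timer(self, signum, handler):
        if self.mules == 0:
            self.mules += 1  # spawn a dedicated signal-handling mule
        self.handlers[signum] = ('mule1', handler)

srv = Server()
srv.register_timer(1, lambda sig: print('Ping!'))
srv.register_timer(2, lambda sig: print('Pong!'))
print(srv.mules)  # 1 - a single mule handles all registered timers
```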

@prymitive
Contributor Author

I left my vassal running and after a few hours I got constant segfaults:

!!! uWSGI process 4572 got Segmentation Fault !!!
*** backtrace of 4572 ***
[bubbles-dev] uWSGI worker 1(uwsgi_backtrace+0x25) [0x43dbd5]
[bubbles-dev] uWSGI worker 1(uwsgi_segfault+0x21) [0x43dcb1]
/lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7faf03a3d4a0]
/usr/lib/libpython2.7.so.1.0(PyObject_Call+0x1b) [0x7faf0341fe4b]
/usr/lib/libpython2.7.so.1.0(PyEval_CallObjectWithKeywords+0x47) [0x7faf034207d7]
/home/lukasz.mierzwa/uwsgi/build/python_plugin.so(python_call+0x24) [0x7faf037ec974]
/home/lukasz.mierzwa/uwsgi/build/python_plugin.so(uwsgi_python_signal_handler+0x6f) [0x7faf037e9d8f]
[bubbles-dev] uWSGI worker 1(uwsgi_signal_handler+0xff) [0x4372bf]
[bubbles-dev] uWSGI worker 1(uwsgi_receive_signal+0x34) [0x438804]
[bubbles-dev] uWSGI worker 1(wsgi_req_accept+0x1f7) [0x413b77]
[bubbles-dev] uWSGI worker 1(simple_loop_run+0xb6) [0x43a5d6]
[bubbles-dev] uWSGI worker 1(uwsgi_ignition+0x18a) [0x43e11a]
[bubbles-dev] uWSGI worker 1(uwsgi_start+0x12b3) [0x43fc53]
[bubbles-dev] uWSGI worker 1() [0x412d01]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7faf03a2876d]
[bubbles-dev] uWSGI worker 1() [0x412d2d]

@unbit
Owner

unbit commented Nov 27, 2012

do you have --enable-threads? That kind of error happens when the GIL is not acquired before calling Python functions

@prymitive
Contributor Author

No I don't:

[uwsgi]

appdir = /home/lukasz.mierzwa/bubbles

plugins-dir = /home/lukasz.mierzwa/uwsgi/build
plugins = 0:python
http-socket = :1080
chdir = %(appdir)
processes = 8
cheaper = 1
no-orphans = true

pythonpath = %(appdir)

env = DJANGO_SETTINGS_MODULE=bubbles.settings.devel
env = BUBBLES_MULE_CONFIG=%(appdir)/config/mule.yml

module = django.core.handlers.wsgi:WSGIHandler()
#wsgi-file = %(appdir)/bubbles/wsgi.py

static-map = /static/=%(appdir)/static/
static-map = /favicon.ico=%(appdir)/static/favicon.ico
static-map = /robots.txt=%(appdir)/static/robots.txt

uid = lukasz.mierzwa
gid = users

auto-procname = true
procname-prefix-spaced = [bubbles-dev]

touch-reload = /tmp/bubbles-dev.txt

if-not-exists = /tmp/bubbles-dev.txt
exec-as-root = touch /tmp/bubbles-dev.txt
endif =

@unbit
Owner

unbit commented Nov 27, 2012

I do not see spawned mules; aren't you testing them?

@prymitive
Contributor Author

No, this is my development VM; I have one app there that I'm working on. I added a uWSGI timer to my Django settings so that uWSGI restarts after the code is modified. After adding --processes 8 and --cheaper 1 I spotted this issue.

@unbit
Owner

unbit commented Nov 27, 2012

OT: use py-auto-reload = n (n is in seconds) to accomplish that
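As an ini sketch (the 2-second interval is just an example value):

```ini
[uwsgi]
; reload the instance when a loaded Python module changes on disk,
; scanning every 2 seconds - replaces a hand-rolled reload timer
py-auto-reload = 2
```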

@unbit
Owner

unbit commented Nov 27, 2012

I am starting to think that signal handlers not registered in the master should not be allowed while in cheap/cheaper modes

@prymitive
Contributor Author

I've got another segfault; looking at the logs I see this pattern:

  • second worker is spawned

    spawned uWSGI worker 2 (pid: 11091, cores: 1)
    
    [uwsgi-signal] signum 1 registered (wid: 2 modifier1: 0 target: default, any worker)
    
  • I'm starting to get

    [uwsgi-signal] you have registered this signal in worker 1 memory area, only that process will be able to run it
    
    [uwsgi-signal] you have registered this signal in worker 2 memory area, only that process will be able to run it
    
  • first worker is cheaped

    uWSGI worker 1 cheaped.
    
  • Then I start an ab run, so the app gets loaded and the first worker is respawned

    Respawned uWSGI worker 1 (new pid: 11100)
    
  • A segfault follows after a while

    !!! uWSGI process 11100 got Segmentation Fault !!!
    *** backtrace of 11100 ***
    [bubbles-dev] uWSGI worker 1(uwsgi_backtrace+0x25) [0x43dd65]
    [bubbles-dev] uWSGI worker 1(uwsgi_segfault+0x21) [0x43de41]
    /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7f25205cf4a0]
    /usr/lib/libpython2.7.so.1.0(PyObject_Call+0x1f) [0x7f251ffb1e4f]
    /usr/lib/libpython2.7.so.1.0(PyEval_CallObjectWithKeywords+0x47) [0x7f251ffb27d7]
    /home/lukasz.mierzwa/uwsgi/build/python_plugin.so(python_call+0x24) [0x7f252037e974]
    /home/lukasz.mierzwa/uwsgi/build/python_plugin.so(uwsgi_python_signal_handler+0x6f) [0x7f252037bd8f]
    [bubbles-dev] uWSGI worker 1(uwsgi_signal_handler+0xff) [0x43732f]
    [bubbles-dev] uWSGI worker 1(uwsgi_receive_signal+0x34) [0x438874]
    [bubbles-dev] uWSGI worker 1(wsgi_req_accept+0x1f7) [0x413b77]
    [bubbles-dev] uWSGI worker 1(simple_loop_run+0xb6) [0x43a646]
    [bubbles-dev] uWSGI worker 1(uwsgi_ignition+0x18a) [0x43e2aa]
    [bubbles-dev] uWSGI worker 1(uwsgi_start+0x12b3) [0x43fde3]
    [bubbles-dev] uWSGI worker 1() [0x412d01]
    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f25205ba76d]
    [bubbles-dev] uWSGI worker 1() [0x412d2d]
    *** end of backtrace ***
    

After that it's always the first worker that segfaults.
So far I haven't been able to reproduce it.

@unbit
Owner

unbit commented Feb 12, 2013

So, the new master is almost ready. Time to make a decision: how do we deal with worker signal handlers when in cheap mode (or when the corresponding worker is cheaped)? Just spawn the worker? Drop the signal? Other solutions?

@prymitive
Contributor Author

I would spawn a new worker; if the user wants to handle such a signal, that should take precedence over cheaper mode

@prymitive
Contributor Author

What about this one?

@jsivak

jsivak commented Aug 31, 2014

Is this issue still being addressed or investigated? My current solution is to:

  1. Not use --idle
  2. always require one worker to be running in order for timer events to be handled correctly

But it would be nicer if the uWSGI master process would spawn a new worker when a timer event fires.
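The two-part workaround above boils down to a config like this sketch (values illustrative):

```ini
[uwsgi]
; no idle/cheap; cheaper keeps a floor of one live worker, so
; 'worker'-target timer signals always have a process to run in
processes = 8
cheaper = 1
```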

@unbit
Owner

unbit commented Aug 31, 2014

We have 3 events to manage:

  • spawn a specific worker when the signal uses the target 'workerN'
  • spawn all of the workers when target 'workers' is used
  • spawn at least a worker when 'worker' target is used.

The only true problem is deciding if this is the default behaviour or not
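The three cases above can be sketched as a routing decision (a toy model in plain Python, not uWSGI internals; names are hypothetical):

```python
def workers_to_spawn(target, alive, total):
    """Toy sketch of the three cases: which cheaped workers must be
    respawned before a signal with the given target can be delivered."""
    dead = [w for w in range(1, total + 1) if w not in alive]
    if target.startswith('worker') and target[6:].isdigit():
        n = int(target[6:])        # 'workerN': that specific worker must be up
        return [n] if n in dead else []
    if target == 'workers':        # every worker must be up
        return dead
    if target == 'worker':         # at least one worker must be up
        return dead[:1] if not alive else []
    return []

print(workers_to_spawn('worker2', alive={1}, total=3))   # only worker 2
print(workers_to_spawn('workers', alive={1}, total=3))   # all dead workers
print(workers_to_spawn('worker', alive=set(), total=3))  # any single worker
```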

@jsivak

jsivak commented Sep 1, 2014

Not sure if you are asking for my opinion/observation or just stating your current thought:

If an opinion helps: the "spawn at least a worker when 'worker' target is used" option seems the most correct/expected. If I set a timer in my code, I would expect (when the timer event fires) that if no workers were present, one would be created to service the timer event.

Otherwise, if you're still thinking about this issue at a software architecture level, then I understand.

Thanks

@jsivak

jsivak commented Nov 2, 2014

Another idea on this topic:

The @timer() decorator supports the target='spooler' argument; if the @timer() decorator also supported a spooler='/path/to/spooler' argument (which I don't think it does), then I could dedicate a spooler process to handling timer-only requests.

Not sure if this is easier than having timer events spawn new workers.

The other reason this method would be useful to me is that I could force all of the logging from the dedicated "timer" spooler to a separate file.

@unbit
Owner

unbit commented Nov 3, 2014

This is something to do absolutely (that part was never aligned with the multiple-spooler codebase introduced in 1.4). In the meantime you could use a mule (the target syntax takes the mule0, mule1 syntax to send requests to a specific mule)

@aldem
Contributor

aldem commented Nov 16, 2014

To me, it seems logical (= what I expected) that signals are delivered only when a worker is active; thus: no process, no delivery attempt.
But since there are cases when a signal could (or should) be used to wake up a worker, this intention has to be explicitly specified when (or before) registering a handler, and/or (optionally) when sending a signal.
The default could be made configurable, probably based on worker type (mule/spooler/etc).

@unbit
Owner

unbit commented Nov 16, 2014

There are really dozens of scenarios here; this is why we are still searching for a solution.

Some notes:

  • on-demand respawning could be bad (think about rails apps)
  • blocking signals for a cheaped target could be bad (you cannot know when the cheap status happens, and you may have time-related signals like timers and crons)

The current solution at the top of the list is having a fallback target (something like --signal-fallback mule)
that will route undeliverable signals to a new target (like a spooler or a mule).

This is easy to implement (and probably that is why it is at the top of the list ;)

Other solutions require heavy lifting of the core (like adding a policy field to the signal table that describes how an unroutable signal should behave). Something to think about for 2.1
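The fallback idea can be sketched as a pure-Python toy (the flag name --signal-fallback comes from the comment above; everything else here is hypothetical):

```python
def route_signal(target, alive_workers, fallback=None):
    """Toy sketch of the proposed --signal-fallback option: if the normal
    target has no live process, reroute instead of queueing forever."""
    if target in ('worker', 'workers') and alive_workers > 0:
        return target
    if fallback is not None:
        return fallback   # e.g. 'mule' or 'spooler'
    return None           # undeliverable: would pile up in the signal queue

print(route_signal('workers', alive_workers=0, fallback='mule'))  # rerouted
print(route_signal('workers', alive_workers=2))                   # delivered normally
```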

@aldem
Contributor

aldem commented Nov 16, 2014

What about a simple option which (when set) limits sending signals only to active workers - does that also require heavy lifting?
I doubt it would make things worse (unless used improperly), but it could cover many cases where the current behavior is undesirable, and it is easier to implement (I hope) than a fallback target.

@unbit
Owner

unbit commented Nov 16, 2014

If you use "worker" as the target, the signal will be routed to the first available (alive) worker, so it should be more than enough for you. The problem (if I understand correctly) is when all your workers are dead.

@aldem
Contributor

aldem commented Nov 16, 2014

No, the problem arises when some workers are dead (as described in #775 - one is active, 4 are not) and I need some periodic wake-ups in all active workers.

@unbit
Owner

unbit commented Nov 16, 2014

So it is the "workers" (with the final 's') target, right? You need to trigger the signal to all of the workers, but it fails because some of them are dead?

@aldem
Contributor

aldem commented Nov 16, 2014

Exactly the problem - the target is "workers"; alive workers are signalled but dead ones are not, and the signals accumulate in the queue, eventually overflowing it.
Something like "active-workers" as a target would do the trick (probably).
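A toy model of the difference between the two targets (plain Python, names hypothetical, not uWSGI internals):

```python
def deliver(target, workers):
    """Toy model: 'workers' queues a signal for every worker (dead ones clog
    the queue); the proposed 'active-workers' delivers only to live workers."""
    if target == 'workers':
        return {w: ('delivered' if up else 'queued') for w, up in workers.items()}
    if target == 'active-workers':
        return {w: 'delivered' for w, up in workers.items() if up}
    raise ValueError(target)

pool = {1: True, 2: False, 3: False}
print(deliver('workers', pool))         # worker 1 delivered; 2 and 3 pile up
print(deliver('active-workers', pool))  # only worker 1; nothing queued
```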

@unbit
Owner

unbit commented Nov 16, 2014

Yes, it could be an interesting approach (I mean the "active-workers" target). I will get back to you soon

@aldem
Contributor

aldem commented Nov 16, 2014

An additional note - not sure if it belongs here, but it is definitely related - allowing a target to be specified in signal sources (signal/timer/cron/etc) would add a lot of flexibility.
I am not familiar with the core, but it does not look too complex to implement, though I may be wrong :)

@unbit
Owner

unbit commented Nov 16, 2014

I have pushed (in both 2.0 and 2.1) the 'active-workers' target. Let me know

@aldem
Contributor

aldem commented Nov 16, 2014

It seems to work with signal delivery, but it produces a segfault when attempting kill('HUP', $$) from a signal handler (this worked properly in 2.0.8).

To avoid clutter here, all the details are in a gist: https://gist.github.com/aldem/a19eef9cd58690534a5c

@unbit
Owner

unbit commented Nov 16, 2014

This is exactly the kind of problem I was referring to in the mailing-list thread. The segfault is caused by the new atexit hook in the Perl plugin :) I'll try to see if we can have a workaround

@unbit
Owner

unbit commented Nov 16, 2014

Ok, now it should work reliably. The atexit procedure for Perl checks whether a function is already running (basically it checks for a running request or a running signal handler), and if so it skips normal interpreter destruction.
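The guard described above can be sketched in plain Python (a toy model of the Perl plugin's atexit logic, not actual uWSGI code):

```python
class Interpreter:
    """Toy model of the fix: atexit-time destruction is skipped while a
    request or signal handler is still running."""

    def __init__(self):
        self.busy = False       # set while a request/signal handler runs
        self.destroyed = False

    def atexit_destroy(self):
        if self.busy:
            return False        # skip the perl_destruct/perl_free equivalent
        self.destroyed = True
        return True

interp = Interpreter()
interp.busy = True              # a signal handler is mid-flight
print(interp.atexit_destroy())  # destruction skipped, no crash
interp.busy = False
print(interp.atexit_destroy())  # safe to tear down now
```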

@aldem
Contributor

aldem commented Nov 16, 2014

Yes, it works - thanks! Though I don't see how atexit() was involved here - it wasn't used at all in my test case...

@unbit
Owner

unbit commented Nov 17, 2014

The new plugin version correctly calls (or tries to call) perl_destruct and perl_free. Those functions, if called in a signal handler while another subroutine is running, tend to explode.

@aldem
Contributor

aldem commented Nov 17, 2014

What about marking the worker as "needs cleanup" if it was busy (in a signal/request handler) and cleaning up again (perl_destruct/perl_free) once all processing is completed?

@jsivak

jsivak commented Nov 19, 2014

I'm confused about the progress here; I'm looking for workers to be automatically spawned when a timer event/decorator fires and no workers are active (using the "idle" parameter in the config).

Based on the discussion in this thread over the past few days, it sounds like automatically spawning workers to handle timer events "is bad".

@unbit
Owner

unbit commented Nov 19, 2014

@jsivak it depends on the context. If your app is fully preforked, spawning it on demand will be a matter of a few milliseconds. We are trying to cover different use cases

@jsivak

jsivak commented Nov 19, 2014

That type of delay is fine for my needs; I'm running my web apps with "lazy-apps=true" set; not exactly sure if that's what you mean by "fully preforked".

If the "auto spawning" is in the uwsgi-2.0 branch on github, then let me know how I can test it (what config file settings to set) and I'll give it a try.

@jsivak

jsivak commented Dec 29, 2016

Just got bitten by this issue on one of my older apps (running uwsgi 2.0.8). Did any resolution for this issue get decided on or implemented in a newer version?

Thanks
