
timer decorator and cheaped workers #58

Open
prymitive opened this issue Nov 22, 2012 · 39 comments

@prymitive
Contributor

If I use the timer decorator and all my workers get cheaped, then the timer also stops working, since it is executed in the first worker. The docs say that a timer can be executed using the spooler instead of a worker, but they don't say anything about cheaping workers with the --idle option.
This should be either:

  1. documented - so that users are warned (provided that they read the docs), and maybe print a big fat warning that --idle and the timer signal (and possibly others) do not mix
  2. disable --idle when signals are used - since the user wants to handle some signals and we need a worker to do so, it's better to disable one minor feature than to break another that is potentially important
  3. fix it in some other way
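For context, the workaround the docs hint at - giving the handler a home that --idle never cheaps - would look roughly like the following config sketch. The spooler path, the mule count, and the pairing with the decorator targets are my assumptions, not verified behaviour.

```ini
[uwsgi]
; sketch: run signal handlers in a spooler or mule instead of a worker,
; so that --idle can cheap the workers without killing the timer
spooler = /tmp/uwsgi-spool   ; handlers registered with target='spooler' run here
mules = 1                    ; or route handlers to a mule via target='mule'
idle = 5
```

The matching registration would then be something like `@timer(1, target='spooler')` from uwsgidecorators.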

timerapp.ini:

[uwsgi]
autoload = true
plugins-dir = /usr/lib/uwsgi/plugins
plugins = python
socket = :0
master = true
uid = nobody
gid = nogroup
processes = 1
wsgi-file = timerapp.py
idle = 5

timerapp.py:

import uwsgi
from uwsgidecorators import rbtimer


@rbtimer(1)
def timer(sig):
    print('Ping!')


def application(env, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    # WSGI expects an iterable of bytestrings, not a bare string
    return ['uWSGI is alive!']

app log:

[uWSGI] getting INI configuration from timerapp.ini
*** Starting uWSGI 1.4.1 (64bit) on [Thu Nov 22 22:23:06 2012] ***
compiled with version: 4.4.3 on 13 November 2012 09:26:57
os: Linux-3.0.0-27-virtual #44~lucid1-Ubuntu SMP Fri Oct 19 16:05:20 UTC 2012
nodename: sudoku-dev-backend1
machine: x86_64
clock source: unix
pcre jit disabled
detected number of CPU cores: 4
current working directory: /home/app
detected binary path: /usr/bin/uwsgi
your memory page size is 4096 bytes
detected max file descriptor number: 1024
lock engine: pthread robust mutexes
uwsgi socket 0 bound to TCP address :26411 (port auto-assigned) fd 3
Python version: 2.6.5 (r265:79063, Oct  1 2012, 22:16:31)  [GCC 4.4.3]
*** Python threads support is disabled. You can enable it with --enable-threads ***
Python main interpreter initialized at 0x1b6f5d0
your server socket listen backlog is limited to 100 connections
mapped 144784 bytes (141 KB) for 1 cores
*** Operational MODE: single process ***
[uwsgi-signal] signum 0 registered (wid: 0 modifier1: 0 target: default, any worker)
WSGI app 0 (mountpoint='') ready in 0 seconds on interpreter 0x1b6f5d0 pid: 9480 (default app)
*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI master process (pid: 9480)
spawned uWSGI worker 1 (pid: 9481, cores: 1)
Ping!
Ping!
Ping!
Ping!
Ping!
Ping!
workers have been inactive for more than 5 seconds (1353619393-1353619387)
cheap mode enabled: waiting for socket connection...
@prymitive
Contributor Author

After a while logs are filled with

*** SIGNAL QUEUE IS FULL: buffer size 229376 bytes (you can tune it with --signal-bufsize) ***
could not deliver signal 0 to workers pool
*** SIGNAL QUEUE IS FULL: buffer size 229376 bytes (you can tune it with --signal-bufsize) ***

This is probably because signals are being sent all the time (probably by the master), but there is no worker to handle them.
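The overflow mechanics described above can be sketched with a toy model (pure Python, not uWSGI code; the buffer size is illustrative of --signal-bufsize):

```python
from collections import deque

class SignalQueue:
    """Toy model of uWSGI's fixed-size signal buffer (--signal-bufsize)."""

    def __init__(self, bufsize):
        self.bufsize = bufsize
        self.buf = deque()

    def push(self, signum):
        """Master delivers a signal; delivery fails once the buffer is full."""
        if len(self.buf) >= self.bufsize:
            return False  # "SIGNAL QUEUE IS FULL"
        self.buf.append(signum)
        return True

# The master keeps firing the timer, but every worker is cheaped,
# so nothing ever drains the queue.
q = SignalQueue(bufsize=4)
results = [q.push(0) for _ in range(6)]
print(results)  # first 4 queued, then delivery starts failing
```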

@unbit
Owner

unbit commented Nov 25, 2012

The quick solution would be to use a mule for that. The other one would be waking up a worker when a signal has to be routed. But that is the kind of behaviour users have to choose, so having an "invasive" default is not a good idea.

Another approach would be accounting for handled signals in addition to requests, so the cheap/cheaper/idle modes would be smarter about that.

@prymitive
Contributor Author

There is another issue I've spotted: signals and cheaper don't mix too well:

[uwsgi-signal] you have registered this signal in worker 2 memory area, only that process will be able to run it
Tue Nov 27 00:18:57 2012 - error managing signal 1 on worker 5
  1. uWSGI starts and the first worker registers my timer
  2. I load the box and cheaper spawns new workers
  3. after a while the load is gone and cheaper stops a few workers; it picks pretty much any random worker, so it might be worker nr 1, which leads to the error above

I think that with dynamic worker process management there will always be issues like this, so IMHO it would be better to just spawn a mule for signal handling when we have none spawned and the user registers a new timer. Moving signal handling to a dedicated process will free us from the need to handle a growing number of corner cases while uWSGI picks up new tricks for worker process management.
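The proposal above - implicitly spawning a mule the first time a timer is registered - can be sketched as a toy model (pure Python, all names hypothetical):

```python
class Server:
    """Toy model of the proposal: registering a timer implicitly spawns a
    mule (once), so handlers never live in a cheapable worker."""

    def __init__(self):
        self.mules = 0
        self.handlers = {}

    def register_timer(self, signum, handler):
        if self.mules == 0:
            self.mules += 1  # spawn a dedicated signal-handling mule
        self.handlers[signum] = ('mule1', handler)

srv = Server()
srv.register_timer(1, lambda sig: print('Ping!'))
srv.register_timer(2, lambda sig: print('Pong!'))
print(srv.mules)  # 1 - a single mule handles all registered timers
```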

@prymitive
Contributor Author

I left my vassal running and after a few hours I got constant segfaults:

!!! uWSGI process 4572 got Segmentation Fault !!!
*** backtrace of 4572 ***
[bubbles-dev] uWSGI worker 1(uwsgi_backtrace+0x25) [0x43dbd5]
[bubbles-dev] uWSGI worker 1(uwsgi_segfault+0x21) [0x43dcb1]
/lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7faf03a3d4a0]
/usr/lib/libpython2.7.so.1.0(PyObject_Call+0x1b) [0x7faf0341fe4b]
/usr/lib/libpython2.7.so.1.0(PyEval_CallObjectWithKeywords+0x47) [0x7faf034207d7]
/home/lukasz.mierzwa/uwsgi/build/python_plugin.so(python_call+0x24) [0x7faf037ec974]
/home/lukasz.mierzwa/uwsgi/build/python_plugin.so(uwsgi_python_signal_handler+0x6f) [0x7faf037e9d8f]
[bubbles-dev] uWSGI worker 1(uwsgi_signal_handler+0xff) [0x4372bf]
[bubbles-dev] uWSGI worker 1(uwsgi_receive_signal+0x34) [0x438804]
[bubbles-dev] uWSGI worker 1(wsgi_req_accept+0x1f7) [0x413b77]
[bubbles-dev] uWSGI worker 1(simple_loop_run+0xb6) [0x43a5d6]
[bubbles-dev] uWSGI worker 1(uwsgi_ignition+0x18a) [0x43e11a]
[bubbles-dev] uWSGI worker 1(uwsgi_start+0x12b3) [0x43fc53]
[bubbles-dev] uWSGI worker 1() [0x412d01]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7faf03a2876d]
[bubbles-dev] uWSGI worker 1() [0x412d2d]

@unbit
Owner

unbit commented Nov 27, 2012

do you have --enable-threads? That kind of error happens when the GIL is not acquired before calling Python functions

@prymitive
Contributor Author

No I don't:

[uwsgi]

appdir = /home/lukasz.mierzwa/bubbles

plugins-dir = /home/lukasz.mierzwa/uwsgi/build
plugins = 0:python
http-socket = :1080
chdir = %(appdir)
processes = 8
cheaper = 1
no-orphans = true

pythonpath = %(appdir)

env = DJANGO_SETTINGS_MODULE=bubbles.settings.devel
env = BUBBLES_MULE_CONFIG=%(appdir)/config/mule.yml

module = django.core.handlers.wsgi:WSGIHandler()
#wsgi-file = %(appdir)/bubbles/wsgi.py

static-map = /static/=%(appdir)/static/
static-map = /favicon.ico=%(appdir)/static/favicon.ico
static-map = /robots.txt=%(appdir)/static/robots.txt

uid = lukasz.mierzwa
gid = users

auto-procname = true
procname-prefix-spaced = [bubbles-dev]

touch-reload = /tmp/bubbles-dev.txt

if-not-exists = /tmp/bubbles-dev.txt
exec-as-root = touch /tmp/bubbles-dev.txt
endif =

@unbit
Owner

unbit commented Nov 27, 2012

I do not see spawned mules; aren't you testing them?

@prymitive
Contributor Author

No, this is my development VM; I have one app there that I'm working on. I added a uWSGI timer to my Django settings so that uWSGI restarts after the code is modified. After adding --processes 8 and --cheaper 1 I spotted this issue.

@unbit
Owner

unbit commented Nov 27, 2012

OT: use py-auto-reload = n (n is in seconds) to accomplish that
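As an ini sketch (the 2-second interval is just an example value):

```ini
[uwsgi]
; reload the instance when a loaded Python module changes on disk,
; scanning every 2 seconds - replaces a hand-rolled reload timer
py-auto-reload = 2
```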

@unbit
Owner

unbit commented Nov 27, 2012

I am starting to think that signal handlers not registered in the master should not be allowed while in cheap/cheaper modes

@prymitive
Contributor Author

I've got another segfault; looking at the logs I see this pattern:

  • second worker is spawned

    spawned uWSGI worker 2 (pid: 11091, cores: 1)
    
    [uwsgi-signal] signum 1 registered (wid: 2 modifier1: 0 target: default, any worker)
    
  • I'm starting to get

    [uwsgi-signal] you have registered this signal in worker 1 memory area, only that process will be able to run it
    
    [uwsgi-signal] you have registered this signal in worker 2 memory area, only that process will be able to run it
    
  • first worker is cheaped

    uWSGI worker 1 cheaped.
    
  • Then I start an ab run, so the app gets loaded and the first worker is respawned

    Respawned uWSGI worker 1 (new pid: 11100)
    
  • A segfault follows after a while

    !!! uWSGI process 11100 got Segmentation Fault !!!
    *** backtrace of 11100 ***
    [bubbles-dev] uWSGI worker 1(uwsgi_backtrace+0x25) [0x43dd65]
    [bubbles-dev] uWSGI worker 1(uwsgi_segfault+0x21) [0x43de41]
    /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7f25205cf4a0]
    /usr/lib/libpython2.7.so.1.0(PyObject_Call+0x1f) [0x7f251ffb1e4f]
    /usr/lib/libpython2.7.so.1.0(PyEval_CallObjectWithKeywords+0x47) [0x7f251ffb27d7]
    /home/lukasz.mierzwa/uwsgi/build/python_plugin.so(python_call+0x24) [0x7f252037e974]
    /home/lukasz.mierzwa/uwsgi/build/python_plugin.so(uwsgi_python_signal_handler+0x6f) [0x7f252037bd8f]
    [bubbles-dev] uWSGI worker 1(uwsgi_signal_handler+0xff) [0x43732f]
    [bubbles-dev] uWSGI worker 1(uwsgi_receive_signal+0x34) [0x438874]
    [bubbles-dev] uWSGI worker 1(wsgi_req_accept+0x1f7) [0x413b77]
    [bubbles-dev] uWSGI worker 1(simple_loop_run+0xb6) [0x43a646]
    [bubbles-dev] uWSGI worker 1(uwsgi_ignition+0x18a) [0x43e2aa]
    [bubbles-dev] uWSGI worker 1(uwsgi_start+0x12b3) [0x43fde3]
    [bubbles-dev] uWSGI worker 1() [0x412d01]
    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f25205ba76d]
    [bubbles-dev] uWSGI worker 1() [0x412d2d]
    *** end of backtrace ***
    

After that it's always the first worker that segfaults.
So far I haven't been able to reproduce it.

@unbit
Owner

unbit commented Feb 12, 2013

So, the new master is almost ready. Time to make a decision: how do we deal with worker signal handlers when in cheap mode (or when the corresponding worker is cheaped)? Just spawn the worker? Drop the signal? Other solutions?

@prymitive
Contributor Author

I would spawn a new worker; if the user wants to handle such a signal, that should take precedence over cheaper mode

@prymitive
Contributor Author

What about this one?

@jsivak

jsivak commented Aug 31, 2014

Is this issue still being addressed or investigated? My current solution is to:

  1. Not use --idle
  2. always require one worker to be running in order for timer events to be handled correctly

But it would be nicer if the uWSGI master process would spawn a new worker when a timer event fires.
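The two-part workaround above boils down to a config like this sketch (values illustrative):

```ini
[uwsgi]
; no idle/cheap; cheaper keeps a floor of one live worker, so
; 'worker'-target timer signals always have a process to run in
processes = 8
cheaper = 1
```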

@unbit
Owner

unbit commented Aug 31, 2014

We have 3 events to manage:

  • spawn a specific worker when the signal uses the target 'workerN'
  • spawn all of the workers when target 'workers' is used
  • spawn at least a worker when 'worker' target is used.

The only true problem is deciding if this is the default behaviour or not
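The three cases above can be sketched as a routing decision (a toy model in plain Python, not uWSGI internals; names are hypothetical):

```python
def workers_to_spawn(target, alive, total):
    """Toy sketch of the three cases: which cheaped workers must be
    respawned before a signal with the given target can be delivered."""
    dead = [w for w in range(1, total + 1) if w not in alive]
    if target.startswith('worker') and target[6:].isdigit():
        n = int(target[6:])        # 'workerN': that specific worker must be up
        return [n] if n in dead else []
    if target == 'workers':        # every worker must be up
        return dead
    if target == 'worker':         # at least one worker must be up
        return dead[:1] if not alive else []
    return []

print(workers_to_spawn('worker2', alive={1}, total=3))   # only worker 2
print(workers_to_spawn('workers', alive={1}, total=3))   # all dead workers
print(workers_to_spawn('worker', alive=set(), total=3))  # any single worker
```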

@jsivak

jsivak commented Sep 1, 2014

Not sure if you are asking for my opinion/observation or just stating your current thought:

If an opinion helps: the "spawn at least a worker when 'worker' target is used" option seems the most correct/expected. If I set a timer in my code, I would expect (when the timer event fires) that if no workers were present, one would be created to service the timer event.

Otherwise, if you're still thinking about this issue at a software architecture level, then I understand.

Thanks

@jsivak

jsivak commented Nov 2, 2014

Another idea on this topic:

The @timer() decorator supports the target='spooler' argument; if the @timer() decorator also supported a spooler='/path/to/spooler' argument (which I don't think it does), then I could dedicate a spooler process to handling timer-only requests.

Not sure if this is easier than having timer events spawn new workers.

The other reason this method would be useful to me is that I could force all of the logging from the dedicated "timer" spooler to a separate file.

@unbit
Owner

unbit commented Nov 3, 2014

This is something to do absolutely (that part was never aligned with the multiple-spooler codebase introduced in 1.4). In the meantime you could use a mule (the target syntax takes the mule0, mule1 syntax to send requests to a specific mule)

@aldem
Contributor

aldem commented Nov 16, 2014

To me, it seems logical (= what I expected) that signals are delivered only when a worker is active; thus: no process, no delivery attempt.
But since there are cases when a signal could (or should) be used to wake up a worker, this intention has to be explicitly specified when (or before) registering a handler, and/or (optionally) when sending a signal.
The default could be made configurable, probably based on worker type (mule/spooler/etc).

@unbit
Owner

unbit commented Nov 16, 2014

There are really dozens of scenarios here; this is why we are still searching for a solution.

Some notes:

  • on-demand respawning could be bad (think about rails apps)
  • blocking signals for a cheaped target could be bad (you cannot know when the cheap status happens, and you may have time-related signals like timers and crons)

The current solution at the top of the list is having a fallback target (something like --signal-fallback mule)
that will route undeliverable signals to a new target (like a spooler or a mule).

This is easy to implement (and probably that is why it is at the top of the list ;)

Other solutions require heavy lifting of the core (like adding a policy field to the signal table that describes how an unroutable signal should behave). Something to think about for 2.1
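The fallback idea can be sketched as a pure-Python toy (the flag name --signal-fallback comes from the comment above; everything else here is hypothetical):

```python
def route_signal(target, alive_workers, fallback=None):
    """Toy sketch of the proposed --signal-fallback option: if the normal
    target has no live process, reroute instead of queueing forever."""
    if target in ('worker', 'workers') and alive_workers > 0:
        return target
    if fallback is not None:
        return fallback   # e.g. 'mule' or 'spooler'
    return None           # undeliverable: would pile up in the signal queue

print(route_signal('workers', alive_workers=0, fallback='mule'))  # rerouted
print(route_signal('workers', alive_workers=2))                   # delivered normally
```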

@aldem
Contributor

aldem commented Nov 16, 2014

What about a simple option which (when set) limits sending signals only to active workers - does that also require heavy lifting?
I doubt it would make things worse (unless used improperly), but it could cover many cases where the current behavior is undesirable, and it is easier to implement (I hope) than a fallback target.

@unbit
Owner

unbit commented Nov 16, 2014

If you use "worker" as the target, the signal will be routed to the first available (alive) worker, so it should be more than enough for you. The problem (if I understand correctly) is when all your workers are dead.

@aldem
Contributor

aldem commented Nov 16, 2014

No, the problem arises when some workers are dead (as described in #775 - one is active, 4 are not) and I need some periodic wake-ups in all active workers.

@unbit
Owner

unbit commented Nov 16, 2014

So it is the "workers" (with the final 's') target, right? You need to trigger the signal to all of the workers, but it fails because some of them are dead?

@aldem
Contributor

aldem commented Nov 16, 2014

Exactly the problem - the target is "workers"; alive workers are signalled but dead ones are not, and the signals accumulate in the queue, eventually overflowing it.
Something like "active-workers" as a target would do the trick (probably).
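A toy model of the difference between the two targets (plain Python, names hypothetical, not uWSGI internals):

```python
def deliver(target, workers):
    """Toy model: 'workers' queues a signal for every worker (dead ones clog
    the queue); the proposed 'active-workers' delivers only to live workers."""
    if target == 'workers':
        return {w: ('delivered' if up else 'queued') for w, up in workers.items()}
    if target == 'active-workers':
        return {w: 'delivered' for w, up in workers.items() if up}
    raise ValueError(target)

pool = {1: True, 2: False, 3: False}
print(deliver('workers', pool))         # worker 1 delivered; 2 and 3 pile up
print(deliver('active-workers', pool))  # only worker 1; nothing queued
```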

@unbit
Owner

unbit commented Nov 16, 2014

Yes, it could be an interesting approach (I mean the "active-workers" target). I will get back to you soon

@aldem
Contributor

aldem commented Nov 16, 2014

An additional note - not sure if it belongs here, but it is definitely related - allowing a target to be specified in signal sources (signal/timer/cron/etc) would add a lot of flexibility.
I am not familiar with the core, but it does not look too complex to implement, though I may be wrong :)

@unbit
Owner

unbit commented Nov 16, 2014

I have pushed (in both 2.0 and 2.1) the 'active-workers' target. Let me know

@aldem
Contributor

aldem commented Nov 16, 2014

It seems to work with signal delivery, but it produces a segfault when attempting kill('HUP', $$) from a signal handler (this worked properly in 2.0.8).

To avoid clutter here, all the details are in a gist: https://gist.github.com/aldem/a19eef9cd58690534a5c

@unbit
Owner

unbit commented Nov 16, 2014

This is exactly the kind of problem I was referring to in the mailing-list thread. The segfault is caused by the new atexit hook in the Perl plugin :) I'll try to see if we can have a workaround

@unbit
Owner

unbit commented Nov 16, 2014

Ok, now it should work reliably. The atexit procedure for Perl checks whether a function is already running (basically it checks for a running request or a running signal handler), and if so it skips normal interpreter destruction.
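The guard described above can be sketched in plain Python (a toy model of the Perl plugin's atexit logic, not actual uWSGI code):

```python
class Interpreter:
    """Toy model of the fix: atexit-time destruction is skipped while a
    request or signal handler is still running."""

    def __init__(self):
        self.busy = False       # set while a request/signal handler runs
        self.destroyed = False

    def atexit_destroy(self):
        if self.busy:
            return False        # skip the perl_destruct/perl_free equivalent
        self.destroyed = True
        return True

interp = Interpreter()
interp.busy = True              # a signal handler is mid-flight
print(interp.atexit_destroy())  # destruction skipped, no crash
interp.busy = False
print(interp.atexit_destroy())  # safe to tear down now
```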

@aldem
Contributor

aldem commented Nov 16, 2014

Yes, it works - thanks! Though I don't see how atexit() was involved here - it wasn't used at all in my test case...

@unbit
Owner

unbit commented Nov 17, 2014

The new plugin version correctly calls (or tries to call) perl_destruct and perl_free. Those functions, if called in a signal handler while another subroutine is running, tend to explode.

@aldem
Contributor

aldem commented Nov 17, 2014

What about marking the worker as "needs cleanup" if it was busy (in a signal/request handler) and cleaning up again (perl_destruct/perl_free) once all processing is completed?

@jsivak

jsivak commented Nov 19, 2014

I'm confused about the progress here; I'm looking for workers to be automatically spawned when a timer event/decorator fires and no workers are active (using the "idle" parameter in the config).

Based on the discussion in this thread over the past few days, it sounds like automatically spawning workers to handle timer events "is bad".

@unbit
Owner

unbit commented Nov 19, 2014

@jsivak it depends on the context. If your app is fully preforked, spawning it on demand will be a matter of a few milliseconds. We are trying to cover different use cases

@jsivak

jsivak commented Nov 19, 2014

That type of delay is fine for my needs; I'm running my web apps with "lazy-apps=true" set; not exactly sure if that's what you mean by "fully preforked".

If the "auto spawning" is in the uwsgi-2.0 branch on github, then let me know how I can test it (what config file settings to set) and I'll give it a try.

@jsivak

jsivak commented Dec 29, 2016

Just got bitten by this issue on one of my older apps (running uwsgi 2.0.8). Did any resolution for this issue get decided on or implemented in a newer version?

Thanks
