Added a "restartpause" option + fixing travis-checks #659

pfuender · 2015-09-16T08:35:28Z

This is essentially the same as PR #509 by fclairamb; I just added my 2 cents and (hopefully) fixed the tests. Please let us know if there's anything more needed to get this merged.

Thanks

* Applied on startup failure, in addition to the backoff delay (by fclairamb)
* Applied on bad exit status (by fclairamb)
* Fixed tests (by pfuender)

The default value is 0, which doesn't change the existing behavior.

The goal is to impose a minimum delay between restarts to avoid overloading the host with restarts.
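A minimal sketch of the delay rule described above, assuming the extra pause is simply added to supervisord's linear BACKOFF delay on a failed start (1 s, 2 s, 3 s, ... as explained later in this thread) and is applied on its own after a bad exit status. The function names are hypothetical; this is not the PR's actual diff.

```python
def next_start_delay(failures: int, restartpause: int = 0) -> int:
    """Seconds to wait before the next start attempt after `failures` failed starts.

    Linear backoff (1 s after the first failure, 2 s after the second, ...)
    plus the extra restartpause; restartpause=0 keeps the existing behavior.
    """
    if failures < 1:
        raise ValueError("failures must be >= 1")
    return failures + restartpause


def delay_after_exit(exit_expected: bool, restartpause: int = 0) -> int:
    """Seconds to wait before restarting after the process exited.

    Only an unexpected (bad) exit status is delayed, per the PR description.
    """
    return 0 if exit_expected else restartpause


if __name__ == "__main__":
    for n in range(1, 4):
        # failure count, delay without the option, delay with restartpause=30
        print(n, next_start_delay(n), next_start_delay(n, restartpause=30))
```

With restartpause=30, the waits after the first three failed starts become 31, 32 and 33 seconds instead of 1, 2 and 3.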


mrook · 2015-09-16T08:38:37Z

+1

toastbrotch · 2015-09-16T08:39:44Z

+1

sschueller · 2015-09-16T08:40:33Z

Please merge this. I need this. Thanks +1

miso-belica · 2015-09-16T08:41:59Z

👍

icholy · 2015-10-06T19:20:40Z

RFC @mcdonc @mnaberez @theduderog

oryband · 2015-10-18T08:42:27Z

@ofir-petrushka @doodyparizada @fuzzyami

stevenscg · 2015-10-19T12:20:05Z

+1 This is exactly what I was looking for!

In fact, I just ran into a case where all of the restart attempts happened way too quickly during a database failover and left all the processes down in a fatal state.

doodyparizada · 2015-10-25T09:29:25Z

+1

oryband · 2015-11-01T07:56:21Z

1.5 months have passed since this pr was opened. merge plz!

oliparcol · 2015-11-01T07:57:56Z

+1

pfuender · 2015-11-24T07:59:17Z

Guys - Any paypal account to get you some bribe money to get this merged? ;-)

icholy · 2015-11-24T14:51:44Z

Is this being held back for philosophical reasons?

webtussi · 2015-12-07T13:25:42Z

+1

oryband · 2015-12-20T22:41:46Z

can the merge gods bless this pr already? plz?

jrottenberg · 2015-12-22T00:00:42Z

It's missing some documentation; maybe that's the blocker. See docs/configuration.rst and supervisor/skel/sample.conf to document the new option.
If you look at the recent commits, they mostly touch files in the docs.
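For illustration only, here is roughly how the new option could look in a `[program:x]` section, parsed with Python's standard `configparser` and falling back to 0 when the key is absent. This is not supervisord's own config loader; the program names, command paths and values are made up.

```python
import configparser

# Hypothetical sample.conf fragment; names and values are illustrative.
SAMPLE = """
[program:myworker]
command=/usr/bin/myworker
startretries=3
restartpause=30

[program:other]
command=/usr/bin/other
"""


def read_restartpause(parser: configparser.ConfigParser, program: str) -> int:
    """Return restartpause for [program:<name>], defaulting to 0 (existing behavior)."""
    section = f"program:{program}"
    value = parser.getint(section, "restartpause", fallback=0)
    if value < 0:
        raise ValueError(f"restartpause must be >= 0, got {value}")
    return value


parser = configparser.ConfigParser()
parser.read_string(SAMPLE)
print(read_restartpause(parser, "myworker"))  # 30
print(read_restartpause(parser, "other"))     # 0 (option not set)
```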

* upstream/master: (61 commits)
  * Fix a typo
  * Merge pull request Supervisor#703 from gudata/patch-1
  * Add pypy3 to tox.ini and .travis.yml
  * Use `make html` to build the docs under tox
  * Fix package name for Sphinx
  * Fix import for Mock
  * Add docs to travis build
  * Test that doc building + readme are correct
  * Add Supervisor 3.2.0 release date to Supervisor 4 changelog
  * Fix system.multicall() broken by faster start/stop patch In past versions, startProcess() and stopProcess() would always return a callback. 50d1857 changed this so they may return either a callback or a bool, but the code that handles system.multicall() was not updated. If multicall was used with stopProcess() followed by startProcess(), it would try to start the process before it had finished stopping. This broke the restart process link on the web interface.
  * Added supervisor_checks to the docs.
  * Fix typo in docs
  * Clarify behavior of user= option. Closes Supervisor#695
  * Fix start/stop buttons on web broken by faster start/stop patch In past versions, startProcess() and stopProcess() would always return a callback. 50d1857 changed this so they may return either a callback or a bool, but the web interface was not updated.
  * Show error messages when clearing a log on the web interface
  * Move code for start and stop actions near each other
  * Show errors when stopping a process on the web interface
  * Show string description for unexpected faults
  * Show all error messages in TailView. Fixes Supervisor#627
  * Implement __str__ so code and text of RPCError are logged This is helpful for troubleshooting issues like Supervisor#627 where the traceback doesn't show the contents of the RPCError.
  * ...

pfuender · 2015-12-30T15:39:01Z

There we go! @jrottenberg

barqshasbite · 2015-12-30T16:00:03Z

docs/configuration.rst

+``restartpause``
+
+  Adds a pause (in seconds) between successive failed start attempts - thus
+  throttles failed-start attemps and prevents massive load increase during


sp: attempts

pfuender · 2015-12-31

@barqshasbite - fixed, thanks!

harijoe · 2016-01-06T09:24:17Z

+1 This restartpause parameter would be very useful

dbpolito · 2016-02-12T12:00:14Z

👍

igama · 2016-02-17T11:27:48Z

+1

ffrank · 2016-02-24T11:11:37Z

+1 I would love to see this.

Does not look like a breaking change, so it should not require a major version bump, yes?

ewerk-tstein · 2016-04-26T12:28:08Z

+1 Please add an option for delayed restarts.

icholy · 2016-04-26T13:26:29Z

C'mon people, use thumbs up please.

mikezawitkowski · 2016-04-26T18:26:16Z

👍

toastbrotch · 2016-06-29T11:50:10Z

please merge!
what is the hold up?

jonathan-kosgei · 2016-08-19T08:06:13Z

Has this been merged?

nzjrs · 2016-08-28T08:05:21Z

Another gentle +1 from me. I'm running my own fork just for this option.

s12v · 2017-01-18T12:32:32Z

@mnaberez, do you know whether you can help with merging this? The PR has been open for more than a year...

briandignan · 2017-01-26T02:13:41Z

merge please? :'(

wjdecorte · 2017-06-05T19:47:20Z

+1
Will soon be on the two year mark. Is there any chance of getting this merged? I'd really, really, really like to have the restartpause option without running my own fork.

Thanks!

barqshasbite · 2017-06-05T20:11:25Z

This behavior technically exists already with the 'startretries' option. If you set startretries=10, for example, every time it fails to start, it goes into the BACKOFF state where it waits n seconds, where n is the number of start failures it has experienced so far. So on the first failure it waits 1 second before trying to start again. On the second failure it waits 2 seconds before trying to start again. On the third failure it waits 3 seconds. And so on all the way up to 10 failures and a 10 second wait (at which point it would have waited 1+2+3+4+5+6+7+8+9+10=55 seconds total).
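The linear schedule described above can be reproduced in a few lines. This sketch only mirrors the description in this comment (wait n seconds after the n-th failure); it ignores the time each start attempt itself takes and does not model exactly when supervisord gives up and enters FATAL.

```python
def linear_backoff_schedule(startretries: int) -> list[int]:
    """Seconds waited after each failed start: 1, 2, ..., startretries."""
    return list(range(1, startretries + 1))


schedule = linear_backoff_schedule(10)
print(schedule)       # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(sum(schedule))  # 55 seconds of waiting in total
```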

Searching the repo for 'startretries' will highlight how it interacts with the BACKOFF state and the 'backoff' delay.

See also the process states: http://supervisord.org/subprocess.html#process-states

This PR just adds an additional delay on top of the pre-existing 'backoff' delay, which I suspect overcomplicates things and is why it is not being merged. If you need a large delay, I recommend just setting 'startretries' to a sufficiently large number.

jimbrowne · 2017-06-16T19:27:57Z

I read the code to verify @barqshasbite's explanation. I think adding this explanation to http://supervisord.org/subprocess.html#process-states, plus a note to http://supervisord.org/configuration.html#program-x-section-settings about the BACKOFF state and its increasing duration, would be sufficient. The program config section gives no hint that this backoff strategy is used.

Don42 · 2017-06-17T21:57:11Z

Linear backoff isn't always enough. When dealing with longer-term issues I would prefer exponential backoff, so as not to get spammed with failure messages.
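To make the linear vs. exponential difference concrete, this sketch counts how many retries fit into one hour of cumulative waiting under each policy. The exponential policy is hypothetical (the thread describes supervisord's backoff as linear), its parameters (start at 1 s, double each time, cap at 300 s) are arbitrary assumptions, and the count ignores how long each failed attempt runs.

```python
from itertools import count


def linear_delays():
    """1, 2, 3, ... seconds (linear backoff as described earlier in the thread)."""
    for n in count(1):
        yield n


def exponential_delays(base=1, factor=2, cap=300):
    """1, 2, 4, 8, ... seconds, capped at `cap` (hypothetical parameters)."""
    for i in count(0):
        yield min(cap, base * factor ** i)


def retries_within(delays, window):
    """Count retries whose cumulative waiting time fits into `window` seconds."""
    elapsed = 0
    retries = 0
    for d in delays:
        if elapsed + d > window:
            break
        elapsed += d
        retries += 1
    return retries


print(retries_within(linear_delays(), 3600))       # 84
print(retries_within(exponential_delays(), 3600))  # 19
```

So within the first hour of a persistent failure, the linear policy produces about 84 retries (and failure messages) versus 19 for the capped exponential one.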

mikemix · 2017-09-21T20:25:20Z

Totally agree with @Don42 !

elgalu · 2018-10-25T09:37:08Z

Any reason this isn't merged? The PR looks quite straightforward; adding restartpause would be a great addition, even if not perfect.

lhovo · 2019-06-14T13:43:14Z

Friendly bump to admin team, this would be great if it was merged :D

fclairamb · 2020-02-12T00:01:51Z

4 years after my half-baked proposal (#509), this still hasn't been merged 😄! @mnaberez if you don't like this PR, you should at least close it.

mnaberez · 2020-02-12T04:09:58Z

> @mnaberez if you don't like this PR, you should at least close it.

I am currently working on other issues but I am not the only person with commit access to this repository so I will leave it open.

SeanJA · 2020-02-19T21:57:09Z

+1

elgalu · 2020-02-27T14:24:55Z

👍

isosphere · 2022-05-01T16:58:27Z

I will set startretries to 999 for my use case in the absence of this PR being merged but I would rather have this feature.

The kinds of failure I am restarting from will not be resolved in seconds, so I am guaranteed to have several dozen unnecessary failures until it is waiting an appropriate amount of time to restart.
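A rough way to quantify the "several dozen unnecessary failures": count failed start attempts during an outage of a given length, with and without an extra pause. This assumes each attempt fails instantly, that the wait after the n-th failure is n seconds (the linear backoff described earlier in the thread) plus the optional pause, and that startretries is high enough (as with 999 here) never to stop retrying. `failed_attempts_during_outage` is a hypothetical helper.

```python
def failed_attempts_during_outage(outage: int, restartpause: int = 0) -> int:
    """Count start attempts made while a dependency is down for `outage` seconds.

    Hypothetical model: the first attempt is at t=0, every attempt during the
    outage fails instantly, and the wait after the n-th failure is
    n + restartpause seconds.
    """
    t = 0
    failures = 0
    while t < outage:
        failures += 1                  # the attempt at time t fails
        t += failures + restartpause   # wait before the next attempt
    return failures


print(failed_attempts_during_outage(600))                   # 35
print(failed_attempts_during_outage(600, restartpause=60))  # 10
```

With a 10-minute outage this model gives 35 failed attempts with plain linear backoff versus 10 with an extra 60-second pause.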

I'm not a fan of the approach of using a wrapper that sleeps after execution, because then you're always delaying restarts, not just when you have some kind of fatal error.
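A wrapper does not have to delay every restart: it can sleep only when the wrapped command exits non-zero. The sketch below is a hypothetical `pause_on_failure.py` with an assumed 30-second pause. Caveats: supervisord signals the wrapper rather than the child, so options like `stopasgroup`/`killasgroup` are likely needed; and if the sleep outlasts `startsecs`, supervisord may already consider the process RUNNING, so the restart is then governed by `autorestart`/`exitcodes` rather than `startretries`.

```python
import subprocess
import sys
import time

# Assumed value: seconds to wait after a failed run before exiting.
PAUSE_ON_FAILURE = 30


def run_with_failure_pause(cmd, pause, sleep=time.sleep):
    """Run cmd; if it exits non-zero, sleep `pause` seconds, then return its exit code."""
    code = subprocess.run(cmd).returncode
    if code != 0:
        sleep(pause)  # only failed runs are delayed; clean exits return at once
    if code < 0:
        # Child was killed by a signal; report it the way shells do (128 + signum).
        code = 128 - code
    return code


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage in a [program:x] section:
    #   command=python3 pause_on_failure.py /usr/bin/myworker --some-flag
    sys.exit(run_with_failure_pause(sys.argv[1:], PAUSE_ON_FAILURE))
```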

michnhokn · 2023-02-01T13:36:17Z

Please merge this :)

gander · 2023-05-13T10:14:03Z

@mnaberez @theduderog @mcdonc

What's the problem, exactly? If you don't like the idea, then shut it down. If you like it, what are you waiting for? How long can such a decision take?

mikemix · 2023-05-13T14:21:16Z

+1, please merge.

wokenlex · 2024-03-19T13:54:30Z

10 years soon.

claranet-retailweb · 2024-05-24T14:24:03Z

+1

asstron0me · 2024-09-10T08:42:01Z

+1

rifat-simoom · 2025-03-06T11:23:11Z

1.5 months have passed since this pr was opened. merge plz!
Haha

pfuender mentioned this pull request Sep 16, 2015: Added a "restartpause" option #509 (Closed)

pfuender mentioned this pull request Sep 16, 2015: feature request: delay restart before status FATAL #487 (Open)


Raphael Hoegger added 2 commits December 30, 2015 16:21

document new "restartpause" option (76d2a4f)

barqshasbite reviewed Dec 30, 2015

fixing typo - s/attemps/attempts/ (6c04d47)

vitalyzhakov mentioned this pull request Jan 30, 2019: When database is down, worker will down yiisoft/yii2-queue#303 (Closed)
