Added a "restartpause" option + fixing travis-checks #659
* Applied on startup failure, in addition to the backoff delay (by fclairamb)
* Applied on bad exit status (by fclairamb)
* Fixed tests (by pfuender)

Default value is 0, which doesn't change the existing behavior. The goal is to impose a minimum delay between restarts to avoid overloading the host with restarts.
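A minimal sketch of the timing described above, assuming the extra pause is simply added to the existing linear backoff after a startup failure, and applied on its own after a bad exit status. The helpers `next_start_delay` and `delay_after_exit` are hypothetical; they are not taken from supervisor's code or from this PR's diff:

```python
def next_start_delay(failures: int, restartpause: float = 0.0) -> float:
    """Seconds to wait before retrying after the `failures`-th start failure.

    Hypothetical helper: supervisord's existing BACKOFF delay grows by one
    second per consecutive start failure; the proposed `restartpause` is
    added on top of it. With the default of 0 the result equals the
    existing backoff delay, so behavior is unchanged.
    """
    if failures < 1 or restartpause < 0:
        raise ValueError("failures must be >= 1 and restartpause >= 0")
    return failures + restartpause


def delay_after_exit(exit_expected: bool, restartpause: float = 0.0) -> float:
    """Seconds to wait before autorestarting a process that has exited.

    Hypothetical helper: only a bad (unexpected) exit status is delayed.
    """
    return 0.0 if exit_expected else restartpause


if __name__ == "__main__":
    for n in range(1, 4):
        print(n, next_start_delay(n), next_start_delay(n, restartpause=30))
    print(delay_after_exit(False, restartpause=30), delay_after_exit(True, restartpause=30))
```

The point of the default is visible in the first column pair: with `restartpause` left at 0, every delay matches the plain backoff.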
+1
+1
Please merge this. I need this. Thanks +1
👍
+1 This is exactly what I was looking for! In fact, I just ran into a case where all of the restart attempts happened way too quickly during a database failover and left all the processes down in a fatal state.
+1
1.5 months have passed since this PR was opened. Merge plz!
+1
Guys, is there a PayPal account where we can send you some bribe money to get this merged? ;-)
Is this being held back for philosophical reasons?
+1
Can the merge gods bless this PR already? Plz?
It's missing a bit of documentation; maybe that's the blocker. See docs/configuration.rst and supervisor/skel/sample.conf to document the new option.
* upstream/master: (61 commits)
  * Fix a typo
  * Merge pull request Supervisor#703 from gudata/patch-1
  * Add pypy3 to tox.ini and .travis.yml
  * Use `make html` to build the docs under tox
  * Fix package name for Sphinx
  * Fix import for Mock
  * Add docs to travis build
  * Test that doc building + readme are correct
  * Add Supervisor 3.2.0 release date to Supervisor 4 changelog
  * Fix system.multicall() broken by faster start/stop patch
    In past versions, startProcess() and stopProcess() would always return a callback. 50d1857 changed this so they may return either a callback or a bool, but the code that handles system.multicall() was not updated. If multicall was used with stopProcess() followed by startProcess(), it would try to start the process before it had finished stopping. This broke the restart process link on the web interface.
  * Added supervisor_checks to the docs.
  * Fix typo in docs
  * Clarify behavior of user= option. Closes Supervisor#695
  * Fix start/stop buttons on web broken by faster start/stop patch
    In past versions, startProcess() and stopProcess() would always return a callback. 50d1857 changed this so they may return either a callback or a bool, but the web interface was not updated.
  * Show error messages when clearing a log on the web interface
  * Move code for start and stop actions near each other
  * Show errors when stopping a process on the web interface
  * Show string description for unexpected faults
  * Show all error messages in TailView. Fixes Supervisor#627
  * Implement __str__ so code and text of RPCError are logged
    This is helpful for troubleshooting issues like Supervisor#627 where the traceback doesn't show the contents of the RPCError.
  * ...
There we go! @jrottenberg

``restartpause``

  Adds a pause (in seconds) between successive failed start attempts - thus
  throttles failed-start attemps and prevents massive load increase during
sp: attempts
@barqshasbite - fixed, thanks!
+1 This restartpause parameter would be very useful.
👍 |
+1
+1 I would love to see this. It doesn't look like a breaking change, so it shouldn't require a major version bump, right?
+1 Please add an option for delayed restarts.
C'mon people, use thumbs up please.
👍
Please merge!
Has this been merged?
Another gentle +1 from me. I'm running my own fork just for this option.
@mnaberez, can you help with merging this? The PR has been open for more than a year...
Merge please? :'(
+1 Thanks!
This behavior technically exists already with the 'startretries' option. If you set startretries=10, for example, every time it fails to start, it goes into the BACKOFF state, where it waits n seconds, where n is the number of start failures it has experienced so far. So on the first failure it waits 1 second before trying to start again. On the second failure it waits 2 seconds before trying to start again. On the third failure it waits 3 seconds. And so on, all the way up to 10 failures and a 10-second wait (at which point it would have waited 1+2+3+4+5+6+7+8+9+10=55 seconds total). Searching the repo for 'startretries' will highlight how it interacts with the BACKOFF state and the 'backoff' delay. See also the process states: http://supervisord.org/subprocess.html#process-states This PR just adds an additional delay on top of the pre-existing 'backoff' delay, which I suspect overcomplicates things and is why it is not being merged. If you need a large delay, I recommend just setting 'startretries' to a sufficiently large number.
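To make the arithmetic in the comment above concrete, here is a minimal Python sketch of the linear schedule exactly as described there: the n-th consecutive start failure is followed by an n-second wait, for up to `startretries` failures. `backoff_schedule` and `retries_for_total_wait` are hypothetical helpers for illustration, not supervisor functions:

```python
def backoff_schedule(startretries: int) -> list[int]:
    """Per-failure waits (seconds) under the linear BACKOFF delay.

    The n-th consecutive start failure is followed by an n-second wait,
    up to `startretries` failures.
    """
    return list(range(1, startretries + 1))


def retries_for_total_wait(seconds: float) -> int:
    """Smallest `startretries` whose waits add up to at least `seconds`."""
    retries, total = 0, 0
    while total < seconds:
        retries += 1
        total += retries  # waits sum to 1 + 2 + ... + n = n * (n + 1) / 2
    return retries


if __name__ == "__main__":
    waits = backoff_schedule(10)
    print(waits)       # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(sum(waits))  # 55
    print(retries_for_total_wait(600))  # 35 (waits total 630 seconds)
```

Under this scheme the total wait grows quadratically with `startretries`, but each individual wait only grows by one second per failure, so a long outage still produces many quick retries early on.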
I read the code to verify @barqshasbite's explanation. I think adding this explanation to http://supervisord.org/subprocess.html#process-states, plus a note to http://supervisord.org/configuration.html#program-x-section-settings about the BACKOFF state and its increasing duration, would be sufficient. The program config section gives no indication that this backoff strategy is used.
Linear backoff isn't always enough. When dealing with longer-term issues, I would prefer exponential backoff, so as not to get spammed with failure messages.
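For comparison, a minimal sketch of capped exponential backoff next to the linear scheme described earlier in the thread. Exponential backoff is not a supervisord option; the doubling base, the 300-second cap, and all function names here are assumptions chosen for illustration:

```python
from typing import Callable


def linear_delay(failures: int) -> float:
    """Linear backoff as described in the thread: n seconds after the n-th failure."""
    return float(failures)


def exponential_delay(failures: int, base: float = 1.0, cap: float = 300.0) -> float:
    """Capped exponential backoff: base * 2**(n - 1), never more than `cap`.

    Hypothetical alternative, not a supervisord setting.
    """
    return min(cap, base * 2 ** (failures - 1))


def failures_until(delay_fn: Callable[[int], float], target: float, limit: int = 10_000) -> int:
    """Number of consecutive failures before the wait first reaches `target`."""
    for n in range(1, limit + 1):
        if delay_fn(n) >= target:
            return n
    raise ValueError("target delay is never reached within `limit` failures")


if __name__ == "__main__":
    # Failures (and failure messages) before a 5-minute wait between retries.
    print(failures_until(linear_delay, 300))       # 300
    print(failures_until(exponential_delay, 300))  # 10
```

The cap keeps the wait bounded once the outage has clearly lasted a while, which is the usual trade-off when choosing exponential over linear backoff.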
Totally agree with @Don42!
Any reason this isn't merged? The PR looks quite straightforward, adding
Friendly bump to the admin team; it would be great if this were merged :D
I am currently working on other issues, but I am not the only person with commit access to this repository, so I will leave it open.
+1
👍 |
I will set The kinds of failure I am restarting from will not be resolved in seconds, so I am guaranteed to have several dozen unnecessary failures before it waits an appropriate amount of time to restart. I'm not a fan of the approach of using a wrapper that
Please merge this :)
What's the problem, exactly? If you don't like the idea, then shut it down. If you like it, what are you waiting so long for? How long can a decision like this take?
+1, please merge. |
10 years soon. |
+1 |
+1 |
This is essentially the same as PR #509 by fclairamb; I just added my 2 cents and (hopefully) fixed the tests. Please let us know if anything more is needed to get this merged.
Thanks