fix(cron): Log long running jobs #45804
Signed-off-by: Christoph Wurst <christoph@winzerhof-wurst.at>
/backport to stable29
/backport to stable28
/backport to stable27
/backport to stable27
/backport to stable28
@AndyScherzinger can we have a backport for 25?
/backport to stable26
/backport to stable25
@marinofaggiana I don't know, let's try and see if the bot can create them 🤞
So the PR could be created, but according to the bot it is incomplete: #46706. @marinofaggiana, best you align with @ChristophWurst to have them wrapped up for 25 and 26.
ok
Let's try porting from 27, where I have already had to resolve conflicts: #45855 (comment).
Worked. @AndyScherzinger @marinofaggiana if you need more backports, use stable27 as the base.
Summary
If cron jobs take a very long time to complete, they will start to run in parallel. That's because jobs are only reserved for 12h. After that we simply assume the job failed and start it again. In faulty situations this can lead to ever-increasing server load.
Here is an example:
It looks like a job executes an expensive query over and over. After 12h the query time doubles, after 24h it triples, after 36h it quadruples, etc. It's not clear whether that is what is really happening.
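To illustrate the suspected pile-up, here is a minimal Python sketch, not the actual server code. It assumes a new run of the same job starts exactly when each 12h reservation expires and that every run takes the same time. The function name `concurrent_instances` and its parameters are hypothetical.

```python
RESERVATION_HOURS = 12  # from the summary: jobs are only reserved for 12h


def concurrent_instances(hours_since_first_start: float,
                         job_duration_hours: float,
                         reservation_hours: float = RESERVATION_HOURS) -> int:
    """Count how many runs of one job overlap at a given point in time.

    Assumption (hypothetical model): a fresh run is started every time the
    reservation expires, and each run takes job_duration_hours to finish.
    """
    if hours_since_first_start < 0:
        return 0
    running = 0
    start = 0.0
    # Walk over every run that has been started so far.
    while start <= hours_since_first_start:
        # The run is still active if it has not reached its end time yet.
        if hours_since_first_start < start + job_duration_hours:
            running += 1
        start += reservation_hours
    return running


# A job that needs 100h: one more copy runs after each 12h reservation window.
for hour in (1, 13, 25, 37):
    print(hour, concurrent_instances(hour, job_duration_hours=100))
# prints:
# 1 1
# 13 2
# 25 3
# 37 4
```

Under these assumptions, a job that finishes within its reservation never overlaps with itself, while a job that exceeds it gains one extra parallel run per 12h, which would match the doubling, tripling and quadrupling described above.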
I've tried to be reasonable with the log level so we don't spam the logs too much:
TODO
Checklist