Restarting the application disables monitors with interval greater than application uptime #3504

Closed
2 tasks done
alexklibisz opened this issue Jul 30, 2023 · 15 comments
Labels
bug Something isn't working

Comments

@alexklibisz

alexklibisz commented Jul 30, 2023

❗ ❗ For those just skimming, the solution was: Push monitors get reset when the uptime-kuma application reboots. So if you restart your application at some interval (e.g., for a backup), then it will disable any push monitors which have a greater interval. In my case, I had a 25 hour push monitor and I was restarting the application once every 24 hours for a backup. I just stopped restarting the application for the backup, and the push monitors work fine again.

⚠️ Please verify that this bug has NOT been raised before.

  • I checked and didn't find similar issue

🛡️ Security Policy

Description

I have a push monitor set to a 90000 second (25 hours) interval. I have a script that runs once/day and curls the push monitor URL upon successful completion. I disabled the script and noticed that I did not get any alert.

Here's the configuration:

(screenshot of the push monitor configuration; image not included)

I have the same setup for some monitors/scripts on 60 second intervals, and they all correctly trigger when the script does not run.

👟 Reproduction steps

  1. Set up a push monitor with a 90000-second interval
  2. Curl the monitor URL once
  3. Never curl it again
  4. The push monitor never goes red
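For readers who want to reproduce the setup, here is a minimal sketch of the "push only on success" pattern described above. It assumes the push URL is copied verbatim from the monitor's settings page; the function name `run_and_push` and the injectable `opener` parameter are hypothetical and exist only for illustration.

```python
from typing import Callable
from urllib.request import urlopen


def run_and_push(job: Callable[[], None], push_url: str, opener=urlopen) -> bool:
    """Run `job`; send a heartbeat to `push_url` only if the job did not raise.

    `push_url` is assumed to be the full URL shown in the monitor settings.
    `opener` is injectable so the function can be exercised without a network.
    """
    try:
        job()
    except Exception:
        # No heartbeat is sent, so the push monitor should go Down
        # once its interval elapses without a request.
        return False
    with opener(push_url, timeout=10):
        pass  # the response body is not needed
    return True
```

Calling `run_and_push(my_daily_job, "<push URL from the UI>")` once a day mirrors steps 1 and 2; skipping the call mirrors step 3.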

👀 Expected behavior

If the push URL does not receive a request in 25 hours, it should trip the alert. In other words, a push monitor should behave the same regardless of its interval.

😓 Actual Behavior

See description

🐻 Uptime-Kuma Version

1.22.1-debian

💻 Operating System and Arch

Ubuntu 22.04

🌐 Browser

Chromium 114.0.5735.198

🐋 Docker Version

23.0.5, build bc4487a

🟩 NodeJS Version

No response

📝 Relevant log output

No response

@alexklibisz alexklibisz added the bug Something isn't working label Jul 30, 2023
@louislam louislam added help and removed bug Something isn't working labels Jul 30, 2023
@louislam
Owner

Set Retries to 0.

@alexklibisz
Author

Thanks, I'll try that. Could you explain why this is necessary?

@louislam
Owner

With Retries=1, the monitor will not send a notification on the first failed check. It will check one more time to confirm that. In your case, it will send in 50 hours.

If you want to keep the retries logic, you can lower Heartbeat Retry Interval to maybe 60 seconds, so it won't take too long to retry.
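The arithmetic behind the "50 hours" figure can be sketched as follows. This is a simplified model, not Uptime Kuma's actual code: it assumes the first failed check happens one interval after the last heartbeat and that each retry waits one Heartbeat Retry Interval. It also assumes the retry interval in the original setup was 90000 seconds, which the thread implies but does not state. The function name is hypothetical.

```python
def seconds_until_notification(interval_s: int, retries: int, retry_interval_s: int) -> int:
    """Worst-case delay between the last heartbeat and the Down notification.

    Simplified model (an assumption): one full interval passes before the
    first failed check, then each retry adds one retry interval.
    """
    return interval_s + retries * retry_interval_s


# Retries=1 with a 25-hour retry interval: 180000 s, i.e. 50 hours.
hours_default = seconds_until_notification(90000, 1, 90000) / 3600

# Retries=1 with a 60-second retry interval: 90060 s, i.e. 25 hours and 1 minute.
seconds_short_retry = seconds_until_notification(90000, 1, 60)

# Retries=0: the notification goes out after the first failed check.
seconds_no_retry = seconds_until_notification(90000, 0, 90000)
```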

@alexklibisz
Author

alexklibisz commented Jul 30, 2023

Hmm. The problem was that it just never sends the notification. Even if the script is down for weeks.

@chakflying
Collaborator

I swear I have seen this before. After a bit (a lot) of digging I finally found this: #2801

I will set up a push monitor to try to test this myself.

@louislam louislam added bug Something isn't working and removed help labels Jul 30, 2023
@louislam
Owner

louislam commented Jul 30, 2023

I swear I have seen this before. After a bit (a lot) of digging I finally found this: #2801

I will set up a push monitor to try to test this myself.

Oh, it's the push monitor again. I feel this monitor type's implementation has become overcomplicated somehow...

Changed back to bug; I did not realize the monitor type.

@alexklibisz
Author

Thanks @louislam . Just a bit of feedback: as a user, I have found the Retries and Heartbeat Retry Interval parameters confusing. I'm not sure what it means to "retry" a push monitor. The monitor is just waiting for an HTTP request. I can't think of anything that it could be retrying.

@alexklibisz
Author

I think I might have found one contributing factor to the issue. I normally run a backup of my uptime-kuma container that involves stopping and restarting the container. I disabled this backup, created a new 25-hour alert, curled it once, and then left it alone. I got an email alert this time. The only difference compared to my previous setup is that I did not restart the container. @louislam is there perhaps anything in the push monitor code that would be affected by a container restart?

@chakflying
Collaborator

Wait then this may be more dumb than I thought... On server start, we schedule a task after your defined interval to check if the push route has been called.

If you set your interval to 25 hours, then restart the server every 24 hours, then obviously the task will never get to fire 🤦🏻‍♂️
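The effect can be illustrated with a small simulation. This is a toy model of the behavior described above, not the project's code: a check timer is armed at server start, every restart discards the pending timer and arms a fresh one, and the function counts how many checks actually fire. The function name and parameters are hypothetical, and both durations are assumed to be positive.

```python
from typing import Optional


def count_checks(interval_s: int, horizon_s: int, restart_every_s: Optional[int] = None) -> int:
    """Count checks that fire within `horizon_s` seconds of the first start.

    Toy model: a timer armed at time t fires at t + interval_s unless a
    restart happens first, in which case it is dropped and re-armed.
    """
    fired = 0
    armed_at = 0                    # timer armed when the server starts
    next_restart = restart_every_s  # None means the server never restarts
    while True:
        due = armed_at + interval_s
        if next_restart is not None and next_restart <= due:
            if next_restart > horizon_s:
                break
            # The restart wins: the pending check is lost and a new timer is armed.
            armed_at = next_restart
            next_restart += restart_every_s
            continue
        if due > horizon_s:
            break
        fired += 1
        armed_at = due
    return fired


WEEK = 7 * 86400
never_checked = count_checks(90000, WEEK, restart_every_s=86400)  # 25 h interval, daily restart
without_restart = count_checks(90000, WEEK)                       # same interval, no restarts
```

In this model the 25-hour monitor is never checked during a week of daily restarts, while a 60-second monitor under the same restart schedule still gets checked every minute.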

@alexklibisz
Author

Got it, that makes sense. I don't know if it's particularly obvious, though. Many applications can gracefully handle restarts without affecting behavior. I had assumed this was the case for uptime-kuma.

@chakflying
Collaborator

I agree, we should handle this better. I said that because I expected this to be a week-long debugging session involving tracking down platform-dependent or race-condition issues. Turns out it's way more "obvious" than that.

I think on restart we can compute the remaining interval like I did in #3072, then schedule the next check based on that. But it would be a slight change from the current behavior, as people who have their interval set at 60s would likely see their push monitor immediately go Down after restart.
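A rough sketch of that idea, assuming the timestamp of the last received heartbeat is available after a restart. This only illustrates the proposal; it does not reproduce the code in #3072, and the function and parameter names are hypothetical.

```python
def delay_until_next_check(interval_s: float, last_heartbeat_ts: float, now_ts: float) -> float:
    """Seconds to wait after a restart before checking the push monitor.

    Instead of waiting a full interval from startup, wait only for the part
    of the interval that has not yet elapsed since the last heartbeat.
    """
    elapsed = now_ts - last_heartbeat_ts
    return max(0.0, interval_s - elapsed)


# 25-hour interval, last heartbeat 24 hours before the restart:
# the check is scheduled one hour after startup instead of 25 hours.
long_interval_delay = delay_until_next_check(90000, last_heartbeat_ts=0, now_ts=86400)

# 60-second interval, server was down for 5 minutes: the delay is zero,
# so the monitor would be checked (and likely marked Down) immediately.
short_interval_delay = delay_until_next_check(60, last_heartbeat_ts=0, now_ts=300)
```

The second case is the behavior change mentioned above: a short-interval monitor has no chance to receive a fresh heartbeat before it is evaluated.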

Also, retries currently do not persist across restarts, and that is more difficult to fix.

@CommanderStorm
Collaborator

CommanderStorm commented Aug 1, 2023

likely see their push monitor immediately go Down after restart.

Do you think we need to introduce a minimum time in this case?
(Could the improvement you are thinking about solve #454 as well?)

@alexklibisz
Author

This has been running well since I stopped doing the nightly reboot. Feel free to close this issue. I'll be excited to see a fix at some point, but no rush.

@chakflying
Collaborator

You can also change the issue title to be more descriptive of what the actual problem is, maybe we can keep this open until a fix is available.

@alexklibisz alexklibisz changed the title from "Push monitor w/ interval > 1 day does not seem to work" to "Restarting the application disables monitors with interval greater than application uptime" Aug 9, 2023
@alexklibisz
Author

You can also change the issue title to be more descriptive of what the actual problem is, maybe we can keep this open until a fix is available.

I changed the title and will close.

@louislam feel free to re-open if this is a good place to track the fix.


4 participants