Monitors all slow down (longer response time) over time, reboot/restart fixes it #2344
Comments
I also experienced a similar problem, and even got websocket errors in my Uptime Kuma.
It's odd: I have two servers running Kuma, both Ubuntu 20.04 with 4+ GB RAM, 2 CPUs and plenty of disk space. @kevinstevanus, are you running it on an intranet? Which monitors do you use?
Currently we run Kuma on an intranet server (Windows Server) to monitor SQL Server and APIs. My guess at the time was that the Kuma server itself slows down over time.
Since we are both monitoring SQL servers, I wonder if it is coming from that. I'll try adding a SQL monitor to my public server and see what happens over a few days.
I just looked into it, and I think this may be the root cause: the connection is not closed correctly. See:
uptime-kuma/server/util-server.js Lines 251 to 265 in 2c3abdc
Thanks! I thought we had gone through this when we added the feature; I wonder if something changed in mssql? Anyway, from the docs it looks like closing the pool is the right way to go.
Ignore my previous comment. I tested it, and it does close the pool without problems.
However, I found an unrelated issue: because it uses a global pool, there may be a problem when there is more than one SQL Server monitor, since a monitor could end up using a global pool that was created by another monitor. I may open a new issue for this. At this point I am still not sure whether the slowdown is related to the SQL Server monitor. Could you pause the SQL Server monitor and use a TCP Port monitor for now? Then we should have an answer next week.
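To illustrate the global-pool concern above, here is a minimal sketch of keeping one pool per connection string instead of one shared global pool. It is not Uptime Kuma's code: `PoolRegistry` and the injected `createPool` factory are hypothetical, and with the mssql package the factory might be something like `(connStr) => new sql.ConnectionPool(connStr).connect()`.

```javascript
// Hypothetical sketch: one pool per connection string, so a SQL Server monitor
// never reuses a pool that another monitor created for a different server.
class PoolRegistry {
    // createPool: hypothetical async factory (connStr) => Promise<pool>,
    // where pool has an async close() method.
    constructor(createPool) {
        this.createPool = createPool;
        this.pools = new Map(); // connection string -> Promise<pool>
    }

    // Returns the pool for this connection string, creating it on first use.
    // Storing the promise (not the resolved pool) stops two monitors that start
    // at the same moment from creating duplicate pools.
    get(connStr) {
        if (!this.pools.has(connStr)) {
            const pending = this.createPool(connStr).catch((err) => {
                this.pools.delete(connStr); // allow a retry after a failed connect
                throw err;
            });
            this.pools.set(connStr, pending);
        }
        return this.pools.get(connStr);
    }

    // Closes every pool (e.g. on shutdown) and empties the registry.
    async closeAll() {
        const pending = [...this.pools.values()];
        this.pools.clear();
        for (const p of pending) {
            try {
                const pool = await p;
                await pool.close();
            } catch (_) {
                // ignore pools that never connected
            }
        }
    }
}
```

Keying by connection string keeps monitors that point at the same server sharing one pool, while monitors for different servers stay isolated.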
Yeah, I added a SQL monitor to my public server to see if it starts slowing down. I'll check back on it in a few days. I can also turn off the SQL monitors on my intranet for a few days to see what's up.
So now I have 37 HTTP monitors running, 2 of which are failed monitors (always offline). I disabled the SQL and TCP monitors for the week. If there is no change in a few days, I will turn TCP back on and see how it looks. Thanks!
@louislam FYI, a couple of days in and I'm still getting the slowdown, with only HTTP monitors enabled. The only other difference between this and my public server (I think...) is that the internal server uses an internal DNS. I wonder if a cache is piling up or something? I'll see if I can get netdata on it to see what is actually growing on the server.
I installed netdata and rebooted. I'll report back with anything I find in a few days.
Since performance degrades over time, I still think it is most likely a memory leak, or too many heavy tasks/functions running inside Uptime Kuma. But this is all guessing; I still don't have any clues. I just read, adding
Cool!
I'm checking through the netdata charts. Here are the ones that stand out (Kuma is the only thing running on this server, using pm2):
[Charts: Total Processes; Time spent servicing hardware interrupts; Core utilization; Committed memory (the sum of all memory allocated by processes)]
Here's the interesting part from htop:
[htop screenshot: pm2]
Is there anything else I can check with pm2 that would be more helpful?
It should be a memory leak. You should find the log in
The error log has this:
The other log file looks like the normal output I get when running the dev site locally.
It's odd that the process count is growing as well, but the processes don't show in htop. Do you think an errored process is staying alive in the background?
Here are the last 100 lines from the non-error logs:
Thanks for the info! Hope I can figure out the problem.
I guess it is caused by this bug, which I recently fixed: e478084
It should be fixed by 466b403 now. You can test it using the master branch.
Thanks!! I'll swap to master and see how it goes.
@louislam Also, I'm finding that 3e68cf2 broke a few of my sites (I just found this when installing the 19 beta). I get an "incorrect header check" error. The header that is failing is I'm thinking it is because of the "decompress" option; see axios/axios#2406.
1da00d1 should fix your new issue. Unfortunately, avoiding the getPeerCertificate error didn't solve the issue. At this point, I have no idea where the memory leak is.
Sweet, thanks! I will try to dig deeper into pm2 to see if there is a better way to see what is going on.
@louislam This didn't fix it completely, but if I add in will this work with #2253? Since the data is not used anywhere (right?), I didn't add the second part of this answer (the transform). Also, is there any reason not to have
FYI, I will update Node (14 to 18) and pm2 (5.2.0 to 5.2.2) and see if it helps with the memory.
Enable inspector
You can wait a few days and connect to it.
Thanks, I will try it!
For reference, I got it connected to my remote server by adding Starting with
This is from my fix, 466b403. It should be caught and safe now; it is just an error message. But I don't know why socket can be undefined here. I have never seen this error on my end.
I took a heap snapshot and will take another later on to compare. Interestingly, while I was just sitting here watching the console, this one popped up 50+ times in one second and then stopped (#2346). There were a few different messages:
I just updated to the latest version and will check back in a few days. I was testing with the flag
The --inspect arg keeps disconnecting me, so I can't get in to take a new profile :/ Separately, I switched back to Node 16 with the latest version (19.3) and still see the memory increase. CPU at around 100% of one core seems normal: when I start the app directly with node server.js, it starts at around 80% of one core, so the issue must be in memory. After 3-4 days, memory goes from 30 MB to 500 MB. Do you know any other tricks to get a snapshot without remote inspecting?
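On the question of getting a snapshot without remote inspecting: one option, sketched below under stated assumptions, is to have the process log its own memory and write a heap snapshot to disk with Node's built-in `v8.writeHeapSnapshot()` (available since Node 11.13). The helper names `watchMemory` and `toMB`, the one-minute interval and the 400 MB threshold are hypothetical choices, not part of Uptime Kuma.

```javascript
// Hypothetical sketch: self-monitoring memory from inside the Node.js process.
// Uses only Node built-ins; call watchMemory() once at startup (e.g. in server.js).
const v8 = require("v8");

// Convert bytes to whole megabytes for log lines.
function toMB(bytes) {
    return Math.round(bytes / 1024 / 1024);
}

// Logs rss/heapUsed every intervalMs and writes ONE heap snapshot the first
// time heapUsed reaches thresholdMB. Returns the timer so it can be cleared.
function watchMemory({ intervalMs = 60000, thresholdMB = 400 } = {}) {
    let snapshotTaken = false;
    const timer = setInterval(() => {
        const { rss, heapUsed } = process.memoryUsage();
        console.log(`rss=${toMB(rss)}MB heapUsed=${toMB(heapUsed)}MB`);
        if (!snapshotTaken && toMB(heapUsed) >= thresholdMB) {
            snapshotTaken = true;
            // Writes a .heapsnapshot file to the current working directory and
            // returns its name. This blocks the process while it runs and can
            // need a lot of extra memory, so it is only done once here.
            const file = v8.writeHeapSnapshot();
            console.log(`heap snapshot written to ${file}`);
        }
    }, intervalMs);
    timer.unref(); // don't keep the process alive just for this timer
    return timer;
}
```

A snapshot taken shortly after startup and another taken days later can both be loaded in Chrome DevTools (Memory tab) and compared to see which object types keep growing.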
I still have no idea so far. If you don't mind, you can send your You should remove
@christopherpickering are you still experiencing memory leaks? |
Yes; I have just started restarting it nightly. The other option is to use pm2's auto-restart when a maximum memory usage is reached. Either should work around this one. Still, it is only my intranet instance; my public ones have no problem. The internal one also monitors TCP and SQL.
I always use the latest release as well 👍🏼 Super great tool; I recommend it to many.
Another idea I had on this: I'm running lots of jobs on a small server, and some jobs (mssql and ntlm) take longer to complete. I haven't read the code here yet, but could it be not a memory leak but a backlog of status-check jobs? For example, if the status checks don't finish for all monitors within 60 seconds each time, would a backlog of pending status checks slowly build up?
We should have timeouts in place for this edge case.
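The backlog idea and the timeout suggestion above can be sketched as follows. This is a hypothetical illustration, not Uptime Kuma's actual scheduler: `startMonitor` and `withTimeout` are made-up names, and `check` stands for any async status check. The next check is scheduled only after the current one finishes or times out, so checks cannot queue up behind each other.

```javascript
// Hypothetical sketch: periodic checks with a timeout and no overlapping runs.

// Rejects if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
    let timer;
    const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    });
    return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Runs `check` now, then again intervalMs after each run completes (or times out).
// Returns a stop() function.
function startMonitor(check, { intervalMs, timeoutMs, onResult }) {
    let stopped = false;
    let timer;
    const beat = async () => {
        const started = Date.now();
        try {
            await withTimeout(check(), timeoutMs);
            onResult({ ok: true, ms: Date.now() - started });
        } catch (err) {
            onResult({ ok: false, ms: Date.now() - started, error: err.message });
        }
        if (!stopped) timer = setTimeout(beat, intervalMs);
    };
    beat();
    return () => {
        stopped = true;
        clearTimeout(timer);
    };
}
```

One caveat: `Promise.race` only stops waiting; it does not cancel the underlying work. A timed-out query can keep running in the background, so in practice the driver's own timeout setting should be used as well.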
@CommanderStorm Thanks, I will give both suggestions a shot. I haven't updated to 22 yet, but will on Monday.
@MetSystem I don't understand what you are trying to convey. |
@christopherpickering |
No luck yet; I forgot to disable my cron restart job, lol. Sorry for the delay.
@christopherpickering |
Hey, you beat me here today 😁 I just checked in on the monitors. Since a restart 10 days ago, they have not slowed down, and the used memory is about the same as just after a reboot. Before the update, memory was constantly growing (I could watch it grow slowly in pm2 monit), but now it is steady. Thanks!
📝 Describe your problem
Hi!
I'm curious if anyone has a similar problem to me, with a resolution.
Over time, all my monitors (30+) show longer and longer response times. When I reboot the server, response times go back down to "normal" and then slowly begin to increase again.
Another thing to note: after 1-2 weeks without a reboot, I can no longer see monitors in the UI. They do not load, and it asks me to create a monitor. I still get notifications etc. when things go down or come back up, and after rebooting the server the history is all there.
My server has 4 CPUs, 6 GB RAM and plenty of disk space (>10 GB free).
Here's an example monitor; some show much more drastic growth.
🐻 Uptime-Kuma Version
1.18.5
💻 Operating System and Arch
Ubuntu 20.04
🌐 Browser
Safari
🐋 Docker Version
No response
🟩 NodeJS Version
14.21.1