
Monitors all slow down (longer response time) over time, reboot/restart fixes it #2344

Closed
2 tasks done
christopherpickering opened this issue Nov 23, 2022 · 50 comments
Labels
bug Something isn't working

Comments

@christopherpickering
Contributor

⚠️ Please verify that this bug has NOT been raised before.

  • I checked and didn't find a similar issue

🛡️ Security Policy

📝 Describe your problem

Hi!
I'm curious if anyone has a problem similar to mine, and a resolution.

Over time, all my monitors (30+) show longer and longer response times. When I reboot the server the response times drop back to "normal", then slowly begin climbing again.

Another thing to note: after 1-2 weeks without a reboot I can no longer see monitors in the UI - they do not load and it asks me to create a monitor. I still get notifications etc. about monitors going down/up, and after rebooting the server the history is all there.

My server has 4 CPUs, 6 GB RAM, and plenty of disk space (>10 GB free).

Here's an example monitor. Some show much more drastic growth.

[screenshot: response-time chart]

🐻 Uptime-Kuma Version

1.18.5

💻 Operating System and Arch

Ubuntu 20.04

🌐 Browser

Safari

🐋 Docker Version

No response

🟩 NodeJS Version

14.21.1

@louislam
Owner

louislam commented Nov 24, 2022

It is weird; it is fine on my side.
[screenshot]

@kevinstevanus

kevinstevanus commented Dec 2, 2022

I also experienced a similar problem, even getting websocket errors in my Uptime Kuma.
I solved it by setting up a cron job to restart Uptime Kuma every 24 hours.
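For reference, that nightly-restart workaround can be a single crontab entry. This is a config sketch, not a fix for the underlying leak; the process name `uptime-kuma` and the pm2 path are assumptions - use whatever `pm2 ls` and `which pm2` report on your machine:

```
# crontab -e  (restart Uptime Kuma every night at 03:30; name/path illustrative)
30 3 * * * /usr/bin/pm2 restart uptime-kuma
```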

@christopherpickering
Contributor Author

It's odd - I have 2 servers running Kuma, both Ubuntu 20.04, with 4+ GB RAM, 2 CPUs, and plenty of space.
One server is on the intranet monitoring intranet websites (HTTP, SQL Server, TLS monitors) - this is the server with the problem.
The second server is on a public network monitoring public websites (HTTP monitors) - this server has no problem.

@kevinstevanus are you monitoring an intranet? What monitor types do you use?

@kevinstevanus

Currently we have an intranet server running Kuma on Windows Server, monitoring SQL Server and APIs. My guess at the time was that the Kuma server slows down over time.

@christopherpickering
Contributor Author

Since we are both monitoring SQL Server, I wonder if that's the cause. I'll try adding a SQL monitor to my public server and see what happens over a few days.

@louislam
Owner

louislam commented Dec 5, 2022

I just looked into it; I think this may be the root cause. The connection does not seem to be closed correctly.

Issues:

  1. I suspect it does not close the pool correctly. It should call pool.close().
  2. It should not use a pool at all if the connection will be closed right away.

exports.mssqlQuery = function (connectionString, query) {
    return new Promise((resolve, reject) => {
        mssql.connect(connectionString).then(pool => {
            return pool.request()
                .query(query);
        }).then(result => {
            resolve(result);
        }).catch(err => {
            reject(err);
        }).finally(() => {
            mssql.close();
        });
    });
};
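The per-check lifecycle suggested above (open, query, always close) can be sketched as follows. A stub pool stands in for mssql so the example is self-contained and the leak is measurable; with the real library this would be `new mssql.ConnectionPool(connectionString)`, `pool.connect()`, and `pool.close()`:

```javascript
// Leak detector: this counter should return to 0 after every check.
let openPools = 0;

// Stub factory standing in for mssql's ConnectionPool (illustrative only).
function createPool(connectionString) {
    openPools++;
    return {
        query: async (sql) => ({ recordset: [], sql }), // stub query result
        close: async () => { openPools--; },            // stub teardown
    };
}

async function mssqlQuery(connectionString, query) {
    const pool = createPool(connectionString); // one pool per check
    try {
        return await pool.query(query);
    } finally {
        await pool.close(); // runs on success and on error alike
    }
}
```

The `try/finally` shape guarantees the close even when the query rejects, which is the failure mode that would otherwise accumulate open pools.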

@louislam louislam added bug Something isn't working and removed help labels Dec 5, 2022
@christopherpickering
Contributor Author

Thanks! I thought we had gone through this when we added the feature; I wonder if something was updated in mssql? Anyway, from the docs it looks like closing the pool is the right way to go.

@louislam
Owner

louislam commented Dec 6, 2022

Ignore my previous comment. I tested it, and it does close the pool without problems.

  • mssql.connect(connectionString) creates a new global pool or reuses an existing one
  • the local pool object is actually the same as mssql's global one
  • mssql.close() is used for closing the global pool

But I did find an unrelated issue: since it uses the global pool, there may be a problem when there is more than one SQL Server monitor - a monitor could use a global pool that was created by another monitor. Maybe I will open a new issue for this.

At this point, I am still not sure whether it is related to the SQL Server monitor. Could you pause the SQL Server monitors and use TCP Port monitors first? Then we should have an answer next week.
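One way to avoid monitors stepping on each other's global pool is to keep one pool per connection string. This is a sketch of that idea, not Uptime Kuma's actual code - stub objects stand in for mssql.ConnectionPool and all names are illustrative:

```javascript
// Registry: one pool per distinct connection string.
const pools = new Map();

function getPool(connectionString) {
    if (!pools.has(connectionString)) {
        pools.set(connectionString, {
            connectionString,
            close() {
                // Closing one monitor's pool only evicts its own entry;
                // other monitors' pools are untouched.
                pools.delete(connectionString);
            },
        });
    }
    return pools.get(connectionString);
}
```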

@christopherpickering
Contributor Author

Yeah, I added a SQL monitor to my public server to see if it starts slowing down. I'll check back on it in a few days. I can also turn off the SQL monitors on my intranet for a few days to see what's up.

@christopherpickering
Contributor Author

So, now I have 37 HTTP monitors running, 2 of which are failing monitors (always offline). I disabled the SQL and TCP monitors for the week. If there is no change in a few days I will turn TCP back on and see how it looks. Thanks!

@christopherpickering
Contributor Author

christopherpickering commented Dec 9, 2022

@louislam fyi, a couple days in and I'm still seeing the slowdown:

[screenshot]

I only have HTTP monitors enabled. The only other difference between this and my public server (I think...) is that the internal server uses an internal DNS... I wonder if the cache is piling up or something?

I will see if I can get netdata on to see what is actually growing on the server.

@christopherpickering
Contributor Author

I installed netdata and rebooted. I'll report back with anything I find in a few days.

@louislam
Owner

louislam commented Dec 9, 2022

internal dns

Since the performance degrades over time, I still think it is likely a memory leak, or too many heavy tasks/functions running inside Uptime Kuma. But that is all guessing; I don't have any clues yet.
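A cheap way to test the memory-leak theory without any tooling is to sample the process heap on an interval and watch the trend - if heapUsed climbs steadily over hours, that points at a leak rather than at slow monitored targets. A minimal sketch using Node's built-in process.memoryUsage():

```javascript
// Format one heap sample as a compact log line.
function sampleHeap() {
    const { rss, heapUsed, heapTotal } = process.memoryUsage();
    const mb = (n) => (n / 1024 / 1024).toFixed(1);
    return `rss=${mb(rss)}MB heapUsed=${mb(heapUsed)}MB heapTotal=${mb(heapTotal)}MB`;
}

// e.g. log a sample every 10 minutes and compare over a day:
// setInterval(() => console.log(new Date().toISOString(), sampleHeap()), 600000);
```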

I just read that adding --inspect to Node.js enables debugging with Chrome DevTools, where you can see what Uptime Kuma is doing inside. Maybe I could build a special image for debugging.
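For anyone wanting to try this before a debug image exists, one possible route (assuming a pm2-managed install started from server/server.js - adjust the entry point and process name to your setup) is to pass the flag through pm2, then open chrome://inspect in Chrome and take heap snapshots a few hours apart to compare retained objects:

```
pm2 delete uptime-kuma
pm2 start server/server.js --name uptime-kuma --node-args="--inspect=127.0.0.1:9229"
```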

@christopherpickering
Contributor Author

cool!

@christopherpickering
Contributor Author

I'm checking through the net data charts - here are the ones that stand out:

(kuma is the only thing running on this server, using pm2)

Total Processes

[screenshot]

Time spent servicing hardware interrupts

  • pink is "local-timer"

[screenshot]

Core utilization

[screenshot]

Committed memory

Committed memory is the sum of all memory that has been allocated by processes.

[screenshot]

Htop

Here's the interesting part from htop:

[screenshot]

pm2

pm2 ls

[screenshot]

pm2 monit - not sure why this shows 100% CPU while the other doesn't. Also not sure why pm2 isn't using all my CPUs.

[screenshot]

Is there anything else I can check w/ pm2 that would be more helpful?

@christopherpickering
Contributor Author

I do have some monitors that always error:
[screenshot]

I wonder if the errored monitors are not closing connections correctly?

@louislam
Owner

louislam commented Dec 12, 2022

It is probably a memory leak.

You should find the log in ~/.pm2/logs

@christopherpickering
Contributor Author

The error log has this:

2022-12-12T17:33:46.969Z [MONITOR] ERROR: Cannot read property 'getPeerCertificate' of null
2022-12-12T17:33:47.649Z [MONITOR] WARN: Monitor #57 'Mychart': Failing: unable to verify the first certificate | Interval: 60 seconds | Type: http | Down Count: 0 | Resend Interval: 0
2022-12-12T17:34:00.435Z [MONITOR] ERROR: Caught error
2022-12-12T17:34:00.436Z [MONITOR] ERROR: Cannot read property 'getPeerCertificate' of null
2022-12-12T17:34:11.229Z [MONITOR] ERROR: Caught error
2022-12-12T17:34:11.229Z [MONITOR] ERROR: Cannot read property 'getPeerCertificate' of null
2022-12-12T17:34:48.696Z [MONITOR] WARN: Monitor #57 'Mychart': Failing: unable to verify the first certificate | Interval: 60 seconds | Type: http | Down Count: 0 | Resend Interval: 0
2022-12-12T17:34:52.972Z [MONITOR] ERROR: Caught error
2022-12-12T17:34:52.973Z [MONITOR] ERROR: Cannot read property 'getPeerCertificate' of null
2022-12-12T17:35:01.896Z [MONITOR] ERROR: Caught error
2022-12-12T17:35:01.896Z [MONITOR] ERROR: Cannot read property 'getPeerCertificate' of null
2022-12-12T17:35:49.959Z [MONITOR] WARN: Monitor #57 'Mychart': Failing: unable to verify the first certificate | Interval: 60 seconds | Type: http | Down Count: 0 | Resend Interval: 0
2022-12-12T17:36:03.363Z [MONITOR] ERROR: Caught error
2022-12-12T17:36:03.363Z [MONITOR] ERROR: Cannot read property 'getPeerCertificate' of null
2022-12-12T17:36:51.135Z [MONITOR] WARN: Monitor #57 'Mychart': Failing: unable to verify the first certificate | Interval: 60 seconds | Type: http | Down Count: 0 | Resend Interval: 0
2022-12-12T17:37:05.191Z [MONITOR] ERROR: Caught error
2022-12-12T17:37:05.191Z [MONITOR] ERROR: Cannot read property 'getPeerCertificate' of null
[the same "Caught error / Cannot read property 'getPeerCertificate' of null" pair and the Monitor #57 'Mychart' certificate warning repeat roughly every minute through 18:01, interspersed with occasional 502/500 warnings for Monitor #7 'Atlas Hub Test' and Monitor #30 'Caboodle Console Rel']

The other log file looks like normal output that I get when running the dev site locally.

@christopherpickering
Contributor Author

It's odd that the process count is growing as well, but the extra processes don't show up in htop. Do you think an errored process is staying alive in the background?

@christopherpickering
Contributor Author

Here's the last 100 from the non-error logs:

Worker for job "clear-old-data" online undefined
2022-12-08T03:14:00.468Z [DB] INFO: Data Dir: data/
2022-12-08T03:14:00.679Z [DB] INFO: SQLite config:
{
  name: 'clear-old-data',
  message: 'Clearing Data older than 180 days...'
}
[ { journal_mode: 'wal' } ]
[ { cache_size: -12000 } ]
2022-12-08T03:14:00.682Z [DB] INFO: SQLite Version: 3.38.3
{ name: 'clear-old-data', message: 'done' }
Worker for job "clear-old-data" online undefined
2022-12-09T03:14:00.407Z [DB] INFO: Data Dir: data/
2022-12-09T03:14:00.622Z [DB] INFO: SQLite config:
{
  name: 'clear-old-data',
  message: 'Clearing Data older than 180 days...'
}
[ { journal_mode: 'wal' } ]
[ { cache_size: -12000 } ]
2022-12-09T03:14:00.625Z [DB] INFO: SQLite Version: 3.38.3
{ name: 'clear-old-data', message: 'done' }
2022-12-09T18:58:07.809Z [AUTH] INFO: Login by token. IP=127.0.0.1
2022-12-09T18:58:07.811Z [AUTH] INFO: Username from JWT: admin
2022-12-09T18:58:07.812Z [AUTH] INFO: Successfully logged in user admin. IP=127.0.0.1
2022-12-09T18:58:19.653Z [MONITOR] INFO: Get Monitor Beats: 17 User ID: 1
2022-12-09T19:13:19.912Z [SERVER] INFO: Shutdown requested
2022-12-09T19:13:19.912Z [SERVER] INFO: Called signal: SIGINT
2022-12-09T19:13:19.912Z [SERVER] INFO: Stopping all monitors
Welcome to Uptime Kuma
Your Node.js version: 14
2022-12-09T19:13:59.725Z [SERVER] INFO: Welcome to Uptime Kuma
2022-12-09T19:13:59.726Z [SERVER] INFO: Node Env: production
2022-12-09T19:13:59.726Z [SERVER] INFO: Importing Node libraries
2022-12-09T19:13:59.726Z [SERVER] INFO: Importing 3rd-party libraries
2022-12-09T19:14:14.749Z [SERVER] INFO: Creating express and socket.io instance
2022-12-09T19:14:14.751Z [SERVER] INFO: Server Type: HTTP
2022-12-09T19:14:14.786Z [SERVER] INFO: Importing this project modules
2022-12-09T19:14:15.965Z [NOTIFICATION] INFO: Prepare Notification Providers
2022-12-09T19:14:17.330Z [SERVER] INFO: Version: 1.18.5
2022-12-09T19:14:18.528Z [DB] INFO: Data Dir: ./data/
2022-12-09T19:14:18.528Z [SERVER] INFO: Connecting to the Database
2022-12-09T19:14:20.115Z [DB] INFO: SQLite config:
[ { journal_mode: 'wal' } ]
[ { cache_size: -12000 } ]
2022-12-09T19:14:20.119Z [DB] INFO: SQLite Version: 3.38.3
2022-12-09T19:14:20.119Z [SERVER] INFO: Connected
2022-12-09T19:14:20.120Z [DB] INFO: Your database version: 10
2022-12-09T19:14:20.120Z [DB] INFO: Latest database version: 10
2022-12-09T19:14:20.120Z [DB] INFO: Database patch not needed
2022-12-09T19:14:20.121Z [DB] INFO: Database Patch 2.0 Process
2022-12-09T19:14:20.127Z [SERVER] INFO: Load JWT secret from database.
2022-12-09T19:14:20.128Z [SERVER] INFO: Adding route
2022-12-09T19:14:20.308Z [SERVER] INFO: Adding socket handler
2022-12-09T19:14:20.308Z [SERVER] INFO: Init the server
2022-12-09T19:14:20.314Z [SERVER] INFO: Listening on 3001
2022-12-09T19:14:20.702Z [AUTH] INFO: Login by token. IP=127.0.0.1
2022-12-09T19:14:20.703Z [AUTH] INFO: Username from JWT: admin
2022-12-09T19:14:20.704Z [AUTH] INFO: Successfully logged in user admin. IP=127.0.0.1
Worker for job "clear-old-data" online undefined
2022-12-10T03:14:00.468Z [DB] INFO: Data Dir: data/
2022-12-10T03:14:00.695Z [DB] INFO: SQLite config:
[ { journal_mode: 'wal' } ]
{
  name: 'clear-old-data',
  message: 'Clearing Data older than 180 days...'
}
[ { cache_size: -12000 } ]
2022-12-10T03:14:00.699Z [DB] INFO: SQLite Version: 3.38.3
{ name: 'clear-old-data', message: 'done' }
Worker for job "clear-old-data" online undefined
2022-12-11T03:14:00.461Z [DB] INFO: Data Dir: data/
2022-12-11T03:14:00.692Z [DB] INFO: SQLite config:
{
  name: 'clear-old-data',
  message: 'Clearing Data older than 180 days...'
}
[ { journal_mode: 'wal' } ]
[ { cache_size: -12000 } ]
2022-12-11T03:14:00.696Z [DB] INFO: SQLite Version: 3.38.3
{ name: 'clear-old-data', message: 'done' }
Worker for job "clear-old-data" online undefined
2022-12-12T03:14:00.429Z [DB] INFO: Data Dir: data/
2022-12-12T03:14:00.657Z [DB] INFO: SQLite config:
{
  name: 'clear-old-data',
  message: 'Clearing Data older than 180 days...'
}
[ { journal_mode: 'wal' } ]
[ { cache_size: -12000 } ]
2022-12-12T03:14:00.661Z [DB] INFO: SQLite Version: 3.38.3
{ name: 'clear-old-data', message: 'done' }
2022-12-12T17:21:50.920Z [AUTH] INFO: Login by token. IP=127.0.0.1
2022-12-12T17:21:50.922Z [AUTH] INFO: Username from JWT: admin
2022-12-12T17:21:51.383Z [AUTH] INFO: Successfully logged in user admin. IP=127.0.0.1
2022-12-12T17:22:29.974Z [MONITOR] INFO: Get Monitor Beats: 19 User ID: 1
2022-12-12T17:35:14.621Z [AUTH] INFO: Login by token. IP=127.0.0.1
2022-12-12T17:35:14.622Z [AUTH] INFO: Username from JWT: admin
2022-12-12T17:35:14.623Z [AUTH] INFO: Successfully logged in user admin. IP=127.0.0.1
2022-12-12T17:35:22.449Z [MONITOR] INFO: Get Monitor Beats: 19 User ID: 1

@louislam
Owner

Thanks for the info! Hope I can figure out the problem.

It's odd the processes are growing as well

I guess it is caused by this bug, which I recently fixed: e478084

@louislam
Owner

Should be fixed by 466b403 now.

You can test it using the master branch.

@christopherpickering
Contributor Author

Thanks!! I'll swap to master and see how it goes.

@christopherpickering
Contributor Author

I checked back in and am still seeing growth with the 1.19 beta.
[screenshot]

Here are the current error logs:

2022-12-13T14:41:41-06:00 [MONITOR] WARN: Monitor #6 'Atlas Dev': Failing: incorrect header check | Interval: 60 seconds | Type: http | Down Count: 0 | Resend Interval: 0
2022-12-13T14:42:42-06:00 [MONITOR] WARN: Monitor #6 'Atlas Dev': Failing: incorrect header check | Interval: 60 seconds | Type: http | Down Count: 0 | Resend Interval: 0
2022-12-13T14:43:43-06:00 [MONITOR] WARN: Monitor #6 'Atlas Dev': Failing: incorrect header check | Interval: 60 seconds | Type: http | Down Count: 0 | Resend Interval: 0
2022-12-13T14:44:44-06:00 [MONITOR] WARN: Monitor #6 'Atlas Dev': Failing: incorrect header check | Interval: 60 seconds | Type: http | Down Count: 0 | Resend Interval: 0
2022-12-13T14:45:44-06:00 [MONITOR] WARN: Monitor #6 'Atlas Dev': Failing: incorrect header check | Interval: 60 seconds | Type: http | Down Count: 0 | Resend Interval: 0
2022-12-13T14:46:29-06:00 [MONITOR] ERROR: Caught error
2022-12-13T14:46:29-06:00 [MONITOR] ERROR: No socket found
2022-12-13T14:46:45-06:00 [MONITOR] WARN: Monitor #6 'Atlas Dev': Failing: incorrect header check | Interval: 60 seconds | Type: http | Down Count: 0 | Resend Interval: 0
2022-12-13T14:46:55-06:00 [MONITOR] ERROR: Caught error
2022-12-13T14:46:55-06:00 [MONITOR] ERROR: No socket found
2022-12-13T14:47:46-06:00 [MONITOR] WARN: Monitor #6 'Atlas Dev': Failing: incorrect header check | Interval: 60 seconds | Type: http | Down Count: 0 | Resend Interval: 0
2022-12-13T14:48:18-06:00 [MONITOR] WARN: Monitor #30 'Caboodle Console Rel': Failing: Request failed with status code 500 | Interval: 900 seconds | Type: http | Down Count: 0 | Resend Interval: 0
2022-12-13T14:48:47-06:00 [MONITOR] WARN: Monitor #6 'Atlas Dev': Failing: incorrect header check | Interval: 60 seconds | Type: http | Down Count: 0 | Resend Interval: 0
[the "incorrect header check" warning for Monitor #6 'Atlas Dev' repeats every minute through 15:28, interspersed with periodic "incorrect header check" warnings for Monitor #4 'Atlas' and Monitor #5 'Atlas Test', status-500 warnings for Monitor #30 'Caboodle Console Rel', and occasional "Caught error / No socket found" pairs]
2022-12-13T15:29:25-06:00 [MONITOR] WARN: Monitor #6 'Atlas Dev': Failing: incorrect header check | Interval: 60 seconds | Type: http | Down Count: 0 | Resend Interval: 0
2022-12-13T15:30:26-06:00 [MONITOR] WARN: Monitor #6 'Atlas Dev': Failing: incorrect header check | Interval: 60 seconds | Type: http | Down Count: 0 | Resend Interval: 0
2022-12-13T15:31:27-06:00 [MONITOR] WARN: Monitor #6 'Atlas Dev': Failing: incorrect header check | Interval: 60 seconds | Type: http | Down Count: 0 | Resend Interval: 0
2022-12-13T15:31:32-06:00 [MONITOR] ERROR: Caught error
2022-12-13T15:31:32-06:00 [MONITOR] ERROR: No socket found
2022-12-13T15:31:58-06:00 [MONITOR] ERROR: Caught error
2022-12-13T15:31:58-06:00 [MONITOR] ERROR: No socket found
2022-12-13T15:32:29-06:00 [MONITOR] WARN: Monitor #6 'Atlas Dev': Failing: incorrect header check | Interval: 60 seconds | Type: http | Down Count: 0 | Resend Interval: 0
2022-12-13T15:33:29-06:00 [MONITOR] WARN: Monitor #6 'Atlas Dev': Failing: incorrect header check | Interval: 60 seconds | Type: http | Down Count: 0 | Resend Interval: 0
2022-12-13T15:34:04-06:00 [MONITOR] WARN: Monitor #30 'Caboodle Console Rel': Failing: Request failed with status code 500 | Interval: 900 seconds | Type: http | Down Count: 0 | Resend Interval: 0
2022-12-13T15:34:30-06:00 [MONITOR] WARN: Monitor #6 'Atlas Dev': Failing: incorrect header check | Interval: 60 seconds | Type: http | Down Count: 0 | Resend Interval: 0
2022-12-13T15:35:31-06:00 [MONITOR] WARN: Monitor #6 'Atlas Dev': Failing: incorrect header check | Interval: 60 seconds | Type: http | Down Count: 0 | Resend Interval: 0
2022-12-13T15:36:32-06:00 [MONITOR] WARN: Monitor #6 'Atlas Dev': Failing: incorrect header check | Interval: 60 seconds | Type: http | Down Count: 0 | Resend Interval: 0
2022-12-13T15:37:26-06:00 [MONITOR] WARN: Monitor #4 'Atlas': Failing: incorrect header check | Interval: 900 seconds | Type: http | Down Count: 0 | Resend Interval: 0
2022-12-13T15:37:33-06:00 [MONITOR] WARN: Monitor #5 'Atlas Test': Failing: incorrect header check | Interval: 900 seconds | Type: http | Down Count: 0 | Resend Interval: 0
2022-12-13T15:37:34-06:00 [MONITOR] WARN: Monitor #6 'Atlas Dev': Failing: incorrect header check | Interval: 60 seconds | Type: http | Down Count: 0 | Resend Interval: 0
2022-12-13T15:38:34-06:00 [MONITOR] WARN: Monitor #6 'Atlas Dev': Failing: incorrect header check | Interval: 60 seconds | Type: http | Down Count: 0 | Resend Interval: 0
2022-12-13T15:39:35-06:00 [MONITOR] WARN: Monitor #6 'Atlas Dev': Failing: incorrect header check | Interval: 60 seconds | Type: http | Down Count: 0 | Resend Interval: 0
2022-12-13T15:40:36-06:00 [MONITOR] WARN: Monitor #6 'Atlas Dev': Failing: incorrect header check | Interval: 60 seconds | Type: http | Down Count: 0 | Resend Interval: 0
2022-12-13T15:41:37-06:00 [MONITOR] WARN: Monitor #6 'Atlas Dev': Failing: incorrect header check | Interval: 60 seconds | Type: http | Down Count: 0 | Resend Interval: 0
2022-12-13T15:42:38-06:00 [MONITOR] WARN: Monitor #6 'Atlas Dev': Failing: incorrect header check | Interval: 60 seconds | Type: http | Down Count: 0 | Resend Interval: 0
2022-12-13T15:43:39-06:00 [MONITOR] WARN: Monitor #6 'Atlas Dev': Failing: incorrect header check | Interval: 60 seconds | Type: http | Down Count: 0 | Resend Interval: 0
2022-12-13T15:44:32-06:00 [MONITOR] WARN: Monitor #44 'Atlas Demo': Pending: Request failed with status code 502 | Max retries: 10 | Retry: 1 | Retry Interval: 60 seconds | Type: http
2022-12-13T15:44:40-06:00 [MONITOR] WARN: Monitor #6 'Atlas Dev': Failing: incorrect header check | Interval: 60 seconds | Type: http | Down Count: 0 | Resend Interval: 0
2022-12-13T15:45:41-06:00 [MONITOR] WARN: Monitor #6 'Atlas Dev': Failing: incorrect header check | Interval: 60 seconds | Type: http | Down Count: 0 | Resend Interval: 0
2022-12-13T15:46:33-06:00 [MONITOR] ERROR: Caught error
2022-12-13T15:46:33-06:00 [MONITOR] ERROR: No socket found
2022-12-13T15:46:43-06:00 [MONITOR] WARN: Monitor #6 'Atlas Dev': Failing: incorrect header check | Interval: 60 seconds | Type: http | Down Count: 0 | Resend Interval: 0
2022-12-13T15:46:59-06:00 [MONITOR] ERROR: Caught error
2022-12-13T15:46:59-06:00 [MONITOR] ERROR: No socket found
2022-12-13T15:47:44-06:00 [MONITOR] WARN: Monitor #6 'Atlas Dev': Failing: incorrect header check | Interval: 60 seconds | Type: http | Down Count: 0 | Resend Interval: 0
2022-12-13T15:48:45-06:00 [MONITOR] WARN: Monitor #6 'Atlas Dev': Failing: incorrect header check | Interval: 60 seconds | Type: http | Down Count: 0 | Resend Interval: 0

Thanks!

@christopherpickering
Contributor Author

@louislam also I'm finding that 3e68cf2 broke a few of my sites (just found this when installing the 1.19 beta).

I get that "incorrect header check" error. The header that is failing is:

Request:
image

Response:
image

I'm thinking it is because of the "decompress" option. axios/axios#2406

@louislam
Owner

1da00d1 should fix your new issue.

But unfortunately, avoiding the getPeerCertificate error didn't solve the issue. At this point, I have no idea where the memory leak is.

@christopherpickering
Contributor Author

https://github.com/louislam/uptime-kuma/commit/1da00d19fdd9d2e01114f5b2e463025f0d78dc47 should fix your new issue.

sweet, thanks!

I will try to dig deeper into pm2 to see if there is a way to see what is going on better.

@christopherpickering
Contributor Author

1da00d1 should fix your new issue.

@louislam This didn't fix it completely, but if I change decompress: true to decompress: false, it does fix it... and my other sites remain up as well.

will this work w/ #2253?

Since the response data is not used anywhere (right?), I didn't add in the 2nd part of this answer (the transform):
axios/axios#2406 (comment)

image

Also, is there any reason not to have br in the encoding list?
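
For reference, the change described above amounts to flipping one option in the axios request config. This is a hypothetical sketch (the URL, timeout, and encoding list are placeholder values, not Uptime Kuma's actual config):

```javascript
// Sketch of the axios request options under discussion (values are
// illustrative). With decompress: false, axios returns the body as-is
// instead of inflating it, so a response whose Content-Encoding does not
// match its actual payload no longer throws "incorrect header check".
const requestOptions = {
    url: "https://example.com/health", // placeholder monitor URL
    method: "get",
    timeout: 60 * 1000,
    headers: {
        // Encodings the server is allowed to use; appending "br" would
        // also permit Brotli, if nothing downstream needs to decode it.
        "Accept-Encoding": "gzip, deflate",
    },
    // Skip automatic decompression entirely; the body is unused for the
    // up/down check, so decoding it only adds failure modes.
    decompress: false,
};

console.log(requestOptions.decompress); // false
```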

@christopherpickering
Contributor Author

fyi, I will update node (14 > 18) and pm2 (5.2.0 > 5.2.2) and see if it helps with the memory.

@louislam
Owner

louislam commented Dec 14, 2022

Enable inspector

  1. Edit ecosystem.config.js, add --inspect like this:
module.exports = {
    apps: [{
        name: "uptime-kuma",
        script: "./server/server.js",
        env: {
            NODE_OPTIONS: "--inspect"
        }
    }]
};
  2. Remove the previous task and run pm2 start
  3. Use Chrome and go to chrome://inspect
  4. Open dedicated DevTools for Node
  5. Add connection: localhost:9229

You can wait a few days and connect to it.

image

@christopherpickering
Contributor Author

thanks I will try it!
fyi, when I changed to node 18 a new error came up for .NET sites (only those, I think) running on an old (2008 or 2012) IIS server. The issue seems to be with the SSL version. There is a command line arg to bypass it... #2295 (comment)

@christopherpickering
Contributor Author

christopherpickering commented Dec 14, 2022

for reference, I got it connected to my remote server by adding localhost:9221 in Chrome DevTools, running sudo ufw allow 9229 on the server, and then connecting to the server with ssh and port forwarding:
ssh -o PreferredAuthentications=password -o PubkeyAuthentication=no -L 9221:localhost:9229 administrator@server

Starting with NODE_OPTIONS=--openssl-legacy-provider pm2 restart uptime-kuma --update-env --node-args="--inspect" in node 18 seems to apply the --inspect flag.
I will check back on it in a few days. Here's where that socket error is coming from:

image

@louislam
Owner

This is from my fix: 466b403

It should be caught and safe now. Just an error message.

But I don't know why the socket can be undefined here. I have never seen this error on my end.

@christopherpickering
Contributor Author

I took a heap snapshot and will take another later on to compare.

Interestingly, I'm just sitting here watching the console lol, and this one popped up a few times... about 50+ times in 1 second, then stopped. #2346

There were a few different messages:

node:internal/console/constructor:428 Trace: KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
    at Client_SQLite3.acquireConnection (/home/administrator/websites/kuma/node_modules/knex/lib/client.js:305:26)
    at runNextTicks (node:internal/process/task_queues:60:5)
    at process.processTimers (node:internal/timers:504:9)
    at async Runner.ensureConnection (/home/administrator/websites/kuma/node_modules/knex/lib/execution/runner.js:259:28)
    at async Runner.run (/home/administrator/websites/kuma/node_modules/knex/lib/execution/runner.js:30:19)
    at async RedBeanNode.normalizeRaw (/home/administrator/websites/kuma/node_modules/redbean-node/dist/redbean-node.js:588:22)
    at async RedBeanNode.getRow (/home/administrator/websites/kuma/node_modules/redbean-node/dist/redbean-node.js:574:22)
    at async Monitor.calcUptime (/home/administrator/websites/kuma/server/model/monitor.js:919:22)
    at async Monitor.sendUptime (/home/administrator/websites/kuma/server/model/monitor.js:985:24)
    at async Monitor.sendStats (/home/administrator/websites/kuma/server/model/monitor.js:852:13) {
  sql: '\n' +
    '            SELECT\n' +
    '               -- SUM all duration, also trim off the beat out of time window\n' +
    '                SUM(\n' +
    '                    CASE\n' +
    '                        WHEN (JULIANDAY(`time`) - JULIANDAY(?)) * 86400 < duration\n' +
    '                        THEN (JULIANDAY(`time`) - JULIANDAY(?)) * 86400\n' +
    '                        ELSE duration\n' +
    '                    END\n' +
    '                ) AS total_duration,\n' +
    '\n' +
    '               -- SUM all uptime duration, also trim off the beat out of time window\n' +
    '                SUM(\n' +
    '                    CASE\n' +
    '                        WHEN (status = 1 OR status = 3)\n' +
    '                        THEN\n' +
    '                            CASE\n' +
    '                                WHEN (JULIANDAY(`time`) - JULIANDAY(?)) * 86400 < duration\n' +
    '                                    THEN (JULIANDAY(`time`) - JULIANDAY(?)) * 86400\n' +
    '                                ELSE duration\n' +
    '                            END\n' +
    '                        END\n' +
    '                ) AS uptime_duration\n' +
    '            FROM heartbeat\n' +
    '            WHERE time > ?\n' +
    '            AND monitor_id = ?\n' +
    '        ',
  bindings: [
    '2022-12-13 15:36:46',
    '2022-12-13 15:36:46',
    '2022-12-13 15:36:46',
    '2022-12-13 15:36:46',
    '2022-12-13 15:36:46',
    31
  ]
}
    at process.<anonymous> (/home/administrator/websites/kuma/server/server.js:1779:13)
    at process.emit (node:events:525:35)
    at emit (node:internal/process/promises:149:20)
    at processPromiseRejections (node:internal/process/promises:283:27)
    at processTicksAndRejections (node:internal/process/task_queues:96:32)
    at runNextTicks (node:internal/process/task_queues:64:3)
    at process.processTimers (node:internal/timers:504:9)
node:internal/console/constructor:428 Trace: KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
    at Client_SQLite3.acquireConnection (/home/administrator/websites/kuma/node_modules/knex/lib/client.js:305:26)
    at runNextTicks (node:internal/process/task_queues:60:5)
    at listOnTimeout (node:internal/timers:533:9)
    at process.processTimers (node:internal/timers:507:7)
    at async Runner.ensureConnection (/home/administrator/websites/kuma/node_modules/knex/lib/execution/runner.js:259:28)
    at async Runner.run (/home/administrator/websites/kuma/node_modules/knex/lib/execution/runner.js:30:19)
    at async RedBeanNode.normalizeRaw (/home/administrator/websites/kuma/node_modules/redbean-node/dist/redbean-node.js:588:22)
    at async RedBeanNode.getRow (/home/administrator/websites/kuma/node_modules/redbean-node/dist/redbean-node.js:574:22)
    at async RedBeanNode.getCell (/home/administrator/websites/kuma/node_modules/redbean-node/dist/redbean-node.js:609:19)
    at async Monitor.sendAvgPing (/home/administrator/websites/kuma/server/model/monitor.js:867:32) {
  sql: '\n' +
    '            SELECT AVG(ping)\n' +
    '            FROM heartbeat\n' +
    "            WHERE time > DATETIME('now', ? || ' hours')\n" +
    '            AND ping IS NOT NULL\n' +
    '            AND monitor_id = ?  limit ?',
  bindings: [ -24, 38, 1 ]
}
    at process.<anonymous> (/home/administrator/websites/kuma/server/server.js:1779:13)
    at process.emit (node:events:525:35)
    at emit (node:internal/process/promises:149:20)
    at processPromiseRejections (node:internal/process/promises:283:27)
    at processTicksAndRejections (node:internal/process/task_queues:96:32)
    at runNextTicks (node:internal/process/task_queues:64:3)
    at listOnTimeout (node:internal/timers:533:9)
    at process.processTimers (node:internal/timers:507:7)
trace @ node:internal/console/constructor:428
(anonymous) @ server.js:1779
emit @ node:events:525
emit @ node:internal/process/promises:149
processPromiseRejections @ node:internal/process/promises:283
processTicksAndRejections @ node:internal/process/task_queues:96
runNextTicks @ node:internal/process/task_queues:64
listOnTimeout @ node:internal/timers:533
processTimers @ node:internal/timers:507
server.js:1781 If you keep encountering errors, please report to https://github.com/louislam/uptime-kuma/issues
notify.ts:166 KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
    at Client_SQLite3.acquireConnection (client.js:305:26)
    at runNextTicks (node:internal/process/task_queues:60:5)
    at listOnTimeout (node:internal/timers:533:9)
    at process.processTimers (node:internal/timers:507:7)
    at async Runner.ensureConnection (runner.js:259:28)
    at async Runner.run (runner.js:30:19)
    at async RedBeanNode.normalizeRaw (redbean-node.js:588:22)
    at async RedBeanNode.getRow (redbean-node.js:574:22)
    at async RedBeanNode.getCell (redbean-node.js:609:19)
    at async Monitor.sendAvgPing (monitor.js:867:32)
Trace: KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
    at Client_SQLite3.acquireConnection (/home/administrator/websites/kuma/node_modules/knex/lib/client.js:305:26)
    at runNextTicks (node:internal/process/task_queues:60:5)
    at listOnTimeout (node:internal/timers:533:9)
    at process.processTimers (node:internal/timers:507:7)
    at async Runner.ensureConnection (/home/administrator/websites/kuma/node_modules/knex/lib/execution/runner.js:259:28)
    at async Runner.run (/home/administrator/websites/kuma/node_modules/knex/lib/execution/runner.js:30:19)
    at async RedBeanNode.storeCore (/home/administrator/websites/kuma/node_modules/redbean-node/dist/redbean-node.js:166:26)
    at async RedBeanNode.store (/home/administrator/websites/kuma/node_modules/redbean-node/dist/redbean-node.js:126:20)
    at async beat (/home/administrator/websites/kuma/server/model/monitor.js:724:13)
    at async Timeout.safeBeat [as _onTimeout] (/home/administrator/websites/kuma/server/model/monitor.js:743:17) {
  sql: undefined,
  bindings: undefined
}
    at Timeout.safeBeat [as _onTimeout] (/home/administrator/websites/kuma/server/model/monitor.js:745:25)
    at runNextTicks (node:internal/process/task_queues:60:5)
    at listOnTimeout (node:internal/timers:533:9)
    at process.processTimers (node:internal/timers:507:7)

@christopherpickering
Contributor Author

I just updated to the latest version... will check back in a few days. I was testing with the flag --max_old_space_size=400 and it didn't help.

@christopherpickering
Contributor Author

the --inspect arg keeps disconnecting me, so I can't get in to do a new profile :/

On the side, I switched back to node 16 + the latest version (1.19.3) and still have the memory increase. The CPU sitting around 100% of 1 CPU seems normal; starting the app directly with node server.js, it starts up around 80% of 1 CPU, so the issue must be in the memory. After 3-4 days the memory goes from 30mb to 500mb.

Do you know any other tricks to get a snapshot without remote inspecting?

@louislam
Owner

louislam commented Jan 6, 2023

I still have no idea so far.

If you don't mind, you can send your kuma.db to me. uptime@kuma.pet

You should remove the jwtSecret and user passwords. Mask any data that you don't want me to read.

image

@louislam louislam mentioned this issue Feb 23, 2023
@flikites

@christopherpickering are you still experiencing memory leaks?

@christopherpickering
Contributor Author

Yes, I just set it up to restart nightly. The other option is to use pm2's auto restart when a memory maximum is reached. Either should work around this one. Still, it is only my intranet monitor; my public ones have no problem. The internal one is also monitoring TCP and SQL.

@christopherpickering
Contributor Author

I always use the latest release as well 👍🏼 super great tool, I recommend it to many.

@christopherpickering
Contributor Author

Another idea I had on this - I'm running lots of jobs on a small server, and some jobs (mssql and ntlm) take more time to complete. I haven't read the code here yet, but is it potentially not a memory leak but a backlog of status-check jobs? For example, if the status checks don't finish for all monitors within 60 seconds every time, will I slowly build a backlog of status checks waiting to run?
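
As a back-of-the-envelope check of that hypothesis (purely illustrative numbers; I haven't verified how the scheduler actually queues checks):

```javascript
// Toy model of the backlog hypothesis: if a check is enqueued every
// `intervalSec` but a worker needs `checkDurationSec` per check, the
// queue depth after `seconds` grows linearly whenever
// checkDurationSec > intervalSec * workers. Illustrative only.
function queueDepthAfter(seconds, intervalSec, checkDurationSec, workers) {
    const scheduled = Math.floor(seconds / intervalSec);
    const completed = Math.floor(seconds / checkDurationSec) * workers;
    return Math.max(0, scheduled - completed);
}

// 60 s interval but 90 s per slow MSSQL/NTLM check on one worker:
console.log(queueDepthAfter(3600, 60, 90, 1)); // 20 checks backlogged after an hour
```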

@CommanderStorm
Collaborator

CommanderStorm commented Jul 7, 2023

I haven't read the code here yet, but is it potentially not a memory leak but a backlog of jobs to do the status check?

We should have timeouts in place for this edge case.
Could you reduce the check frequency of the internal monitors to e.g. 4 min for a day and check if this could be the cause?

@CommanderStorm
Collaborator

Have you tried if 1.22.0 resolves this issue? (the mentioned release includes #3154)

@christopherpickering
Contributor Author

@CommanderStorm thanks, I will give both suggestions a shot. I didn't update to 1.22 yet but will on Monday.

@louislam louislam mentioned this issue Jul 20, 2023
@MetSystem

image
Uptime Kuma
version: 1.18.5

@CommanderStorm
Collaborator

@MetSystem I don't understand what you are trying to convey.
Before proposing feedback, please update your uptime-kuma version.
I noted above that a PR in 1.22.0 might resolve this issue.

@CommanderStorm
Collaborator

I will give both comments a shot, I didn't update to 22 yet but will Monday.

@christopherpickering
what were the results?

@christopherpickering
Contributor Author

christopherpickering commented Aug 2, 2023

No luck yet, I forgot to disable my cron restart job lol, sorry for the delay.
Update:
I updated to 1.22.1 and disabled my cron restart job, I will see what happens in a few days before changing the monitor frequency.

@CommanderStorm
Collaborator

I will see what happens in a few days before changing the monitor frequency.

@christopherpickering
What were the results? (sorry for the ping)
Can this be closed?

@christopherpickering
Contributor Author

Hey, you beat me here today 😁 I just checked in on the monitors. Since a restart 10 days ago, they have not slowed down. Also, the used memory is approximately the same as it was just after a reboot. Prior to the update, the memory was constantly growing (I could watch it grow slowly in pm2 monit), but now it is steady. Thanks!

image
