Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to start monitor with KnexTimeoutError: Knex: Timeout acquiring a connection. #4153

Closed
2 tasks done
lnagel opened this issue Dec 2, 2023 · 13 comments
Closed
2 tasks done
Labels
area:core issues describing changes to the core of uptime kuma help question Further information is requested

Comments

@lnagel
Copy link

lnagel commented Dec 2, 2023

⚠️ Please verify that this bug has NOT been raised before.

  • I checked and didn't find similar issue

🛡️ Security Policy

Description

Startup crash with Trace: KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?

👟 Reproduction steps

Start container with Uptime Kuma, log in and wait for data to load on the dashboard. Check logs for errors.

👀 Expected behavior

Dashboard would load, monitor checks would be run, log not full of errors.

😓 Actual Behavior

Dashboards do not load any data, monitor checks are not being run, log is full of errors.

🐻 Uptime-Kuma Version

1.23.7

💻 Operating System and Arch

MikroTik RouterOS 7.11.2

🌐 Browser

Firefox latest

🐋 Docker Version

No response

🟩 NodeJS Version

18.18.2

📝 Relevant log output

Welcome to Uptime Kuma 
Your Node.js version: 18.18.2 
2023-12-02T07:28:50Z [SERVER] INFO: Welcome to Uptime Kuma 
2023-12-02T07:28:50Z [SERVER] INFO: Node Env: production 
2023-12-02T07:28:50Z [SERVER] INFO: Inside Container: true 
2023-12-02T07:28:50Z [SERVER] INFO: Importing Node libraries 
2023-12-02T07:28:50Z [SERVER] INFO: Importing 3rd-party libraries 
2023-12-02T07:29:55Z [SERVER] INFO: Creating express and socket.io instance 
2023-12-02T07:29:55Z [SERVER] INFO: Server Type: HTTP 
2023-12-02T07:29:55Z [SERVER] INFO: Importing this project modules 
2023-12-02T07:29:56Z [NOTIFICATION] INFO: Prepare Notification Providers 
2023-12-02T07:29:56Z [SERVER] INFO: Version: 1.23.7 
2023-12-02T07:29:56Z [DB] INFO: Data Dir: ./data/ 
2023-12-02T07:29:56Z [SERVER] INFO: Connecting to the Database 
2023-12-02T07:29:57Z [DB] INFO: SQLite config: 
[ { journal_mode: 'wal' } ] 
[ { cache_size: -12000 } ] 
2023-12-02T07:29:57Z [DB] INFO: SQLite Version: 3.41.1 
2023-12-02T07:29:57Z [SERVER] INFO: Connected 
2023-12-02T07:29:57Z [DB] INFO: Your database version: 10 
2023-12-02T07:29:57Z [DB] INFO: Latest database version: 10 
2023-12-02T07:29:57Z [DB] INFO: Database patch not needed 
2023-12-02T07:29:57Z [DB] INFO: Database Patch 2.0 Process 
2023-12-02T07:29:57Z [SERVER] INFO: Load JWT secret from database. 
2023-12-02T09:29:57+02:00 [SERVER] INFO: Adding route 
2023-12-02T09:29:57+02:00 [SERVER] INFO: Adding socket handler 
2023-12-02T09:29:57+02:00 [SERVER] INFO: Init the server 
2023-12-02T09:29:57+02:00 [SERVER] INFO: Listening on 3001 
2023-12-02T09:29:57+02:00 [SERVICES] INFO: Starting nscd 
2023-12-02T09:30:07+02:00 [AUTH] INFO: Login by token. IP=10.20.0.20 
2023-12-02T09:30:07+02:00 [AUTH] INFO: Username from JWT: lenno 
2023-12-02T09:30:07+02:00 [AUTH] INFO: Successfully logged in user lenno. IP=10.20.0.20 
Trace: KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? 
    at Client_SQLite3.acquireConnection (/app/node_modules/knex/lib/client.js:312:26) 
    at async Runner.ensureConnection (/app/node_modules/knex/lib/execution/runner.js:287:28) 
    at async Runner.run (/app/node_modules/knex/lib/execution/runner.js:30:19) 
    at async RedBeanNode.normalizeRaw (/app/node_modules/redbean-node/dist/redbean-node.js:572:22) 
    at async RedBeanNode.getRow (/app/node_modules/redbean-node/dist/redbean-node.js:558:22) 
    at async Monitor.getParent (/app/server/model/monitor.js:1587:16) 
    at async Monitor.isParentActive (/app/server/model/monitor.js:1670:24) 
    at async Monitor.isParentActive (/app/server/model/monitor.js:1676:30) 
    at async Proxy.toJSON (/app/server/model/monitor.js:106:29) 
    at async UptimeKumaServer.getMonitorJSONList (/app/server/uptime-kuma-server.js:150:34) { 
  sql: '\n' + 
    '            SELECT parent.* FROM monitor parent\n' + 
    '    \t\tLEFT JOIN monitor child\n' + 
    '    \t\t\tON child.parent = parent.id\n' + 
    '            WHERE child.id = ?\n' + 
    '        ', 
  bindings: [ 64 ] 
} 
    at process.unexpectedErrorHandler (/app/server/server.js:1895:13) 
    at process.emit (node:events:517:28) 
    at emit (node:internal/process/promises:149:20) 
    at processPromiseRejections (node:internal/process/promises:283:27) 
    at process.processTicksAndRejections (node:internal/process/task_queues:96:32) 
If you keep encountering errors, please report to https://github.com/louislam/uptime-kuma/issues 
Trace: KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? 
    at Client_SQLite3.acquireConnection (/app/node_modules/knex/lib/client.js:312:26) 
    at async Runner.ensureConnection (/app/node_modules/knex/lib/execution/runner.js:287:28) 
    at async Runner.run (/app/node_modules/knex/lib/execution/runner.js:30:19) 
    at async RedBeanNode.findOne (/app/node_modules/redbean-node/dist/redbean-node.js:499:19) 
    at async beat (/app/server/model/monitor.js:337:32) 
    at async safeBeat (/app/server/model/monitor.js:1026:17) { 
  sql: undefined, 
  bindings: undefined 
} 
    at safeBeat (/app/server/model/monitor.js:1028:25) 
2023-12-02T09:32:07+02:00 [MONITOR] ERROR: Please report to https://github.com/louislam/uptime-kuma/issues 

///--cut--//

2023-12-02T09:32:37+02:00 [MONITOR] ERROR: Please  SUM all duration, also trim off the beat out of time windowreport to https://github.com/louislam/uptime-kuma/issues 
2023-12-02T09:32:37+02:00 [MONITOR] INFO: Try to restart the monitor 
Trace: KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? 
    at Client_SQLite3.acquireConnection (/app/node_modules/knex/lib/client.js:312:26) 
    at runNextTicks (node:internal/process/task_queues:60:5) 
    at process.processTimers (node:internal/timers:509:9) 
    at async Runner.ensureConnection (/app/node_modules/knex/lib/execution/runner.js:287:28) 
    at async Runner.run (/app/node_modules/knex/lib/execution/runner.js:30:19) 
    at async RedBeanNode.findOne (/app/node_modules/redbean-node/dist/redbean-node.js:499:19) 
    at async beat (/app/server/model/monitor.js:337:32) 
    at async safeBeat (/app/server/model/monitor.js:1026:17) { 
  sql: undefined, 
  bindings: undefined 
} 
    at safeBeat (/app/server/model/monitor.js:1028:25) 
    at runNextTicks (node:internal/process/task_queues:60:5) 
    at process.processTimers (node:internal/timers:509:9) 
2023-12-02T09:32:37+02:00 [MONITOR] ERROR: Please report to https://github.com/louislam/uptime-kuma/issues 
2023-12-02T09:32:37+02:00 [MONITOR] INFO: Try to restart the monitor 
@lnagel lnagel added the bug Something isn't working label Dec 2, 2023
@louislam louislam added help and removed bug Something isn't working labels Dec 2, 2023
@chakflying
Copy link
Collaborator

Considering you are using a router, it's likely you are using a USB attached storage? It's likely that the database has gotten too large that the increased IO latency is too much for the application to handle.

@lnagel
Copy link
Author

lnagel commented Dec 2, 2023

Considering you are using a router, it's likely you are using a USB attached storage? It's likely that the database has gotten too large that the increased IO latency is too much for the application to handle.

It was running USB attached storage, but it's a fast drive (DataTraveler Max USB 3.2 Gen 2), it wasn't even utilizing the disk too much.

I moved the container with its data folder to a 8th gen Intel NUC running the latest Docker on a dedicated NVME drive hosting the docker filesystem. The issue persists unfortunately.

Is there any advice how to reset the app's accumulated data history without losing all 89 configured monitors? I am mostly using HTTP, PING and MQTT checks.

@chakflying
Copy link
Collaborator

That's very strange.

If you have worked with databases before, you can stop the application, and open the database file data.db in any SQLite compatible application (dbeaver, beekeeper, etc). You can then manually delete old rows in the heartbeat table.

@ilogus
Copy link

ilogus commented Dec 3, 2023

Hello, I seem to have a similar problem.

This morning, without having received any particular notification, I saw that all the monitors had a problem during the night, in the docker logs, the line Pending: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? is present on all monitors.

I'm on a VPS running AlmaLinux 8.9 (Midnight Oncilla) with a 4.18.0-513.5.1.el8_9.x86_64 kernel.
Uptime kuma version: 1.23.8 and Docker version 24.0.7, build afdd53b

Regards,

@lnagel
Copy link
Author

lnagel commented Dec 3, 2023

That's very strange.

If you have worked with databases before, you can stop the application, and open the database file data.db in any SQLite compatible application (dbeaver, beekeeper, etc). You can then manually delete old rows in the heartbeat table.

Thanks. I did the following which reduced the size of the database from 297M down to 252K.

root@nuc:/var/lib/docker/volumes/uptime-kuma_data/_data# sqlite3 kuma.db 
SQLite version 3.40.1 2022-12-28 14:03:47
Enter ".help" for usage hints.
sqlite> delete from heartbeat;
sqlite> vacuum;
sqlite> 

At least uptime-kuma will start up now, no errors in the logs so far, let's see......

Changed "Keep monitor history data" from 180 days to 7 days. Still, I installed it about 3 weeks ago at most.

@gering
Copy link

gering commented Dec 30, 2023

I see a very similar error, running in docker on my Synology NAS:

2023-12-30T12:49:44+01:00 [MONITOR] INFO: Try to restart the monitor
Trace: KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
    at Client_SQLite3.acquireConnection (/app/node_modules/knex/lib/client.js:312:26)
    at async Runner.ensureConnection (/app/node_modules/knex/lib/execution/runner.js:287:28)
    at async Runner.run (/app/node_modules/knex/lib/execution/runner.js:30:19)
    at async RedBeanNode.normalizeRaw (/app/node_modules/redbean-node/dist/redbean-node.js:572:22)
    at async RedBeanNode.getRow (/app/node_modules/redbean-node/dist/redbean-node.js:558:22)
    at async RedBeanNode.getCell (/app/node_modules/redbean-node/dist/redbean-node.js:593:19)
    at async Settings.get (/app/server/settings.js:54:21)
    at async exports.setting (/app/server/util-server.js:610:12)
    at async /app/server/server.js:199:13 {
  sql: 'SELECT `value` FROM setting WHERE `key` = ?  limit ?',
  bindings: [ 'trustProxy', 1 ]
}
    at process.unexpectedErrorHandler (/app/server/server.js:1899:13)
    at process.emit (node:events:517:28)
    at emit (node:internal/process/promises:149:20)
    at processPromiseRejections (node:internal/process/promises:283:27)
    at process.processTicksAndRejections (node:internal/process/task_queues:96:32)
If you keep encountering errors, please report to https://github.com/louislam/uptime-kuma/issues

@sethvoltz
Copy link

sethvoltz commented Feb 12, 2024

Piling on: I have also been seeing this issue for over a year when I am moving around the app a lot (looking across monitors, editing the dashboards) but it usually self-resolves after a couple minutes of letting things catch up. Today it has not resolved and lead me to this issue.

Edit 1:
Adding current DB size stats via the backups before clearing out the heartbeat table as suggested by @lnagel

507M Feb 11 21:55 kuma.db
 60K Oct 17  2021 kuma.db.bak0
6.6M Oct 26  2021 kuma.db.bak20211026165215
 10M Oct 31  2021 kuma.db.bak20211031220900
115M Jan  3  2022 kuma.db.bak20220103141319
211M May 27  2022 kuma.db.bak20220527215144
211M Jul  9  2022 kuma.db.bak20220709110215
211M Sep 11  2022 kuma.db.bak20220911174428
410M Dec 31  2022 kuma.db.bak20221231132150
452M Feb 18  2023 kuma.db.bak20230218182325
484M Mar 22  2023 kuma.db.bak20230322220702

Edit 2:
The truncate was taking longer than expected so I killed it and did a select count(*) from heartbeat; and there were still 3.23M records in the table. I have kicked it off again but that does seem awfully high for 12 monitors with 2-5m checks.

Edit 3:
Yep, the truncate worked and everything about the app is snappy again. Post-vacuum the table is also much more reasonable:

568K Feb 11 22:33 kuma.db

@chakflying
Copy link
Collaborator

We implemented incremental_vacuum in 1.23 which should have mitigated this issue. Can you check if you are running the latest version, and if so, are there items in the logs that indicate the incremental_vacuum task has failed?

@CommanderStorm CommanderStorm added question Further information is requested area:core issues describing changes to the core of uptime kuma labels Feb 12, 2024
@sethvoltz
Copy link

@chakflying just confirmed, 1.23.11. I have docker image update monitors, so I would have updated within days to a week from release.

@sethvoltz
Copy link

I think the problem is less about vacuum and more that the heartbeat table was huge at >3 million records for a small number of monitors.

@CommanderStorm
Copy link
Collaborator

CommanderStorm commented Feb 12, 2024

What is your retention time in the settings set to?

@sethvoltz
Copy link

sethvoltz commented Feb 12, 2024

@CommanderStorm Took me a few to find that as I'd never changed it. It was set to the default (I assume) of 180 days. I set it to 14 days just now to hopefully avoid this issue again

@CommanderStorm
Copy link
Collaborator

CommanderStorm commented Feb 12, 2024

Okay. I am assuming your issue is the same @lnagel
We know that lowering retention is not a good solution for the long term, but for 1.23.X that is everything that we can offer as a quick "remedy".

A lot of performance improvements (using aggregated vs non-aggregated tables to store heartbeats, enabling users to choose mariadb as a db-backend, pagination of important events) have been made in v2.0 (our next release) resolving™️ this problem-area.
=> I'm going to close this issue

You can subscribe to our releases and get notified when a new release (such as v2.0-beta.0) gets made.
See #4171 for the bugs that need addressing before that can happen.

Meanwhile (the issue is with SQLite not reading data fast enough to keep up):

  • limit how much retention you have configured
  • limit yourself to a reasonable amount of monitors (hardware-dependant, no good measure)
  • don't run on slow disks or disk with high latency like HDDs, SD-cards, USB-Stick attached to a router, ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:core issues describing changes to the core of uptime kuma help question Further information is requested
Projects
None yet
Development

No branches or pull requests

7 participants