Distributed mode #1259

mabed-fr · 2022-02-04T23:03:48Z

⚠️ Please verify that this feature request has NOT been suggested before.

I checked and didn't find similar feature request

🏷️ Feature Request Type

New Monitor

🔖 Feature description

Is it possible to have several instances of uptime-kuma controlled by a central point?
In distributed mode?
Connected by wirguard ?

Regards,

✔️ Solution

Is it possible to have several instances of uptime-kuma controlled by a central point?
In distributed mode?
Connected by wirguard ?

❓ Alternatives

No response

📝 Additional Context

Congratulations for this project that I will support if one of my skills can help you.

mamiu · 2022-02-08T04:14:12Z

I like the idea of a "distributed mode" or HA mode (high availability mode), multi instance mode, multi hosts mode, fail safe mode, etc. (a few keywords so that this ticket can be found easily).

But what is "wirguard"? If you mean wireguard: You don't need a VPN tunnel to achieve something like that. Additional instances could be added via a private token (similar to how nodes are added to a Kubernetes cluster).

adyanth · 2022-03-05T20:04:46Z

A distributed install definitely makes sense for something that monitors uptime for other software. Would not want it to go down along with the other apps.

jesse2point0 · 2022-05-26T06:05:26Z

I would love this as a feature, if I could have a small instance running on a site and relaying to a master instance somewhere.

For example, say you are an MSP, and you have a few line of business applications you want to monitor inside the network of reach customer without exposing the endpoints directly or vpns. Then the local instance relays or reports the stats to a central instance. Each client site may have and internal status page, but the MSP could have those status pages published centrally for all sites and customers.

onedr0p · 2022-08-07T23:02:54Z

It's kind of strange that an application to monitor other applications wouldn't support running in high availability but maybe that's not part of the scope of this project. Uptime Kuma would need to support a external DB for data and something like redis for session cache. Also I'm not aware if uptime kuma writes anything else to disk but if so that would be to be changed as well to run HA.

mabed-fr · 2022-08-12T04:11:27Z

The project is brand new compared to what is on the market, it takes time to develop.

the main idea for my part was to have satellites in several countries but the HA is also possible.

If you want this functionality do not hesitate to comment.

officiallymarky · 2022-09-06T17:50:41Z

Yes!

snth · 2022-11-10T12:26:25Z

We just started using uptime-kuma and it's awesome! Thank you so much for creating this and making it available!

Like many others in this thread, the thought naturally arose of "who will watch the watchers"? A distributed/high-availability configuration would be the Bee's Knees.

Until then we're thinking about having uptime-kuma monitored by BetterUptime or healthchecks.io, which given that it's a single service should fit in the free tier.

MaxamServices · 2022-12-12T17:59:53Z

This would be awesome! and if it would be possible to make the nodes agree certain instance is down and then send the notification

cheuklam · 2022-12-18T02:33:15Z

It would be great! And if possible, better create a config that allow the notification to be sent "if 2/3 of the depoloyment detail downtime".

I just got a case yesterday that the kuma non stop sending notification (timeout every minute), but when I access the application (which host on AWS and has CloudWatch), it is completly fine. I guess there is some routing issue in between. Only 2 out of 50 application monitored by kuma has such issue.....but then it keep me awaking since 5am in the morning....

wokawoka · 2023-03-23T00:55:54Z

I agree, it would be great

Computroniks · 2023-03-23T07:31:02Z

Just going to link #84 here as it looks similar

simcmoi · 2023-10-21T21:47:20Z

it would be great. I have 1 server and 1 nas. If i can install 2 uptime in HA it will be awesome !

cheuklam · 2023-10-22T03:03:33Z

Distributed across avaliable zone maybe a difficult task, but I think we can do it in a simple way.
My request for distributed mode is of 3 reasons:

current uptime kuma (UK) node is down, it will think all my monitoring site was down and up again when the UK service back online, which didn't looks good; It is not the site down but the UK service down
We wanna use UK becoz we wanna ensure every service is up and running, and we will have emergency plan for such cases. When the service is down itself, our alert is gone. We can do HA / MultiAZ avaliable for the website but not for the monitoring service, which is a bit weird I would said
Network issue which makes the site down in part of the world. Sometimes due to CDN service of network operator, the site maybe avaliable at Europe but stopped working at US. Personally I run some lowest level VM on cloud in different region (using free tier) to check such cases.

We can fix the above issue running multiple instance on different server, but the data is not united. That's why I am thinking of the following suggestion, which should be very simple to implement and fix all the above issue:

Multi instance data sync
Steps
a) Allow an instance name for each instance
b) add one more column to the report table, besides the status changes, mark down which instance its from
c) In the notification channel, add one more option which is "Uptime Kuma", so we can sent the status changes to other UK instance.
d) When the service is up, check with other "allied instance" and migrate the missing data if there is any. (Not very important but good to have)
Client only deployment
a tiny nodejs / python piece of code, that will ask the primary UK instance for the list to check, and return the result. We can run this piece of code on Lamda / Function based cloud service or docker, so we can just deploy in a very low cost / no cost to address Issue 3) I mentioned above

P.S. HA mode sounds fancy but it is hard to do HA across multiAZ without a lot of virtual IP, SDWAN which involved a lot of Infra thing. I think the method I mentioned above can minimze the dependency on network infra yet fixed the issue I listed. HA setup only means to keep the servie up and running, I dont think we need ot make things too complicated as DB cluster and heartbeat service together will already be more complicated then the whole project. I like UK for the simplicity yet achieve the purpose.

officiallymarky · 2023-10-22T03:05:00Z

It's really not difficult, all the commercial services do this. You have multiple agents that report back, and only when x number agents fail do you report a failure.

cheuklam · 2023-10-22T03:08:36Z

It's really not difficult, all the commercial services do this. You have multiple agents that report back, and only when x number agents fail do you report a failure.

This is not difficult but also not HA, once the main service is down, all client have no where to report. But as I mentioned in solution point 2, it did solved some other issue.

snth · 2023-10-23T15:39:29Z

I would also really like this feature because I just had my node with Uptime on it go down the other day and while most of my things don't require HA, it would be good to have that in a monitoring solution.

I don't know much high availability setups or Uptime's internal architecture but can't you push the difficult distributed consensus problem into some other component? For example whatever your underlying storage layer is, for things like Redis, Postgres, SQLite, ... there are usually already high availability solutions available so can't you perhaps leverage that?

snth · 2023-11-10T14:25:18Z

I thought about this again and I think it might really not be that difficult, at least a basic High Availability mode that would be sufficient for my purposes.

Since uptime-kuma already comes with at docker-compose.yml file, my HA setup would be:

Run uptime-kuma in Docker Swarm mode with replicas set to 1.
Create a shared docker volume with GlusterFS.

Since GlusterFS says it's fully POSIX compliant that should work fine. If a node goes down, Docker Swarm should redeploy uptime on another node and the data backend should be available there thanks to GlusterFS.

WDYT?

It would be nicer to have a storage backend like HA Postgres or CockroachDB but since uptime-kuma currently only seems to support file system storage, this will have to do.

CommanderStorm · 2023-11-10T14:29:15Z

It would be nicer to have a storage backend like HA Postgres or CockroachDB but since uptime-kuma currently only seems to support file system storage

Actually, v2 does support (external+internal) mariadb next to sqlite and therefore also more complex setups like mariadb-galera see the progress here: https://github.com/louislam/uptime-kuma/milestone/24

For Postgres as a data backend see #959

snth · 2023-11-10T14:58:11Z

Thanks @CommanderStorm . That's great to hear.

Where can I read more about the sqlite setup? Is the connection string for that configurable because then I could probably just use Dqlite for the backend. That would be great because I would really like to avoid the GlusterFS route if possible.

CommanderStorm · 2023-11-10T15:36:42Z

I don't know what you need. The sqlite database is stored at db/kuma.db.
SQLite does not really have a connection string I know of... you just point at the file and go..

We have never looked into if dqlite is a possibility or if this should be a thing we should support (currently, I would argue that mariadb is enough, but I am not a maintainer)
=> currently not officially supported
=> we won't consider changes to this part of the system breaking

Here is our contribution guide https://github.com/louislam/uptime-kuma/blob/5b6522a54edad9737fccf195f9eaa25c6fb9d0f6/CONTRIBUTING.md

officiallymarky · 2023-11-10T22:13:24Z

I thought about this again and I think it might really not be that difficult, at least a basic High Availability mode that would be sufficient for my purposes.

Since uptime-kuma already comes with at docker-compose.yml file, my HA setup would be:

Run uptime-kuma in Docker Swarm mode with replicas set to 1.

Create a shared docker volume with GlusterFS.

Since GlusterFS says it's fully POSIX compliant that should work fine. If a node goes down, Docker Swarm should redeploy uptime on another node and the data backend should be available there thanks to GlusterFS.

WDYT?

It would be nicer to have a storage backend like HA Postgres or CockroachDB but since uptime-kuma currently only seems to support file system storage, this will have to do.

Unless it is located geographically on a different Internet connection it really doesn’t improve the situation much.

CommanderStorm · 2024-03-18T11:28:01Z

There have not been any news in the last four months.
We are still working out the kinks of V2.0

JaneX8 · 2024-05-05T19:09:48Z

I would love to see this feature. It would be great if multiple nodes of Uptime Kuma can be linked. And that for each check you add there is an option to select which nodes this check should run on. And also use it as a fail condition. As in "report if all fail", "report if N fails". A syncing of tasks would be better, because this way each node can keep running in standalone mode if another is down. Which makes it kind of a distributed network of individual instances that can work standalone as well as cooperate, rather than for example workers that still depends on a master to be online.

This way I would add Uptime Kuma on many of my geographically separated servers and simply make sure my checks work on all of them, without having to configure many different individual instances.

CommanderStorm · 2024-05-05T21:05:09Z

@JaneX8
You can subscribe to #84 for updates.
Currently, our priorities are on different items such as #4500 and refactoring the monitoring items for better maintainability.

pareis · 2024-10-06T07:37:57Z

I think linking 2 nodes might not be sufficient, typically, 3 nodes are required for 2 remaining nodes to be able to figure out which node is disconnected and which are still "live". It's like distributed systems work if you want it reliable.

I've been thinking maybe we don't need this distributed mode in uptime-kuma itself, the same can be achieved by let's say running 2 kumas in different regions with the same checks, both alerting via webhooks or similar into an alerting tool that is able to combine the different state using an OR or an AND operation. Like: source A says down, source B says up => up (depending on what you want). A sustained source B down could still be used to trigger a slower alert. This would be more in the context of a on-call system where such system is in use. In the hobbyist space where we use kuma to send alerts via email for example, this wouldn't be possible easily.

officiallymarky · 2024-10-06T12:41:17Z

Ideally it would allow n nodes, but it's clear from the comments that this isn't a feature that will be added.

mabed-fr added the feature-request Request for new features to be added label Feb 4, 2022

Computroniks mentioned this issue Feb 13, 2023

Kuma On Site Agent #2766

Closed

1 task

louislam closed this as not planned Won't fix, can't repro, duplicate, stale Mar 23, 2023

This comment was marked as spam.

Sign in to view

CommanderStorm added area:deployment related to how uptime kuma can be deployed area:core issues describing changes to the core of uptime kuma labels Apr 21, 2024

JaneX8 mentioned this issue May 6, 2024

Remote Executors #84

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Distributed mode #1259

Distributed mode #1259

mabed-fr commented Feb 4, 2022

mamiu commented Feb 8, 2022 •

edited

Loading

adyanth commented Mar 5, 2022

jesse2point0 commented May 26, 2022 •

edited

Loading

onedr0p commented Aug 7, 2022 •

edited

Loading

mabed-fr commented Aug 12, 2022

officiallymarky commented Sep 6, 2022

snth commented Nov 10, 2022

MaxamServices commented Dec 12, 2022

cheuklam commented Dec 18, 2022

wokawoka commented Mar 23, 2023

Computroniks commented Mar 23, 2023

simcmoi commented Oct 21, 2023

cheuklam commented Oct 22, 2023 •

edited

Loading

officiallymarky commented Oct 22, 2023

cheuklam commented Oct 22, 2023

snth commented Oct 23, 2023

snth commented Nov 10, 2023 •

edited

Loading

CommanderStorm commented Nov 10, 2023 •

edited

Loading

snth commented Nov 10, 2023

CommanderStorm commented Nov 10, 2023

officiallymarky commented Nov 10, 2023

This comment was marked as spam.

CommanderStorm commented Mar 18, 2024

JaneX8 commented May 5, 2024

CommanderStorm commented May 5, 2024

pareis commented Oct 6, 2024

officiallymarky commented Oct 6, 2024 •

edited by CommanderStorm

Loading

Distributed mode #1259

Distributed mode #1259

Comments

mabed-fr commented Feb 4, 2022

⚠️ Please verify that this feature request has NOT been suggested before.

🏷️ Feature Request Type

🔖 Feature description

✔️ Solution

❓ Alternatives

📝 Additional Context

mamiu commented Feb 8, 2022 • edited Loading

adyanth commented Mar 5, 2022

jesse2point0 commented May 26, 2022 • edited Loading

onedr0p commented Aug 7, 2022 • edited Loading

mabed-fr commented Aug 12, 2022

officiallymarky commented Sep 6, 2022

snth commented Nov 10, 2022

MaxamServices commented Dec 12, 2022

cheuklam commented Dec 18, 2022

wokawoka commented Mar 23, 2023

Computroniks commented Mar 23, 2023

simcmoi commented Oct 21, 2023

cheuklam commented Oct 22, 2023 • edited Loading

officiallymarky commented Oct 22, 2023

cheuklam commented Oct 22, 2023

snth commented Oct 23, 2023

snth commented Nov 10, 2023 • edited Loading

CommanderStorm commented Nov 10, 2023 • edited Loading

snth commented Nov 10, 2023

CommanderStorm commented Nov 10, 2023

officiallymarky commented Nov 10, 2023

This comment was marked as spam.

CommanderStorm commented Mar 18, 2024

JaneX8 commented May 5, 2024

CommanderStorm commented May 5, 2024

pareis commented Oct 6, 2024

officiallymarky commented Oct 6, 2024 • edited by CommanderStorm Loading

mamiu commented Feb 8, 2022 •

edited

Loading

jesse2point0 commented May 26, 2022 •

edited

Loading

onedr0p commented Aug 7, 2022 •

edited

Loading

cheuklam commented Oct 22, 2023 •

edited

Loading

snth commented Nov 10, 2023 •

edited

Loading

CommanderStorm commented Nov 10, 2023 •

edited

Loading

officiallymarky commented Oct 6, 2024 •

edited by CommanderStorm

Loading