[User Story] Detect server failures and automatically fix them #38

Closed
patrickjaigner opened this issue Jun 17, 2024 · 5 comments

Comments

@patrickjaigner

As a network operator, I would like the server to detect a communication failure on its own so that I can prevent downtime and loss of node messages.

Additional context:
The server was down for four days over the weekend (redeployment on 17.06 fixed the issue).
The server was down for two days over the weekend (redeployment on 27.05 fixed the issue).
Other downtimes: 08.04, 11.1, 28.12.

@empicano
Member

Thanks for opening this issue 👍 Just to be sure, you're describing that the server does not reconnect when an MQTT connection fails, and the HTTP interface is not affected, right?

There's currently no official way to do this, see empicano/aiomqtt#287 for reference. I'll see how to hack around this today.

@patrickjaigner
Author

I think so. I didn't explicitly test the HTTP connection. The issue showed up on the dashboard: the latest messages from all sensors stopped at the same timestamp, and no new ones arrived. The dashboard didn't report any issues and kept showing the "latest" messages, so I would guess that the API is still running.

For a quick and dirty fix:
Could the server's client subscribe and publish to the same "test" topic? If the message is received, the connection to the broker should be up and running. If it's not received, you could restart the client. Am I missing something here?
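
The heartbeat idea above can be sketched without tying it to a particular MQTT library. This is only a minimal illustration, not code from this project: `HeartbeatWatchdog`, `mark_received`, and `is_stale` are hypothetical names, and the timeout value is an assumption. The watchdog only tracks when the client last saw its own "test" message come back from the broker.

```python
import time


class HeartbeatWatchdog:
    """Tracks when the last heartbeat message came back from the broker.

    Hypothetical helper for illustration; not part of the server or aiomqtt.
    """

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout          # seconds without a heartbeat before we give up
        self._clock = clock             # injectable so the logic can be tested
        self._last_seen = clock()       # treat startup as the first heartbeat

    def mark_received(self):
        """Call this whenever the client receives its own "test" message."""
        self._last_seen = self._clock()

    def is_stale(self):
        """Return True if no heartbeat arrived within the timeout."""
        return self._clock() - self._last_seen > self.timeout
```

In use, the client would publish to the "test" topic at an interval shorter than `timeout`, call `mark_received()` from its message handler, and restart the connection once `is_stale()` returns True.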

@empicano
Member

The problem is actually not detecting a connection failure; we get an exception from aiomqtt in that case. However, the library does not yet have an official way to reconnect for our use case.

I've tested the reconnection by switching a local broker off and on. It should work now 🙂
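
For readers who want the general shape of "catch the exception, then reconnect", here is a minimal, library-agnostic sketch. It is not the change that was deployed for this issue. `run_with_reconnect`, `session`, `retry_on`, `delay`, and `max_attempts` are hypothetical names; with aiomqtt, `retry_on` would presumably be `aiomqtt.MqttError`, and `session` a coroutine function that opens the client, subscribes, and processes messages.

```python
import asyncio


async def run_with_reconnect(session, retry_on, delay=5.0, max_attempts=None):
    """Run `session()` and start it again whenever it raises `retry_on`.

    Hypothetical helper for illustration only.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            # A session normally runs until the connection drops.
            return await session()
        except retry_on:
            # Give up only if the caller set a limit and we reached it.
            if max_attempts is not None and attempts >= max_attempts:
                raise
            # Wait before reconnecting so a dead broker is not hammered.
            await asyncio.sleep(delay)
```

A fixed delay keeps the sketch short; a real deployment might prefer a growing delay between attempts.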

@patrickjaigner
Author

It happened again roughly 1 hour ago.
Around: 2024-06-24T13:03:17.440Z

@empicano
Member

The server was still running on 4007fe3. I just pushed and deployed the new version for you 🙂

What I showed you to redeploy on DigitalOcean only restarts the server. To deploy a new version, we have to build and push a new Docker image to the registry. DigitalOcean has a guide with more information on how to do this.

In short, we call ./scripts/build to build the Docker image, and then call docker tag and docker push. When we push a new image to the registry, DigitalOcean automatically deploys it.
