How to handle connection loss? #18

Open
joypeterson opened this issue Mar 15, 2016 · 9 comments

@joypeterson

What is the best way to recover from a lost connection to the RabbitMQ server, from either a client or a listener? Currently, if I lose the connection, Seneca calls root.die, which terminates my application. I need to be able to try to re-establish the connection instead of exiting the application.

Thanks

@nfantone
Collaborator

This is a good question. Actually, Seneca behaves like that by design. If your microservice breaks in some way, it shouldn't be kept alive. Instead, requests to that service should be redirected to, or answered by, other instances while you revive the dead application (something like pm2 or forever would do that for you). That's one way to be tolerant of failure.

If you lose connectivity to the broker, your service becomes virtually useless - if it doesn't, I would argue that the application shouldn't be considered a microservice to begin with. Why would you want to have it around, trying to re-connect, wasting resources?

Nevertheless, having a reconnection strategy and being able to recover could be useful in some scenarios, where your application does, in fact, some other tasks (like answering HTTP requests) and you don't want to let those clients down.

@joypeterson
Author

Thank you for the feedback. The service that I am using this for does answer HTTP requests in addition to using seneca-amqp-transport to coordinate with other services. It would be very useful to be able to keep that HTTP endpoint available. It sounds like I might have to implement retry logic in all the services that call that HTTP endpoint, unless you have some ideas on how to intercept the connection error from amqplib and re-establish the connection without Seneca shutting down.

Thanks.
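
A minimal sketch of the amqplib side of that idea, assuming only that an amqplib connection emits 'error' and then 'close' when the link drops. `keepConnected`, `delayMs` and `onConnection` are hypothetical names, not part of seneca-amqp-transport, and nothing here prevents Seneca itself from calling die; it only shows how a dropped connection could be detected and reopened.

```javascript
// Sketch only: `keepConnected` and its options are hypothetical names.
// `connect` is any function returning a promise of an EventEmitter-like
// connection, e.g. () => require('amqplib').connect('amqp://localhost').
function keepConnected(connect, { delayMs = 1000, onConnection = () => {} } = {}) {
  let stopped = false;
  let current = null;

  function retryLater() {
    if (!stopped) setTimeout(open, delayMs);
  }

  async function open() {
    if (stopped) return;
    let conn;
    try {
      conn = await connect();
    } catch (err) {
      // Broker unreachable: try again after a fixed delay.
      retryLater();
      return;
    }
    current = conn;
    // Without an 'error' listener, Node throws on an emitted 'error' event.
    // The reconnect itself is driven by 'close', which follows 'error'.
    conn.on('error', () => {});
    conn.once('close', () => {
      current = null;
      retryLater();
    });
    // The callback is expected not to throw.
    onConnection(conn);
  }

  open();

  return {
    stop() { stopped = true; },
    get connection() { return current; }
  };
}
```

With amqplib this might be called as `keepConnected(() => require('amqplib').connect('amqp://localhost'), { onConnection: setUpChannels })`, where `setUpChannels` is a placeholder for whatever re-creates channels and consumers on the fresh connection.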

@nfantone
Collaborator

I'm adding the request label and I'll review it. It may be worthwhile implementing this. In the meantime, I'd strongly suggest that you:

  1. Do not rely on a broken microservice. Just let it die and restart it.
  2. Consider splitting your HTTP endpoints to a separate, independent service/application.
  3. Have redundant microservices in a virtual cluster (again, pm2 or forever can help here) or a distributed one, with some kind of load balancing.
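
As a rough illustration of points 1 and 3 with pm2, a process file along these lines would restart a dead instance and keep more than one running. The app name and script path are placeholders, and the values are arbitrary; treat it as a starting point rather than a recommended configuration.

```javascript
// ecosystem.config.js: sketch only. The app name and script path are
// placeholders; the option names are standard pm2 process-file fields.
const config = {
  apps: [
    {
      name: 'amqp-listener',
      script: './listener.js',
      instances: 2,          // run two copies so one can die without an outage
      exec_mode: 'cluster',  // let pm2 manage the copies as a cluster
      autorestart: true,     // bring an instance back after it exits
      restart_delay: 2000    // wait 2 s before restarting a crashed instance
    }
  ]
};

module.exports = config;
```

It would be started with `pm2 start ecosystem.config.js`.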

@deedubs

deedubs commented Mar 24, 2016

I'd like to vote in favour of the current functionality. IMHO it's always safer to fail fast and recover from a fresh instance than to reconcile once you've re-established connectivity.

@nfantone self-assigned this on Mar 28, 2016

@nfantone
Collaborator

Agree with @deedubs.

Still, I'll see what I can do about this.

@mpseidel

mpseidel commented Jan 13, 2017

@nfantone any updates on this? I understand the design decision generally but still struggle to decide if a basic recovery from a connection glitch to amqp would be better in our scenario.

We have an API gateway service that accepts HTTP calls and routes messages to our RabbitMQ cluster. If a connection to one rabbit instance goes bad we might be able to quickly fail over to another instance without letting all current requests to the gateway container die.

Any suggestions?
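
To make the failover idea concrete, here is a minimal sketch that tries a list of broker URLs in order and returns the first connection that succeeds. `connectToAny` is a hypothetical helper, not something seneca-amqp-transport is known to provide, and `connect` stands in for a function such as `(url) => require('amqplib').connect(url)`.

```javascript
// Sketch only: `connectToAny` is a hypothetical helper.
// `connect(url)` must return a promise that rejects when the broker at
// `url` cannot be reached.
async function connectToAny(urls, connect) {
  const causes = [];
  for (const url of urls) {
    try {
      // First broker that accepts the connection wins.
      return await connect(url);
    } catch (err) {
      // Remember why this node failed and move on to the next one.
      causes.push(err);
    }
  }
  const error = new Error(`No broker reachable (tried ${urls.length})`);
  error.causes = causes;
  throw error;
}
```

A gateway could call this again whenever its current connection closes, so that in-flight HTTP requests wait for the next node instead of dying with the process.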

@nfantone
Collaborator

nfantone commented Jan 13, 2017

@mpseidel I've been very busy, mostly rewriting the entire plugin. Work on #73 is now on develop and soon to be released. It should now be easier to add new stuff. So, I'd like to tackle this and other long-pending feature requests next. I apologize for the delay. And, of course, we always welcome new PRs.

In the meantime, you should be load balancing requests to your RabbitMQ nodes. This is one of the advantages of putting a cluster together. But, from what you say, you are already using one.

> If a connection to one rabbit instance goes bad we might be able to quickly fail over to another instance without letting all current requests to the gateway container die.

☝️ That is exactly what HA on RabbitMQ clusters is all about.

@mpseidel

mpseidel commented Jan 13, 2017

Thanks for the update! Yes, we are already failing over, but only after Seneca crashes the Node process, which may cause multiple HTTP requests to die as well.

@brad-decker

Maybe unrelated, maybe not. We are working with Docker, and because depends_on doesn't wait for the container to be running and available, we have to rely on 'sleep' to make microservices wait to start up until RabbitMQ is theoretically available. Is there a way to gracefully handle retrying the connection for a period of time before killing the service?
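
One way this could look at startup, sketched under the assumption that the service opens the broker connection itself before handing it to anything else: retry with a growing delay until a deadline passes, then rethrow so the process still fails fast. `connectWithRetry` and its option names are hypothetical, and `connect` stands in for something like `() => require('amqplib').connect('amqp://rabbitmq')`.

```javascript
// Sketch only: `connectWithRetry` and its option names are hypothetical.
async function connectWithRetry(
  connect,
  { timeoutMs = 30000, initialDelayMs = 500, maxDelayMs = 5000 } = {}
) {
  const deadline = Date.now() + timeoutMs;
  let delay = initialDelayMs;
  for (;;) {
    try {
      return await connect();
    } catch (err) {
      // Give up once the next wait would run past the deadline, and let the
      // caller (or the process manager) decide what to do with the failure.
      if (Date.now() + delay > deadline) throw err;
      await new Promise((resolve) => setTimeout(resolve, delay));
      // Double the wait each time, up to a ceiling.
      delay = Math.min(delay * 2, maxDelayMs);
    }
  }
}
```

This replaces a fixed 'sleep' with a bounded wait: the service starts as soon as the broker accepts connections and still exits if it never does.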
