Downtime while performing upgrade #35

Closed
gaieges opened this issue Feb 22, 2017 · 10 comments

Comments

@gaieges

gaieges commented Feb 22, 2017

Problem

Deploying via the following method results in a period during which the load balancer returns 500 errors, because it cannot connect to hosts that were shut off (after the new containers have already been created).

My deploy approach:

rancher-compose up --force-upgrade --pull -d

With the following rancher-compose.yml settings:

version: '2'
services:
  myapp:
    scale: 2
    upgrade_strategy:
      start_first: true

Expected behavior

During the entire time that the nodes are doing their rolling deploy, I should get no 500 errors.
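One way to check this expectation is to probe the service continuously while the upgrade runs and tally the responses. The following is a minimal sketch, not part of rancher-traefik; `http_status`, `probe`, and `server_errors` are hypothetical helper names, and the status source is injected so the tally logic can be exercised without a live load balancer.

```python
import time
import urllib.error
import urllib.request
from collections import Counter
from typing import Callable


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code of a GET request, or 0 if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses are raised as HTTPError
    except OSError:
        return 0  # connection refused, timeout, DNS failure (URLError is an OSError)


def probe(get_status: Callable[[], int], attempts: int, pause: float = 0.0) -> Counter:
    """Call get_status `attempts` times and tally the status codes seen."""
    tally: Counter = Counter()
    for _ in range(attempts):
        tally[get_status()] += 1
        if pause:
            time.sleep(pause)
    return tally


def server_errors(tally: Counter) -> int:
    """Count probes that got a 5xx response or no response at all."""
    return sum(n for code, n in tally.items() if code == 0 or code >= 500)


# Demo with canned statuses standing in for a real deploy window.
fake = iter([200, 200, 502, 0, 200])
demo_tally = probe(lambda: next(fake), attempts=5)
print(dict(demo_tally), "errors:", server_errors(demo_tally))
```

Against a real deployment, the injected callable would be something like `lambda: http_status(url)` with the service's own URL, started just before `rancher-compose up` and left running until the upgrade finishes; a clean upgrade should leave `server_errors` at zero.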

Solution

Not sure where the problem is here, but I believe it's that rancher-traefik doesn't watch for events in real time and regenerate the config when an event occurs.

I say this because I've noticed that after a few minutes, once the 'reload' command is issued and the change shows up in the logs, the service starts working fine again.
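To make the proposed design concrete, here is a minimal sketch of regenerating the config as soon as a change is observed instead of on a fixed schedule. It is an illustration under assumptions, not rancher-traefik's actual code: `watch`, `fetch_version`, and `regenerate_config` are hypothetical names, and `fetch_version` is assumed to be a long-polling call that blocks until the observed version changes or a timeout expires.

```python
from typing import Callable, Optional


def watch(
    fetch_version: Callable[[Optional[str]], str],
    regenerate_config: Callable[[str], None],
    max_rounds: int,
) -> int:
    """Call regenerate_config every time the observed version changes.

    fetch_version(last_seen) is assumed to block until the version differs
    from last_seen or a timeout expires, then return the current version.
    Returns the number of regenerations performed.
    """
    last_seen: Optional[str] = None
    regenerations = 0
    for _ in range(max_rounds):
        current = fetch_version(last_seen)
        if current != last_seen:
            # Something changed (e.g. a container started or stopped):
            # rebuild the proxy config right away instead of waiting
            # for the next scheduled refresh.
            regenerate_config(current)
            regenerations += 1
            last_seen = current
    return regenerations


# Demo with a fake version source standing in for a real event feed.
versions = iter(["v1", "v1", "v2", "v2", "v3"])
seen = []
count = watch(lambda last: next(versions), seen.append, max_rounds=5)
print(count, seen)
```

With a loop like this, the window in which the load balancer still points at stopped containers shrinks to roughly the time of one poll round plus one config reload, rather than the full refresh interval.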

Willing to make a PR but want to ensure that I have the design of this right.

@kelchm

kelchm commented Feb 24, 2017

Very interested in this as well. Let me know if there is anything I can do to help with testing.

@gaieges
Author

gaieges commented Feb 24, 2017

Chatted with @rawmind0 separately about this. He mentioned that this would essentially be handled in a first-class way by the traefik guys: traefik/traefik#1173

I'm likely to take that approach instead of putting effort in here.

@kelchm

kelchm commented Feb 27, 2017

Thanks for the link @gaieges, the Rancher provider for traefik looks very promising.

@rawmind0
Owner

@gaieges thanks for sharing the info in the thread.

I was talking with the traefik team, and traefik v1.2.0-rc2, which includes the rancher backend, may be released this week. I'll test it as soon as possible and update the catalog package accordingly.

Best regards....

@miguelpeixe

Any update on this matter? I'm still getting downtime while performing start-before-stopping upgrades. This is also an issue with rancher's haproxy load balancer.

I was hoping that traefik could be the alternative.

@gaieges
Author

gaieges commented May 1, 2017

I don't think @rawmind0 is going to put any more effort into this project (but I'll let him speak to that) in favor of the traefik solution they've recently implemented.

I've been trying to use the native traefik support for rancher, and while it works, it's not entirely there yet, so I haven't switched over to it in production. Either way, it's probably best to look towards using the native traefik approach rather than this one.

@kelchm

kelchm commented May 1, 2017

@miguelpeixe, I've made some changes to the rancher provider in Traefik to improve the situation. While this does not give you true zero-downtime deployments out of the box, it's a big improvement.

I think the best way to do blue-green deployments with Rancher and Traefik is to actually spin up a new stack rather than upgrading the existing stack.

This allows Traefik to evaluate the health of the stack as a whole and only cut over traffic to the new stack once the entire stack is healthy. I'm working on a complete writeup of how I've been using this, but feel free to reach out to me if you have any questions.
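The cutover rule described here can be sketched as a small decision function. This is only an illustration of the rule, not how Traefik implements it; `stack_is_healthy` and `choose_live_stack` are hypothetical names, the stack names are made up, and the state strings are assumed to follow Rancher's container health states.

```python
from typing import Dict, List


def stack_is_healthy(container_states: List[str]) -> bool:
    """A stack counts as healthy only if it has containers and all are healthy."""
    return bool(container_states) and all(s == "healthy" for s in container_states)


def choose_live_stack(current: str, candidate: str, health: Dict[str, List[str]]) -> str:
    """Return the stack that should receive traffic.

    Traffic moves to the candidate only once every one of its containers
    reports healthy; otherwise it stays on the current stack.
    """
    if stack_is_healthy(health.get(candidate, [])):
        return candidate
    return current


# Demo: the new stack is still starting, so traffic stays on the old one.
health = {
    "myapp-blue": ["healthy", "healthy"],
    "myapp-green": ["healthy", "initializing"],
}
before = choose_live_stack("myapp-blue", "myapp-green", health)

# Once the last container becomes healthy, the cutover happens.
health["myapp-green"][1] = "healthy"
after = choose_live_stack("myapp-blue", "myapp-green", health)
print(before, "->", after)
```

Because the old stack is left untouched until the switch, a failed rollout never takes traffic, and the old stack can be removed once the new one has been serving for a while.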

@miguelpeixe

Thanks for the updates and tips @gaieges @kelchm, I'll work on setting up native traefik and temporarily use a new stack for upgrades.

@rayout

rayout commented Oct 6, 2017

Any news on this?

@rawmind0
Owner

rawmind0 commented Nov 7, 2017

Hi all,

As of alpine-traefik release 1.4.0-3, traefik's built-in rancher integration is supported, in both metadata and api modes. The community-catalog is already updated as well. Three rancher integrations are now available: metadata and api (both traefik built-in), or external (rancher-traefik).

Note that the labels are different with the traefik built-in integration: https://docs.traefik.io/configuration/backends/rancher/#labels-overriding-default-behaviour
Metadata with long polling is the preferred integration; it's working very well. :)

Also, I made a PR, now merged, that refactors the rancher integration and will be included in the next traefik release: traefik/traefik#2291

Best regards...

5 participants