
Environment considerations during Windows update #40

Open
davebally opened this issue Jan 21, 2020 · 3 comments

@davebally

Hi there,

We are starting to look at POA (the Patch Orchestration Application) for our environment and are wondering how other people handle making changes to load balancers, especially to prevent any hiccups there.

Prior to a reboot of a node, it would make sense to have a hook to inform the load balancer that the node is going out of rotation, and then add it back in once the Service Fabric services have restarted.

Also, how do people handle installing other updates (the .NET Core X runtime, for example) that may or may not require a reboot of the node?

Can you share any thoughts?

Dave
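
One shape such a hook could take, as a rough sketch: poll the node's status through the Service Fabric REST API (the `GET /Nodes/{nodeName}` call on the HTTP gateway) and call out to the load balancer around state changes. The `lb_drain`/`lb_restore` functions and the node name below are placeholders, since the actual load-balancer API is environment-specific:

```python
import time

import requests  # third-party: pip install requests

GATEWAY = "http://localhost:19080"   # Service Fabric HTTP gateway
NODE_NAME = "_Node_0"                # placeholder: this machine's SF node name


def lb_drain(node: str) -> None:
    """Placeholder: remove the node from the load balancer pool."""


def lb_restore(node: str) -> None:
    """Placeholder: add the node back into the load balancer pool."""


def node_status() -> str:
    # GetNode from the SF REST API; NodeStatus is e.g. Up/Disabling/Disabled/Down.
    r = requests.get(f"{GATEWAY}/Nodes/{NODE_NAME}",
                     params={"api-version": "6.0"}, timeout=5)
    r.raise_for_status()
    return r.json().get("NodeStatus", "Unknown")


def watch(poll_seconds: int = 15) -> None:
    in_pool = True
    while True:
        status = node_status()
        # POA deactivates a node (Disabling/Disabled) before patching or
        # rebooting it, so drain at that point and restore once it is Up again.
        if in_pool and status != "Up":
            lb_drain(NODE_NAME)
            in_pool = False
        elif not in_pool and status == "Up":
            lb_restore(NODE_NAME)
            in_pool = True
        time.sleep(poll_seconds)


if __name__ == "__main__":
    watch()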

@khandelwalbrijesh
Contributor

POA installs all the updates that come along with OS updates if 'InstallWindowsOSOnlyUpdates' is set to false (which is the default for versions greater than 1.3.1).
@masnider, is there any official guidance available for non-OS updates?
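
For reference, a rough sketch of setting that parameter when creating the POA application through the Service Fabric REST API (Create Application); the application name, type name, and version below are illustrative placeholders for your own deployment:

```python
import requests  # third-party: pip install requests

GATEWAY = "http://localhost:19080"  # Service Fabric HTTP gateway

# ApplicationDescription for the Create Application call; Name, TypeName and
# TypeVersion are placeholders for your own POA deployment.
app = {
    "Name": "fabric:/PatchOrchestrationApplication",
    "TypeName": "PatchOrchestrationApplicationType",
    "TypeVersion": "1.4.0",
    "ParameterList": [
        # false = install everything Windows Update offers, not just OS updates
        {"Key": "InstallWindowsOSOnlyUpdates", "Value": "false"},
    ],
}

resp = requests.post(f"{GATEWAY}/Applications/$/Create",
                     params={"api-version": "6.0"}, json=app)
resp.raise_for_status()
```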

@Mahons

Mahons commented Jun 10, 2020

Hi @davebally,

We use two load balancers in our environment. One at L7 (F5) which is simply a passthrough to our downstream load balancer at L4 (Traefik) hosted within the cluster. The F5 checks the health of the nodes themselves while Traefik has the built-in capability to perform health checks at the SF application level. Between the two of them, we can safely restart nodes with minimal impact.

We potentially could have performed application health checks on our F5 but it would have been a completely custom solution that I didn't want to have to support long term.

Best,

Stephen
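
A rough sketch of the kind of node-level health endpoint an external load balancer such as the F5 could probe, assuming the Service Fabric HTTP gateway is reachable locally; the node name and listening port are placeholders:

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests  # third-party: pip install requests

GATEWAY = "http://localhost:19080"                   # Service Fabric HTTP gateway
NODE_NAME = os.environ.get("NODE_NAME", "_Node_0")   # placeholder node name


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        try:
            r = requests.get(f"{GATEWAY}/Nodes/{NODE_NAME}",
                             params={"api-version": "6.0"}, timeout=2)
            r.raise_for_status()
            status = r.json().get("NodeStatus")
        except requests.RequestException:
            status = None
        # Report healthy only while the node is Up; POA deactivates the node
        # before a reboot, so the probe fails and traffic drains first.
        self.send_response(200 if status == "Up" else 503)
        self.end_headers()


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8081), ProbeHandler).serve_forever()
```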

@Jaans

Jaans commented Jun 10, 2020

We use an Azure public load balancer that passes traffic on to a Traefik reverse proxy deployed on each node.

We configure the backend pool health probe (so the Azure load balancer knows whether a node is up or not) to point at Traefik's "ping" service.

The Traefik configuration is dynamic, based on Service Fabric config together with SF notification events. This deals with availability at the SF cluster level.

At the application level we use the default retry and backoff strategy for transient failures.
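
As a rough sketch of that pattern (the delays, attempt counts, and exception types here are illustrative, not the actual strategy used in the cluster described above):

```python
import random
import time


def call_with_retry(func, max_attempts=5, base_delay=0.5, max_delay=8.0):
    """Invoke func(), retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts:
                raise
            # Exponential backoff with full jitter avoids retry storms while
            # a node is draining or restarting behind the load balancer.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            time.sleep(random.uniform(0.0, delay))
```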

All of these things together give us a great deal of reliability and elasticity within the cluster, and OS updates of "upgrade domain" nodes via POA become a trivial part of operations.
