Improve rancher provider handling of service and container health states #1343
LGTM, but is there a way to cover this case with a unit test?
@errm, which case specifically? For containers without a health check I need to actually test the behavior of the Rancher API.
@kelchm with your PR,
Yeah I would be fine with testing
Hey @kelchm, as mentioned already, I like the idea where this PR is going to :) I'm also fine with migrating checks from the service to a container. However, as mentioned twice already, we need some unit test to cover it ;)
Add unit test :)
I'm going to need to rework some of my original plan here -- it turns out that the Rancher API returns a My thinking is that we will likely need to use a combination of factors to determine first if a service should be included in the config and then second if a given container should be included for that service.
I think this is close to fixing what I see as the root cause of the issue. I think we need to add a check on the container's State (not just HealthState). I would say we'd only want to include containers that have a HealthState of "healthy" and a State of "running". I don't think I'd include containers that are being upgraded or initializing into the LB until they reach a healthy state, and as you mentioned before a healthy HealthState alone isn't enough to determine if a container should be included in the load balancer. From my investigation of the API and my use cases, I think those are all the criteria that are needed if using the API (the metadata service deals with things a little differently, but this isn't using that).
@SantoDE, I think this is ready for your review. I think the only things remaining are some minor updates to the documentation. I've tried to improve the test coverage, hopefully this is okay for now.
This looks good to me, haven't been able to test it yet personally but this looks like it'd resolve the issues I'm seeing!
I've had a quick test this afternoon and it LGTM. A well-needed enhancement, thanks @kelchm. Now it could be my use case bias, but this seems like it should be default functionality - I would say
@martinbaillie, containers which have a Can you grab the API response for the container(s) you are seeing this behavior on? I'd be curious what the
@kelchm sorry I may have confused things. I meant it works perfectly with your PR enabled. I was questioning whether it should default to true rather than false, because without your PR enabled I saw stopped containers appear in the backend list, which is undesirable default behaviour.
@martinbaillie, There are two types of filters, containerFilter which filters at the container level and serviceFilter which filters at the service level. I have
Thanks @kelchm that explains it. I was struggling to come up with a use case for when this would be desirable behaviour.
Hey @kelchm
Thanks a lot for your work! I really like it :)
Beside the minor typo, it's a LGTM to me. Could you please rebase and squash your commits?
Thanks!
provider/rancher.go
Outdated
if service.Health != "" && service.Health != "healthy" {
log.Debugf("Filtering unhealthy or starting service %s", service.Name)
return false
// Only filter services by Health (HealthState) and State is EnableServiceHealthFilter is true
typo I guess?
if EnableServiceHealthFilter is true... ?
Good eye @SantoDE, fixed!
bae596b to efe1f62
Hey @kelchm, a) tests are failing :'( Thanks for your work! :)
@kelchm Could you squash your commits? Thanks.
7df3398 to f99e06f
Could you update the traefik.sample.toml? Thanks.
Thanks @kelchm
Great!
Could you also complete traefik.sample.toml?
provider/rancher/rancher.go
Outdated
if key == "io.rancher.stack_service.name" && value == rancherData.Name {
rancherData.Containers = append(rancherData.Containers, container.PrimaryIpAddress)
}
if containerFilter(container) && container.Labels["io.rancher.stack_service.name"] == rancherData.Name {
It would be even better to write if container.Labels["io.rancher.stack_service.name"] == rancherData.Name && containerFilter(container) {, but meh :)
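The suggested operand order relies on Go's short-circuit evaluation of `&&`: when the label comparison comes first, containerFilter is only called for containers that belong to the service in question. A minimal sketch of that effect follows, assuming a simplified, hypothetical container type and a simplified containerFilter with a call counter (neither is the provider's actual code):

```go
package main

import "fmt"

// container is a hypothetical, trimmed-down stand-in for the Rancher API
// container object; only the fields needed for this illustration exist.
type container struct {
	Labels      map[string]string
	HealthState string
	State       string
}

// filterCalls counts how often containerFilter runs, so the effect of the
// operand order can be observed.
var filterCalls int

// containerFilter is a simplified stand-in for the provider's filter.
func containerFilter(c container) bool {
	filterCalls++
	return c.HealthState == "healthy" && c.State == "running"
}

// belongsAndHealthy checks the cheap label comparison first; because && in
// Go short-circuits, containerFilter is skipped when the label differs.
func belongsAndHealthy(c container, serviceName string) bool {
	return c.Labels["io.rancher.stack_service.name"] == serviceName && containerFilter(c)
}

func main() {
	containers := []container{
		{Labels: map[string]string{"io.rancher.stack_service.name": "web/app"}, HealthState: "healthy", State: "running"},
		{Labels: map[string]string{"io.rancher.stack_service.name": "db/postgres"}, HealthState: "healthy", State: "running"},
	}
	matched := 0
	for _, c := range containers {
		if belongsAndHealthy(c, "web/app") {
			matched++
		}
	}
	fmt.Println(matched, filterCalls) // prints "1 1"
}
```

With the original order (filter first), the counter would reach 2 here, since every container would pass through containerFilter regardless of which service it belongs to. The service names "web/app" and "db/postgres" are made up for the example.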
53df90a to f8e3ca2
@emilevauge, @ldez done 👍
traefik.sample.toml
Outdated
#
# Optional
#
RefreshSeconds = 15
Could you comment out this line?
Fixed.
traefik.sample.toml
Outdated
# Optional
# Default: false
#
EnableServiceHealthFilter = false
Could you comment out this line?
Fixed.
- Improves default filtering behavior to filter by container health/healthState
- Optionally allows filtering by service health/healthState
- Allows configuration of refresh interval
ab805a8 to 44db6e9
Thanks @kelchm
LGTM
This pull request aims to fix some basic issues which impact the usability of the rancher provider. During my testing with the rancher provider I found that services which did not have a healthy state were removed from the traefik config. In practice we should not care about the health of a service, but only about the health of that service's individual containers. As a result I've made the following changes:
- Services are no longer filtered by HealthState. A modified version of this functionality may be enabled using the 'EnableServiceHealthFilter' configuration toggle. This is particularly useful for 'advanced' configurations such as doing blue/green deployments.
- A RefreshSeconds configuration parameter has been added (much like in the ECS provider) for configuring how often the Rancher API is queried.
- Containers are now filtered by HealthState and State due to the behavior of the Rancher API.
- healthy and upgrading-healthy are considered to be a healthy HealthState for containers and services.
- active, updating-active and upgraded are considered to be a healthy State for services.
- running and updating-running are considered to be a healthy State for containers.
If anyone has any feedback or test results please feel free to share them with me here or in #rancher on the Traefik Slack.
Fixes #1253
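The filtering rules listed in the description can be sketched roughly as follows. This is only an illustration under stated assumptions: the container and service types, the oneOf helper, and the function signatures are hypothetical and do not reproduce the provider's actual code; the accepted state values are copied from the description, and treating an empty HealthState as acceptable (no health check configured) is an assumption based on the earlier `service.Health != ""` check.

```go
package main

import "fmt"

// container and service are hypothetical stand-ins for the Rancher API
// objects; only the fields discussed in this PR are modelled.
type container struct {
	Name        string
	HealthState string
	State       string
}

type service struct {
	Name        string
	HealthState string
	State       string
}

// oneOf reports whether value equals any of the allowed strings.
func oneOf(value string, allowed ...string) bool {
	for _, a := range allowed {
		if value == a {
			return true
		}
	}
	return false
}

// healthStateOK accepts the HealthState values named in the description.
// Assumption: an empty HealthState means no health check is configured
// and is therefore accepted.
func healthStateOK(h string) bool {
	return oneOf(h, "", "healthy", "upgrading-healthy")
}

// containerFilter keeps containers with an acceptable HealthState and State.
func containerFilter(c container) bool {
	return healthStateOK(c.HealthState) && oneOf(c.State, "running", "updating-running")
}

// serviceFilter only looks at service health when the toggle is enabled,
// mirroring the EnableServiceHealthFilter option described above.
func serviceFilter(s service, enableServiceHealthFilter bool) bool {
	if !enableServiceHealthFilter {
		return true
	}
	return healthStateOK(s.HealthState) && oneOf(s.State, "active", "updating-active", "upgraded")
}

func main() {
	c := container{Name: "web-1", HealthState: "healthy", State: "stopped"}
	fmt.Println(containerFilter(c)) // false: healthy HealthState but not running

	s := service{Name: "web", HealthState: "degraded", State: "active"}
	fmt.Println(serviceFilter(s, false), serviceFilter(s, true)) // true false
}
```

The stopped-but-healthy container in main corresponds to the case raised in the conversation, where a healthy HealthState alone is not enough to include a container in the backend list.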