-
Notifications
You must be signed in to change notification settings - Fork 5.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support sticky sessions under SWARM Mode. #1024 #1033
Conversation
ed1eabf
to
1788aef
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @foleymic! Thanks a lot for your contribution!
I have few comments:
- I don't really like the option name
traefik.backend.bypass.swarm.loadbalancer
;) It's too long. - I would get this option as default, and rename the option to
traefik.backend.loadbalancer.swarm
instead. - Could you complete documentation in
doc/
?
@vdemeester WDYT?
provider/docker.go
Outdated
|
||
for _, service := range serviceList { | ||
dockerData := parseService(service, networkMap) | ||
|
||
dockerDataList = append(dockerDataList, dockerData) | ||
sticky, _ := strconv.ParseBool(provider.getSticky(dockerData)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should remove this line. We don't need to know the stickiness to bypass or not the swarm LB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did this because you need to automatically bypass the swarm LB if you want to use sticky sessions. Basically if you explicitly set the flag to bypass or you are using sticky sessions then you should bypass the swarm LB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made this change. Thanks.
Hi, out of curiosity how does this relate to the swarm dnsrr endpoint mode. From a docker swarm point of view that mode is supposed to be used for loadbalancing. There are is an traefik issues regarding this topic: #833 as well. |
@emilevauge - thanks for the option name suggestion, I agree that it is way to verbose. Just so I'm clear, with the new name, are you suggesting that when |
@joke - The intention of this PR is to bypass the Docker's swarm enpoint dssrr. Without this change, Traefik will only register the swarm service as a single server backend, even if that service has multiple containers behind it. In that case, there is no way for Traefik to honor sticky sessions because he is always directing traffic to a single VIP. From there, Swarm would do the round robin load balancing between the individual containers/tasks. |
Exactly :) |
@emilevauge got it. Thanks. I will make the change today. |
3e5827c
to
59cec1c
Compare
docs/toml.md
Outdated
@@ -795,6 +795,7 @@ Labels can be used on containers to override default behaviour: | |||
- `traefik.backend.maxconn.extractorfunc=client.ip`: set the function to be used against the request to determine what to limit maximum connections to the backend by. Must be used in conjunction with the above label to take effect. | |||
- `traefik.backend.loadbalancer.method=drr`: override the default `wrr` load balancer algorithm | |||
- `traefik.backend.loadbalancer.sticky=true`: enable backend sticky sessions | |||
- `traefik.backend.loadbalancer.swarm=true `: use Swarm's inbuilt load balncer (only relevant under Swarm Mode). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*balancer ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@emilevauge - thanks for catching that. Fixing it now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One small comment ;)
Other than that LGTM!
Could you also rebase your PR?
/cc @containous/traefik
59cec1c
to
8a911e5
Compare
Thanks @emilevauge. I rebased and corrected toml.md. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few comments. I need to try this out still 👼
docs/user-guide/swarm-mode.md
Outdated
docker-machine ssh manager "docker service create \ | ||
--name whoami1 \ | ||
--label traefik.port=80 \ | ||
--network traefik-net \ | ||
--label traefik.frontend.rule=Host:whoam1.traefik \ | ||
--label traefik.backend=whoami1 \ | ||
--label traefik.backend.loadbalancer.sticky=true \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this up-to-date ? (isn't it supposed to be traefik.backend.loadbalance.swarm
?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is correct. Default behavior is traefik.backend.loadbalancer.swarm=true
, so I left it out here but used the existing sticky flag to demonstrate that we can maintain sticky sessions under swarm mode. Let me know if you think I should include traefik.backend.loadbalancer.swarm=true
in the guide as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh I see 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vdemeester - I pushed your other recommended changes.
Thanks,
Mike
provider/docker.go
Outdated
|
||
func listTasks(ctx context.Context, dockerClient client.APIClient, serviceID string, | ||
serviceDockerData dockerData, networkMap map[string]*dockertypes.NetworkResource) ([]dockerData, error) { | ||
taskList, err := dockerClient.TaskList(ctx, dockertypes.TaskListOptions{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to use filters here (and thus not get all tasks and having to iterate over it).
filters := filters.NewArgs()
filters.Add("service", serviceID)
taskList, err := dockerClient.TaskList(ctx, dockertypes.TaskListOptions{Filters: filters})
// […]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea - I'll make that change.
provider/docker.go
Outdated
if task.NetworksAttachments != nil { | ||
dockerData.NetworkSettings.Networks = make(map[string]*networkData) | ||
for _, virtualIP := range task.NetworksAttachments { | ||
networkService := networkMap[virtualIP.Network.ID] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be the following
if networkService, present := networkMap[virtualIP.Network.ID]; present {
// […]
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @vdemeester. I'm trying this out now.
e5aa77c
to
b2f6657
Compare
b2f6657
to
f966b41
Compare
f966b41
to
5c2d498
Compare
9f4a9dd
to
630012c
Compare
8ba0b66
to
6acd687
Compare
@vdemeester - I just wanted to reach out to see if you have any other concerns or changes you would like on this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- This is currently not working as expected, any value for the
traefik.backend.loadbalancer.swarm
label meanstrue
which is wrong.getIsBackendLBSwarm
should return the value of the label if present. - This changes the default behavior : the swarm load-balancer is disabled by default with this PR. ping @emilevauge @containous/traefik to discuss if that's what we want or not.
When the swarm load balancer is disabled, I'm getting 404 … not sure why…
provider/docker.go
Outdated
@@ -463,6 +463,13 @@ func (provider *Docker) getSticky(container dockerData) string { | |||
return "false" | |||
} | |||
|
|||
func (provider *Docker) getIsBackendLBSwarm(container dockerData) string { | |||
if _, err := getLabel(container, "traefik.backend.loadbalancer.swarm"); err == nil { | |||
return "true" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that's right here. It will always return true, whatever the value of the label is — and false otherwise 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I meant to question that when I noticed that was how the sticky label was handled (getSticky) - I ended up doing the same thing here to be consistent. Shall I correct them both?
I'll look into the 404 error as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ohhh 😓 yeah, I would think it needs to be fixed (both)
I think this is a good idea to NOT use Swarm LB by default. |
SGTM, was asking just to make sure 👼 |
@foleymic we are going to freeze code for v1.2 tomorrow. If you want your PR to be in this release, please address #1033 (review) :) |
36fe851
to
779a944
Compare
@emilevauge & @vdemeester - I just pushed the requested changes. I was not able to reproduce the 404 errors that Vincent mentioned. Let me know if you're still seeing it. |
@vdemeester - I see the 404 now. If I don't include the front-end rule I get it. I'm looking at it now. Because in non-swarm mode I'm setting the dockerData.name to be the service name + the task slot number (e.g. whoami0.1) so if the front-end rule is not provided traefik will generate one for you off of the this dockerData.name (as container.name). I guess that means it's not matching with the backend somewhere because of this. I fixed this by adding a new string to the dockerData struct called ServiceName. In swarm mode when using traefik to load balance between containers (traefik.backend.loadbalancer.swarm=false) ServiceName will be the name of the Service, while Name will be the Name of the task (e.g. whoami0.1). If traefik.backend.loadbalancer.swarm=true then ServiceName and Name will be the same. |
@foleymic Docker integration tests are failing |
@emilevauge Sorry for not catching the test errors before I pushed last night. The issue is that when testing the front end and backend rules, dockerDate struct is manually created and is still relying on the container name versus the service name. Any concerns with me updating the tests to use ServiceName instead? |
@foleymic Not sure, seems like a breaking change no ? |
@emilevauge Yeah, I was hesitant to suggest it at first. The whole issue I had was due to the way the rules were generated based off the container name versus the service name. Prior to this PR, they were the same in swarm mode because it never went down to the task/container level. Now with this PR, I added ServiceName and left Name to mean be the container name (which is what I thought the original intent was by looking at the code). Update - I found a much simpler solution. Just set dockerData.ServiceName to dockerData.Name in parseContainer. Now that I understand the tests a bit better, I will also try to add some swarm mode tests as a seperate PR. |
57beba0
to
d1deacd
Compare
ping @vdemeester :) |
SWARM Mode has it's own built in Load balancer, so if we want to leverage sticky sessions, or if we would just prefer to bypass it and go directly to the containers (aka tasks), via --label traefik.backend.disable.swarm.loadbalancer=true then we need to let Traefik know about the underlying tasks and register them as services within it's backend.
In Swarm mode with with Docker Swarm’s Load Balancer disabled (traefik.backend.loadbalancer.swarm=false) service name will be the name of the docker service and name will be the container task name (e.g. whoami0.1). When generating backend and fronted rules, we will use service name instead of name if a rule is not provided. Initialize dockerData.ServiceName to dockerData.Name to support non-swarm mode.
d1deacd
to
e0a4c58
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🐸 🐳 🐝
I'm fairly confused by this ticket. Are we able to use dns-rr mode with the traefik network with swarm lb disabled? I see a reference to #833 but no clear indication whether or not it is resolved by this change. |
SWARM Mode has it's own built in Load balancer, so if we want to leverage sticky sessions, or if we would just prefer to bypass it and go directly to the containers (aka tasks), via
--label traefik.backend.bypass.swarm.loadbalancer=true
then we need to let Traefik know about the underlying tasks and register them as services within it's backend.
See #1024 for more details