
Traefik is routing traffic to wrong backend. #1174

Closed
klausenbusk opened this issue Feb 20, 2017 · 6 comments · Fixed by containous/oxy#17

@klausenbusk (Contributor) commented Feb 20, 2017

What version of Traefik are you using (traefik version)?

v1.1.2
Edit: also present with v1.2.0-rc1

What is your environment & configuration (arguments, toml...)?

Traefik is running in a Docker container on CoreOS, pulling its configuration from etcd, and all traffic is routed through Cloudflare first (DDoS protection).

etcd config (taken from the debug log message "Configuration received from provider etcd"):

{
  "backends": {
    "b1": {
      "servers": {
        "10.133.102.22": {
          "url": "http://10.133.102.22",
          "weight": 0
        },
        "10.133.92.63": {
          "url": "http://10.133.92.63",
          "weight": 0
        }
      },
      "loadBalancer": {
        "method": "wrr"
      }
    },
    "b2": {
      "servers": {
        "10.133.13.121": {
          "url": "http://10.133.13.121",
          "weight": 0
        }
      },
      "loadBalancer": {
        "method": "wrr"
      }
    },
    "b3": {
      "servers": {
        "grafana": {
          "url": "http://10.133.59.44",
          "weight": 0
        }
      },
      "loadBalancer": {
        "method": "wrr"
      }
    },
    "b4": {
      "servers": {
        "10.133.21.211": {
          "url": "http://10.133.21.211",
          "weight": 0
        }
      },
      "loadBalancer": {
        "method": "wrr"
      }
    },
    "b5": {
      "servers": {
        "emq": {
          "url": "http://10.133.112.89:8090",
          "weight": 0
        }
      },
      "loadBalancer": {
        "method": "wrr"
      }
    }
  },
  "frontends": {
    "f1": {
      "entryPoints": [
        "http"
      ],
      "backend": "b1",
      "routes": {
        "r1": {
          "rule": "HostRegexp:admin.foobar.com,secure.foobar.com,{subdomain:(config|queue|status)}.barfoo.eu,barfoo.eu"
        }
      },
      "passHostHeader": true,
      "priority": 0
    },
    "f2": {
      "entryPoints": [
        "http"
      ],
      "backend": "b2",
      "routes": {
        "r1": {
          "rule": "HostRegexp:dev.foobar.com,secure.dev.foobar.com,{subdomain:(config|queue|status)}.dev.barfoo.eu,dev.barfoo.eu"
        }
      },
      "passHostHeader": true,
      "priority": 0
    },
    "f3": {
      "entryPoints": [
        "http"
      ],
      "backend": "b3",
      "routes": {
        "r1": {
          "rule": "Host:grafana.foobar.com"
        }
      },
      "passHostHeader": true,
      "priority": 0
    },
    "f4": {
      "entryPoints": [
        "http"
      ],
      "backend": "b4",
      "routes": {
        "r1": {
          "rule": "Host:api.foobar.com"
        }
      },
      "passHostHeader": true,
      "priority": 0
    },
    "f5": {
      "entryPoints": [
        "http"
      ],
      "backend": "b5",
      "routes": {
        "r1": {
          "rule": "Host:ws.foobar.com,ws.barfoo.eu"
        }
      },
      "passHostHeader": true,
      "priority": 0
    }
  }
}

What did you do?

Pointed backend b5 to another server (the old server also hosted the load balancer, but on a different port).

What did you expect to see?

That only traffic from frontend f5 gets forwarded to b5.

What did you see instead?

That some traffic from f1 (primarily traffic to admin.foobar.com) gets forwarded to b5.

Another thing I observed: none of the requests that get forwarded to the wrong server show up in the access log. I also haven't been able to reproduce the issue with curl, but I did look at the headers with help from tcpdump and everything looked as it should (I have posted the log in the Slack channel).

Edit: Another thing, b5 is used for websockets, if that matters. Maybe that somehow screws something up? Also feel free to ping me on the Slack channel.

/cc @containous

@emilevauge (Member)

@klausenbusk I think it's due to the use of websockets. Traefik indeed does not kill current connections. But this needs some discussion. In your opinion, what would be the ideal behavior?

@timoreimann (Contributor)

@emilevauge shouldn't the graceTimeOut parameter take care of killing those connections on configuration reload?

@emilevauge (Member) commented Mar 1, 2017

@timoreimann indeed, but it seems there may be a regression on this... even with normal HTTP requests (not websockets). graceTimeOut seems to be used only while shutting down Traefik, not during hot reloads... That's not that bad, apart from websockets ;)
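For reference, this is roughly where graceTimeOut lives in the static config, assuming the v1.x TOML syntax with the value in seconds (a sketch only, check the sample config for your exact version); today it only seems to apply on shutdown, as described above:

# Hypothetical snippet (assumed Traefik v1.x syntax): time in seconds that
# active requests get to finish before Traefik stops.
graceTimeOut = 10

defaultEntryPoints = ["http"]

[entryPoints]
  [entryPoints.http]
  address = ":8080"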

@emilevauge added the bug and priority/P1 (need to be fixed in next release) labels and removed the investigation-needed label on Mar 1, 2017
@klausenbusk (Contributor, Author)

@klausenbusk I think it's due to the use of websockets. Traefik indeed does not kill current connections. But this needs some discussion. In your opinion, what would be the ideal behavior?

I have done a little more debugging since I opened the issue. I added an fmt.Println statement to copyRequest, just before the last if (in both httpForwarder and websocketForwarder), in oxy's forward/fwd.go.
What I noticed is that copyRequest isn't called for any of the wrongly forwarded requests.
So I added an fmt.Println(targetConn.LocalAddr) here: https://github.com/containous/oxy/blob/master/forward/fwd.go#L263 , and the port number matches the web server's logs.

So what I think is going on here is that some traffic is forwarded over the connection created in func (f *websocketForwarder) serveHTTP. That connection is created when forwarding a websocket upgrade request to an HTTP-only server.
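To illustrate the trick (this is not the oxy code, just a standalone sketch with a placeholder backend address): dial a backend the way the forwarder does and print the connection's local address, whose ephemeral port can then be matched against the client port in the backend's access log.

// Standalone sketch of the debugging technique described above; the backend
// address is a placeholder, not taken from the real setup.
package main

import (
    "fmt"
    "log"
    "net"
)

func main() {
    conn, err := net.Dial("tcp", "127.0.0.1:8081")
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()

    // The local (ephemeral) port of this connection is what the backend sees
    // as the client port, so it can be correlated with the access log.
    fmt.Println("local address:", conn.LocalAddr())
}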

@timoreimann (Contributor)

@emilevauge:

indeed, but it seems there may be a regression on this... even with normal HTTP requests (not websockets). graceTimeOut seems to be used only while shutting down Traefik, not during hot reloads... That's not that bad, apart from websockets ;)

On second thought, you probably don't want to shut down requests after graceTimeOut on hot reloads since those can happen very frequently.

Maybe a second graceful-termination parameter would be useful to get rid of (too) long-running websocket connections.

@klausenbusk (Contributor, Author) commented Mar 1, 2017

Finally, after two hours of debugging, I was able to figure out the root cause.

The issue is caused by the fact that we use Cloudflare, which uses keepalive connections. I was able to reproduce it locally.
First, start two backend servers:

docker run -p 8081:80 --rm -t -i emilevauge/whoami
docker run -p 8082:80 --rm -t -i emilevauge/whoami

Then start Traefik with the following config:

logLevel = "DEBUG"
defaultEntryPoints = ["http"]
[entryPoints]
  [entryPoints.http]
  address = ":8080"


[file]
[backends]
  [backends.backend1]
    [backends.backend1.servers.server1]
    url = "http://127.0.0.1:8081"
    weight = 1
  [backends.backend2]
    [backends.backend2.servers.server1]
    url = "http://127.0.0.1:8082"
    weight = 1



[frontends]
  [frontends.frontend1]
  backend = "backend1"
    [frontends.frontend1.routes.test_1]
    rule = "Host:backend1.com"
  [frontends.frontend2]
  backend = "backend2"
    [frontends.frontend2.routes.test_1]
    rule = "Host:backend2.com"

Then install nginx and use the following config:

worker_processes  1;
events {
    worker_connections  1024;
}

http {
    upstream backend1 {
        server 127.0.0.1:8080;
        keepalive 32;
    }
    map $http_host $backend {
        default backend1;
    }
    map $http_upgrade $connection_upgrade {
        default upgrade;
        ""      "";
    }
    include       mime.types;
    default_type  application/octet-stream;

    server {
        listen       80;
        server_name  localhost;

        location / {
            proxy_pass   http://$backend;
            proxy_set_header Host $host;
            proxy_set_header Upgrade $http_upgrade;
            proxy_http_version 1.1;
            proxy_set_header Connection $connection_upgrade;
        }
    }
}

Nginx is configured with keepalive, so it reuses upstream connections.

Now we should be able to reach both backends:

curl 127.0.0.1 -H "Host: backend1.com"
Hostname: c7994ef9a8db
[...]
curl 127.0.0.1 -H "Host: backend2.com"
Hostname: d4ff25df4e8b

Now let's try requesting a websocket "upgrade":

curl 127.0.0.1 -H "Host: backend1.com" -H "Upgrade: websocket"
Hostname: c7994ef9a8db
[...]

and finally call backend2.com:

curl 127.0.0.1 -H "Host: backend2.com"
Hostname: c7994ef9a8db
[...]

Nginx reuses the hijacked connection, which now points straight at backend1, so the request for backend2.com is answered by backend1's container (note the hostname c7994ef9a8db).
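To make the mechanism concrete, here is a minimal, self-contained sketch of the "hijack and pipe" pattern (illustration only, not the actual oxy implementation; addresses and handler are made up). Once the client connection has been hijacked and wired to backend1, the proxy only copies bytes, so a later keepalive request that nginx sends on the same connection, even one meant for backend2.com, goes straight to backend1.

// Minimal sketch of the "hijack and pipe" pattern (illustration only, not the
// oxy implementation). After the upgrade request, routing decisions stop:
// everything else arriving on the hijacked client connection is copied to the
// backend that was chosen for the upgrade.
package main

import (
    "io"
    "log"
    "net"
    "net/http"
)

func main() {
    backendAddr := "127.0.0.1:8081" // placeholder: "backend1" from the repro above

    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        if r.Header.Get("Upgrade") == "" {
            // Plain request: handled (and routed) per request, as expected.
            w.Write([]byte("routed per request\n"))
            return
        }

        // Upgrade request: dial the backend, replay the request to it, then
        // hijack the client connection and blindly pipe bytes both ways.
        target, err := net.Dial("tcp", backendAddr)
        if err != nil {
            http.Error(w, err.Error(), http.StatusBadGateway)
            return
        }
        if err := r.Write(target); err != nil {
            target.Close()
            return
        }
        hj, ok := w.(http.Hijacker)
        if !ok {
            target.Close()
            http.Error(w, "hijacking not supported", http.StatusInternalServerError)
            return
        }
        clientConn, clientBuf, err := hj.Hijack()
        if err != nil {
            target.Close()
            return
        }

        // From here on the proxy never parses HTTP again: a later keepalive
        // request for backend2.com on this same connection is copied straight
        // to backend1.
        go func() {
            io.Copy(target, clientBuf) // client -> backend (incl. buffered bytes)
            target.Close()
        }()
        io.Copy(clientConn, target) // backend -> client
        clientConn.Close()
    })

    log.Fatal(http.ListenAndServe(":8080", nil))
}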

I'm not sure what the proper fix is, but I can't be the only one using Cloudflare, and this could also be abused.

I hope this makes sense :)
