
Traefik is routing traffic to wrong backend. #1174

Closed
klausenbusk opened this issue Feb 20, 2017 · 6 comments · Fixed by containous/oxy#17

@klausenbusk (Contributor) commented Feb 20, 2017

What version of Traefik are you using (traefik version)?

v1.1.2
Edit: also present with v1.2.0-rc1

What is your environment & configuration (arguments, toml...)?

Traefik is running in a Docker container on CoreOS, pulling its configuration from etcd, and all traffic is routed through Cloudflare first (DDoS protection).

etcd config (taken from the debug log message "Configuration received from provider etcd"):

{
  "backends": {
    "b1": {
      "servers": {
        "10.133.102.22": {
          "url": "http://10.133.102.22",
          "weight": 0
        },
        "10.133.92.63": {
          "url": "http://10.133.92.63",
          "weight": 0
        }
      },
      "loadBalancer": {
        "method": "wrr"
      }
    },
    "b2": {
      "servers": {
        "10.133.13.121": {
          "url": "http://10.133.13.121",
          "weight": 0
        }
      },
      "loadBalancer": {
        "method": "wrr"
      }
    },
    "b3": {
      "servers": {
        "grafana": {
          "url": "http://10.133.59.44",
          "weight": 0
        }
      },
      "loadBalancer": {
        "method": "wrr"
      }
    },
    "b4": {
      "servers": {
        "10.133.21.211": {
          "url": "http://10.133.21.211",
          "weight": 0
        }
      },
      "loadBalancer": {
        "method": "wrr"
      }
    },
    "b5": {
      "servers": {
        "emq": {
          "url": "http://10.133.112.89:8090",
          "weight": 0
        }
      },
      "loadBalancer": {
        "method": "wrr"
      }
    }
  },
  "frontends": {
    "f1": {
      "entryPoints": [
        "http"
      ],
      "backend": "b1",
      "routes": {
        "r1": {
          "rule": "HostRegexp:admin.foobar.com,secure.foobar.com,{subdomain:(config|queue|status)}.barfoo.eu,barfoo.eu"
        }
      },
      "passHostHeader": true,
      "priority": 0
    },
    "f2": {
      "entryPoints": [
        "http"
      ],
      "backend": "b2",
      "routes": {
        "r1": {
          "rule": "HostRegexp:dev.foobar.com,secure.dev.foobar.com,{subdomain:(config|queue|status)}.dev.barfoo.eu,dev.barfoo.eu"
        }
      },
      "passHostHeader": true,
      "priority": 0
    },
    "f3": {
      "entryPoints": [
        "http"
      ],
      "backend": "b3",
      "routes": {
        "r1": {
          "rule": "Host:grafana.foobar.com"
        }
      },
      "passHostHeader": true,
      "priority": 0
    },
    "f4": {
      "entryPoints": [
        "http"
      ],
      "backend": "b4",
      "routes": {
        "r1": {
          "rule": "Host:api.foobar.com"
        }
      },
      "passHostHeader": true,
      "priority": 0
    },
    "f5": {
      "entryPoints": [
        "http"
      ],
      "backend": "b5",
      "routes": {
        "r1": {
          "rule": "Host:ws.foobar.com,ws.barfoo.eu"
        }
      },
      "passHostHeader": true,
      "priority": 0
    }
  }
}

What did you do?

Pointed backend b5 to another server (the old server also hosted the load balancer, but on a different port).

What did you expect to see?

That only traffic from frontend f5 gets forwarded to b5.

What did you see instead?

That some traffic from f1 (primarily traffic to admin.foobar.com) gets forwarded to b5.

Another thing I observed: none of the requests that get forwarded to the wrong server show up in the access log. I also haven't been able to reproduce the issue with curl, but I did look at the headers with help from tcpdump and everything looked as it should (I have posted the log in the Slack channel).

Edit: Another thing, b5 is used for websockets, if that matters. Maybe that somehow screws something up? Also feel free to ping me on the Slack channel.

/cc @containous

@emilevauge (Member)

@klausenbusk I think it's due to the use of websockets. Traefik indeed does not kill current connections. But this needs some discussion. In your opinion, what would be the ideal behavior?

@timoreimann (Contributor)

@emilevauge shouldn't the graceTimeOut parameter take care of killing those connections on configuration reload?

@emilevauge (Member) commented Mar 1, 2017

@timoreimann indeed, but it seems there may be a regression on this... even with normal HTTP requests (not websockets). graceTimeOut seems to be used only while shutting down Traefik, not during hot reloads... That's not that bad, apart from websockets ;)
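For reference, this is roughly where graceTimeOut lives in the static config, assuming the v1.x TOML syntax with the value in seconds (a sketch only, check the sample config for your exact version); today it only seems to apply on shutdown, as described above:

# Hypothetical snippet (assumed Traefik v1.x syntax): time in seconds that
# active requests get to finish before Traefik stops.
graceTimeOut = 10

defaultEntryPoints = ["http"]

[entryPoints]
  [entryPoints.http]
  address = ":8080"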

@emilevauge added the bug and priority/P1 (need to be fixed in next release) labels and removed the investigation-needed label on Mar 1, 2017
@klausenbusk (Contributor, Author)

@klausenbusk I think it's due to the use of websockets. Traefik indeed does not kill current connections. But this needs some discussion. In your opinion, what would be the ideal behavior?

I have done a little more debugging since I opened the issue. I added an fmt.Println statement to copyRequest, just before the last if (in both httpForwarder and websocketForwarder), in oxy's forward/fwd.go.
What I noticed is that copyRequest isn't called for any of the wrongly forwarded requests.
So I added an fmt.Println(targetConn.LocalAddr) here: https://github.com/containous/oxy/blob/master/forward/fwd.go#L263 , and the port number matches the web server's logs.

So what I think is going on here is that some traffic is forwarded over the connection created in func (f *websocketForwarder) serveHTTP. That connection is created when forwarding a websocket upgrade request to an HTTP-only server.
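To illustrate the trick (this is not the oxy code, just a standalone sketch with a placeholder backend address): dial a backend the way the forwarder does and print the connection's local address, whose ephemeral port can then be matched against the client port in the backend's access log.

// Standalone sketch of the debugging technique described above; the backend
// address is a placeholder, not taken from the real setup.
package main

import (
    "fmt"
    "log"
    "net"
)

func main() {
    conn, err := net.Dial("tcp", "127.0.0.1:8081")
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()

    // The local (ephemeral) port of this connection is what the backend sees
    // as the client port, so it can be correlated with the access log.
    fmt.Println("local address:", conn.LocalAddr())
}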

@timoreimann (Contributor)

@emilevauge:

indeed, but it seems there may be a regression on this... even with normal HTTP requests (not websockets). graceTimeOut seems to be used only while shutting down Traefik, not during hot reloads... That's not that bad, apart from websockets ;)

On second thought, you probably don't want to shut down requests after graceTimeOut on hot reloads since those can happen very frequently.

Maybe a second graceful-termination parameter would be useful to get rid of (too) long-running websocket connections.

@klausenbusk (Contributor, Author) commented Mar 1, 2017

Finally, after two hours of debugging, I was able to figure out the root cause.

The issue is caused by the fact that we use Cloudflare, which uses keepalive connections. I was able to reproduce it locally.
First, start two backend servers:

docker run -p 8081:80 --rm -t -i emilevauge/whoami
docker run -p 8082:80 --rm -t -i emilevauge/whoami

Then start Traefik with the following config:

logLevel = "DEBUG"
defaultEntryPoints = ["http"]
[entryPoints]
  [entryPoints.http]
  address = ":8080"


[file]
[backends]
  [backends.backend1]
    [backends.backend1.servers.server1]
    url = "http://127.0.0.1:8081"
    weight = 1
  [backends.backend2]
    [backends.backend2.servers.server1]
    url = "http://127.0.0.1:8082"
    weight = 1



[frontends]
  [frontends.frontend1]
  backend = "backend1"
    [frontends.frontend1.routes.test_1]
    rule = "Host:backend1.com"
  [frontends.frontend2]
  backend = "backend2"
    [frontends.frontend2.routes.test_1]
    rule = "Host:backend2.com"

Then install nginx and use the following config:

worker_processes  1;
events {
    worker_connections  1024;
}

http {
    upstream backend1 {
        server 127.0.0.1:8080;
        keepalive 32;
    }
    map $http_host $backend {
        default backend1;
    }
    map $http_upgrade $connection_upgrade {
        default upgrade;
        ""      "";
    }
    include       mime.types;
    default_type  application/octet-stream;

    server {
        listen       80;
        server_name  localhost;

        location / {
            proxy_pass   http://$backend;
            proxy_set_header Host $host;
            proxy_set_header Upgrade $http_upgrade;
            proxy_http_version 1.1;
            proxy_set_header Connection $connection_upgrade;
        }
    }
}

Nginx is configured with keepalive, so it reuses upstream connections.

Now we should be able to reach both backends:

curl 127.0.0.1 -H "Host: backend1.com"
Hostname: c7994ef9a8db
[...]
curl 127.0.0.1 -H "Host: backend2.com"
Hostname: d4ff25df4e8b

Now let's try requesting a websocket "upgrade":

curl 127.0.0.1 -H "Host: backend1.com" -H "Upgrade: websocket"
Hostname: c7994ef9a8db
[...]

and finally call backend2.com:

curl 127.0.0.1 -H "Host: backend2.com"
Hostname: c7994ef9a8db
[...]

Nginx reuses the hijacked connection, which now points straight at backend1, so the request for backend2.com is answered by backend1's container (note the hostname c7994ef9a8db).
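To make the mechanism concrete, here is a minimal, self-contained sketch of the "hijack and pipe" pattern (illustration only, not the actual oxy implementation; addresses and handler are made up). Once the client connection has been hijacked and wired to backend1, the proxy only copies bytes, so a later keepalive request that nginx sends on the same connection, even one meant for backend2.com, goes straight to backend1.

// Minimal sketch of the "hijack and pipe" pattern (illustration only, not the
// oxy implementation). After the upgrade request, routing decisions stop:
// everything else arriving on the hijacked client connection is copied to the
// backend that was chosen for the upgrade.
package main

import (
    "io"
    "log"
    "net"
    "net/http"
)

func main() {
    backendAddr := "127.0.0.1:8081" // placeholder: "backend1" from the repro above

    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        if r.Header.Get("Upgrade") == "" {
            // Plain request: handled (and routed) per request, as expected.
            w.Write([]byte("routed per request\n"))
            return
        }

        // Upgrade request: dial the backend, replay the request to it, then
        // hijack the client connection and blindly pipe bytes both ways.
        target, err := net.Dial("tcp", backendAddr)
        if err != nil {
            http.Error(w, err.Error(), http.StatusBadGateway)
            return
        }
        if err := r.Write(target); err != nil {
            target.Close()
            return
        }
        hj, ok := w.(http.Hijacker)
        if !ok {
            target.Close()
            http.Error(w, "hijacking not supported", http.StatusInternalServerError)
            return
        }
        clientConn, clientBuf, err := hj.Hijack()
        if err != nil {
            target.Close()
            return
        }

        // From here on the proxy never parses HTTP again: a later keepalive
        // request for backend2.com on this same connection is copied straight
        // to backend1.
        go func() {
            io.Copy(target, clientBuf) // client -> backend (incl. buffered bytes)
            target.Close()
        }()
        io.Copy(clientConn, target) // backend -> client
        clientConn.Close()
    })

    log.Fatal(http.ListenAndServe(":8080", nil))
}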

I'm not sure what the proper fix is, but I can't be the only one using Cloudflare, and this could also be abused.

I hope this makes sense :)
