
Consul Connect: Fails to register routes for Envoy? #4868

Closed
mmisztal1980 opened this issue Oct 29, 2018 · 27 comments
Labels: needs-investigation (The issue described is detailed and complex.)

Comments

@mmisztal1980

mmisztal1980 commented Oct 29, 2018

Overview of the Issue

I've created a PoC environment for two .NET Core 2.1 services communicating via Consul Connect.
The entire setup relies on a consul server instance, which uses a services.json file to perform the registrations in a 'static' way. If I understand the process correctly, the sidecar proxies should retrieve their configuration from the consul server after starting up.

Once the consul server container is healthy, the two sidecar proxies start. At this point the entire setup is healthy:
[screenshots showing the services and sidecar proxies reporting healthy]

When attempting to have core-demo-api1 call core-demo-api2, I'm getting a 404 response.
I've exposed core-demo-api1-sidecar-proxy's port 19000 and obtained a config dump, in which I do not see any routes defined to core-demo-api2; I believe this is the root cause of the communication issue between the two services. As far as I can tell I've followed the available documentation to the letter, so this may be a bug.

{
    "configs": [
        {
            "@type": "type.googleapis.com/envoy.admin.v2alpha.BootstrapConfigDump",
            "bootstrap": {
                "node": {
                    "id": "core-demo-api1-sidecar-proxy",
                    "cluster": "core-demo-api1-sidecar-proxy",
                    "build_version": "5d25f466c3410c0dfa735d7d4358beb76b2da507/1.8.0/Clean/RELEASE"
                },
                "static_resources": {
                    "clusters": [
                        {
                            "name": "local_agent",
                            "connect_timeout": "1s",
                            "hosts": [
                                {
                                    "socket_address": {
                                        "address": "127.0.0.1",
                                        "port_value": 8502
                                    }
                                }
                            ],
                            "http2_protocol_options": {}
                        }
                    ]
                },
                "dynamic_resources": {
                    "lds_config": {
                        "ads": {}
                    },
                    "cds_config": {
                        "ads": {}
                    },
                    "ads_config": {
                        "api_type": "GRPC",
                        "grpc_services": [
                            {
                                "envoy_grpc": {
                                    "cluster_name": "local_agent"
                                },
                                "initial_metadata": [
                                    {
                                        "key": "x-consul-token"
                                    }
                                ]
                            }
                        ]
                    }
                },
                "admin": {
                    "access_log_path": "/dev/null",
                    "address": {
                        "socket_address": {
                            "address": "0.0.0.0",
                            "port_value": 19000
                        }
                    }
                }
            },
            "last_updated": "2018-10-29T09:22:19.322Z"
        },
        {
            "@type": "type.googleapis.com/envoy.admin.v2alpha.ClustersConfigDump",
            "static_clusters": [
                {
                    "cluster": {
                        "name": "local_agent",
                        "connect_timeout": "1s",
                        "hosts": [
                            {
                                "socket_address": {
                                    "address": "127.0.0.1",
                                    "port_value": 8502
                                }
                            }
                        ],
                        "http2_protocol_options": {}
                    },
                    "last_updated": "2018-10-29T09:22:19.329Z"
                }
            ]
        },
        {
            "@type": "type.googleapis.com/envoy.admin.v2alpha.ListenersConfigDump"
        }
    ]
}

Reproduction Steps

consul-envoy/Dockerfile

FROM consul:1.3.0
FROM envoyproxy/envoy:v1.8.0
COPY --from=0 /bin/consul /bin/consul
ENTRYPOINT ["dumb-init", "consul", "connect", "envoy"]

services.json

{
    "connect": {
        "enabled": true
    },
    "services": [
        {
            "name": "core-demo-api1",
            "port": 80,
            "connect": {
                "sidecar_service": {
                    "port": 19000,
                    "proxy": {                        
                        "upstreams": [
                            {
                                "destination_name": "code-demo-api2",
                                "local_bind_port": 80
                            }
                        ]
                    },
                    "checks": [
                        {
                            "name": "Connect Sidecar Listening",
                            "tcp": "core-demo-api1-sidecar-proxy:19000",
                            "interval": "10s"
                        },
                        {
                            "name": "Connect Sidecar Aliasing core-demo-api1",
                            "alias_service": "core-demo-api1"
                        }
                    ]
                }
            }
        },
        {
            "name": "core-demo-api2",
            "port": 80,
            "connect": {
                "sidecar_service": {
                    "port": 19000,
                    "proxy": {                        
                        "upstreams": [
                            {
                                "destination_name": "code-demo-api1",
                                "local_bind_port": 80
                            }
                        ]
                    },
                    "checks": [
                        {
                            "name": "Connect Sidecar Listening",
                            "tcp": "core-demo-api2-sidecar-proxy:19000",
                            "interval": "10s"
                        },
                        {
                            "name": "Connect Sidecar Aliasing core-demo-api2",
                            "alias_service": "core-demo-api2"
                        }
                    ]
                }
            }
        }
    ]
}
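
As a sanity check, the registration file can presumably be validated before the container mounts it:

consul validate ./demo/consul/services.json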

docker-compose.yml

version: '2.4'

services:
  consul:
    image: consul:1.3.0
    command: "agent -dev -server -bootstrap-expect 1 -ui -client 0.0.0.0 -datacenter local -node docker -config-file /etc/consul/services.json"
    volumes:
      - ./demo/consul/services.json:/etc/consul/services.json
  core-demo-api1:
    image: core-demo-api1
    build:
      context: .
      dockerfile: demo/Core.Demo.Api1/Dockerfile
  core-demo-api1-sidecar-proxy:
    image: consul-envoy
    command: "-sidecar-for core-demo-api1 -admin-bind 0.0.0.0:19000 -http-addr http://consul:8500"
    build:
      context: ./docker/consul-envoy
      dockerfile: ./Dockerfile
    depends_on: 
      consul:
        condition: service_healthy

  core-demo-api2:
    image: core-demo-api2
    build:
      context: .
      dockerfile: demo/Core.Demo.Api2/Dockerfile
  core-demo-api2-sidecar-proxy:
    image: consul-envoy
    command: "-sidecar-for core-demo-api2 -admin-bind 0.0.0.0:19000 -http-addr http://consul:8500"
    build:
      context: ./docker/consul-envoy
      dockerfile: ./Dockerfile
    depends_on: 
      consul:
        condition: service_healthy

docker-compose.override.yml

version: '2.4'

services:
  consul:
    ports:
      - "8400:8400"
      - "8500:8500"
      - "8600:8600"
      - "8600:8600/udp"
    healthcheck:
      test: ["CMD-SHELL", "curl --silent --fail localhost:8500/v1/agent/services || exit 1"]
      interval: 30s
      timeout: 30s
      retries: 3
  core-demo-api1:
    environment:
      - ASPNETCORE_ENVIRONMENT=Compose
      - ASPNETCORE_URLS=http://+:80
      - HOSTNAME=core-demo-api1
    ports:
      - "55127:80"
  core-demo-api1-sidecar-proxy:
    ports:
      - "19000:19000"
  core-demo-api2:
    environment:
      - ASPNETCORE_ENVIRONMENT=Compose
      - ASPNETCORE_URLS=http://+:80
      - HOSTNAME=core-demo-api2
    ports:
      - "55185:80"

Consul info for both Client and Server

Client info

(!) Executed inside one of the sidecar-proxy containers (!)

# consul info
Error querying agent: Get http://127.0.0.1:8500/v1/agent/self: dial tcp 127.0.0.1:8500: connect: connection refused
envoy.yaml
admin:
  access_log_path: /tmp/admin_access.log
  address:
    socket_address:
      protocol: TCP
      address: 127.0.0.1
      port_value: 9901
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        protocol: TCP
        address: 0.0.0.0
        port_value: 10000
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match:
                  prefix: "/"
                route:
                  host_rewrite: www.google.com
                  cluster: service_google
          http_filters:
          - name: envoy.router
  clusters:
  - name: service_google
    connect_timeout: 0.25s
    type: LOGICAL_DNS
    # Comment out the following line to test on v6 networks
    dns_lookup_family: V4_ONLY
    lb_policy: ROUND_ROBIN
    hosts:
      - socket_address:
          address: google.com
          port_value: 443
    tls_context: { sni: www.google.com }
Server info
# consul info
agent:
        check_monitors = 0
        check_ttls = 0
        checks = 4
        services = 4
build:
        prerelease =
        revision = e8757838
        version = 1.3.0
consul:
        bootstrap = true
        known_datacenters = 1
        leader = true
        leader_addr = 127.0.0.1:8300
        server = true
raft:
        applied_index = 1439
        commit_index = 1439
        fsm_pending = 0
        last_contact = 0
        last_log_index = 1439
        last_log_term = 2
        last_snapshot_index = 0
        last_snapshot_term = 0
        latest_configuration = [{Suffrage:Voter ID:7df0a6a8-3f7a-10f6-d759-e46d1b024aa2 Address:127.0.0.1:8300}]
        latest_configuration_index = 1
        num_peers = 0
        protocol_version = 3
        protocol_version_max = 3
        protocol_version_min = 0
        snapshot_version_max = 1
        snapshot_version_min = 0
        state = Leader
        term = 2
runtime:
        arch = amd64
        cpu_count = 2
        goroutines = 103
        max_procs = 2
        os = linux
        version = go1.11.1
serf_lan:
        coordinate_resets = 0
        encrypted = false
        event_queue = 1
        event_time = 2
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 1
        members = 1
        query_queue = 0
        query_time = 1
serf_wan:
        coordinate_resets = 0
        encrypted = false
        event_queue = 0
        event_time = 1
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 1
        members = 1
        query_queue = 0
        query_time = 1

Operating system and Environment details

Windows 10 Enterprise x64 running Docker for Windows (MobyLinuxVM)

Log Fragments

TBD

@rboyer
Member

rboyer commented Oct 29, 2018

The getting started guide has the proxies and the consul agent run within the same network namespace with the docker flag --network host.

In your example the sidecar proxies and the consul agent do not share a network namespace, so the default gRPC server address needs to be explicitly configured, just as you did with the HTTP API address. The gRPC server serves the xDS protocol for envoy.

In your case you would need to specify -grpc-addr consul:8502 in addition to -http-addr http://consul:8500, like so:

-sidecar-for core-demo-api2 -admin-bind 0.0.0.0:19000 -http-addr http://consul:8500 -grpc-addr consul:8502

I picked up on the misconfiguration because the .configs[0].bootstrap.static_resources.clusters[0].hosts[0].socket_address value in your envoy admin config dump was set to { "address": "127.0.0.1", "port_value": 8502 } instead of the IP address of the consul agent.
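
Applied to the compose file above, the sidecar service would then look roughly like this (only the -grpc-addr flag is new):

  core-demo-api1-sidecar-proxy:
    image: consul-envoy
    command: "-sidecar-for core-demo-api1 -admin-bind 0.0.0.0:19000 -http-addr http://consul:8500 -grpc-addr consul:8502"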

@mmisztal1980
Author

mmisztal1980 commented Oct 29, 2018

Hi @rboyer, thanks for the hint; that seems to have helped.

stdout
 docker-compose up
Starting core-pkg_core-demo-api1_1 ... done
Starting core-pkg_core-demo-api2_1 ... done
Starting core-pkg_consul_1         ... done
Creating core-pkg_core-demo-api2-sidecar-proxy_1 ... done
Creating core-pkg_core-demo-api1-sidecar-proxy_1 ... done
Attaching to core-pkg_core-demo-api2_1, core-pkg_core-demo-api1_1, core-pkg_consul_1, core-pkg_core-demo-api2-sidecar-proxy_1, core-pkg_core-demo-api1-sidecar-proxy_1
core-demo-api2_1                | Hosting environment: Compose
core-demo-api2_1                | Content root path: /app
core-demo-api2_1                | Now listening on: http://[::]:80
core-demo-api2_1                | Application started. Press Ctrl+C to shut down.
core-demo-api1_1                | info: Microsoft.AspNetCore.DataProtection.KeyManagement.XmlKeyManager[0]
core-demo-api1_1                |       User profile is available. Using '/root/.aspnet/DataProtection-Keys' as key repository; keys will not be encrypted at rest.
core-demo-api1_1                | Hosting environment: Compose
core-demo-api1_1                | Content root path: /app
core-demo-api1_1                | Now listening on: http://[::]:80
core-demo-api1_1                | Application started. Press Ctrl+C to shut down.
consul_1                        | BootstrapExpect is set to 1; this is the same as Bootstrap mode.
consul_1                        | bootstrap = true: do not enable unless necessary
consul_1                        | ==> Starting Consul agent...
consul_1                        | ==> Consul agent running!
consul_1                        |            Version: 'v1.3.0'
consul_1                        |            Node ID: 'f06b40fe-308f-6b01-9f1a-82c46089417a'
consul_1                        |          Node name: 'docker'
consul_1                        |         Datacenter: 'local' (Segment: '<all>')
consul_1                        |             Server: true (Bootstrap: true)
consul_1                        |        Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, gRPC: 8502, DNS: 8600)
consul_1                        |       Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
consul_1                        |            Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false
consul_1                        |
consul_1                        | ==> Log data will now stream in as it occurs:
consul_1                        |
consul_1                        |     2018/10/29 20:34:09 [INFO] raft: Initial configuration (index=1): [{Suffrage:Voter ID:f06b40fe-308f-6b01-9f1a-82c46089417a Address:127.0.0.1:8300}]
consul_1                        |     2018/10/29 20:34:09 [INFO] raft: Node at 127.0.0.1:8300 [Follower] entering Follower state (Leader: "")
consul_1                        |     2018/10/29 20:34:09 [INFO] serf: EventMemberJoin: docker.local 127.0.0.1
consul_1                        |     2018/10/29 20:34:09 [INFO] serf: EventMemberJoin: docker 127.0.0.1
consul_1                        |     2018/10/29 20:34:09 [INFO] consul: Handled member-join event for server "docker.local" in area "wan"
consul_1                        |     2018/10/29 20:34:09 [INFO] consul: Adding LAN server docker (Addr: tcp/127.0.0.1:8300) (DC: local)
consul_1                        |     2018/10/29 20:34:09 [DEBUG] agent/proxy: managed Connect proxy manager started
consul_1                        |     2018/10/29 20:34:09 [INFO] agent: Started DNS server 0.0.0.0:8600 (tcp)
consul_1                        |     2018/10/29 20:34:09 [INFO] agent: Started DNS server 0.0.0.0:8600 (udp)
consul_1                        |     2018/10/29 20:34:09 [INFO] agent: Started HTTP server on [::]:8500 (tcp)
consul_1                        |     2018/10/29 20:34:09 [INFO] agent: started state syncer
consul_1                        |     2018/10/29 20:34:09 [INFO] agent: Started gRPC server on [::]:8502 (tcp)
consul_1                        |     2018/10/29 20:34:09 [WARN] raft: Heartbeat timeout from "" reached, starting election
consul_1                        |     2018/10/29 20:34:09 [INFO] raft: Node at 127.0.0.1:8300 [Candidate] entering Candidate state in term 2
consul_1                        |     2018/10/29 20:34:09 [DEBUG] raft: Votes needed: 1
consul_1                        |     2018/10/29 20:34:09 [DEBUG] raft: Vote granted from f06b40fe-308f-6b01-9f1a-82c46089417a in term 2. Tally: 1
consul_1                        |     2018/10/29 20:34:09 [INFO] raft: Election won. Tally: 1
consul_1                        |     2018/10/29 20:34:09 [INFO] raft: Node at 127.0.0.1:8300 [Leader] entering Leader state
consul_1                        |     2018/10/29 20:34:09 [INFO] consul: cluster leadership acquired
consul_1                        |     2018/10/29 20:34:09 [INFO] consul: New leader elected: docker
consul_1                        |     2018/10/29 20:34:10 [INFO] connect: initialized CA with provider "consul"
consul_1                        |     2018/10/29 20:34:10 [DEBUG] consul: Skipping self join check for "docker" since the cluster is too small
consul_1                        |     2018/10/29 20:34:10 [INFO] consul: member 'docker' joined, marking health alive
consul_1                        |     2018/10/29 20:34:10 [DEBUG] agent: Skipping remote check "serfHealth" since it is managed automatically
consul_1                        |     2018/10/29 20:34:10 [INFO] agent: Synced service "core-demo-api1-sidecar-proxy"
consul_1                        |     2018/10/29 20:34:10 [INFO] agent: Synced service "core-demo-api2"
consul_1                        |     2018/10/29 20:34:10 [INFO] agent: Synced service "core-demo-api2-sidecar-proxy"
consul_1                        |     2018/10/29 20:34:10 [INFO] agent: Synced service "core-demo-api1"
consul_1                        |     2018/10/29 20:34:10 [DEBUG] agent: Check "service:core-demo-api1-sidecar-proxy:2" in sync
consul_1                        |     2018/10/29 20:34:10 [DEBUG] agent: Check "service:core-demo-api2-sidecar-proxy:1" in sync
consul_1                        |     2018/10/29 20:34:10 [DEBUG] agent: Check "service:core-demo-api2-sidecar-proxy:2" in sync
consul_1                        |     2018/10/29 20:34:10 [DEBUG] agent: Check "service:core-demo-api1-sidecar-proxy:1" in sync
consul_1                        |     2018/10/29 20:34:10 [DEBUG] agent: Node info in sync
consul_1                        |     2018/10/29 20:34:10 [DEBUG] agent: Service "core-demo-api1" in sync
consul_1                        |     2018/10/29 20:34:10 [DEBUG] agent: Service "core-demo-api1-sidecar-proxy" in sync
consul_1                        |     2018/10/29 20:34:10 [DEBUG] agent: Service "core-demo-api2" in sync
consul_1                        |     2018/10/29 20:34:10 [DEBUG] agent: Service "core-demo-api2-sidecar-proxy" in sync
consul_1                        |     2018/10/29 20:34:10 [DEBUG] agent: Check "service:core-demo-api1-sidecar-proxy:2" in sync
consul_1                        |     2018/10/29 20:34:10 [DEBUG] agent: Check "service:core-demo-api2-sidecar-proxy:1" in sync
consul_1                        |     2018/10/29 20:34:10 [DEBUG] agent: Check "service:core-demo-api2-sidecar-proxy:2" in sync
consul_1                        |     2018/10/29 20:34:10 [DEBUG] agent: Check "service:core-demo-api1-sidecar-proxy:1" in sync
consul_1                        |     2018/10/29 20:34:10 [DEBUG] agent: Node info in sync
consul_1                        |     2018/10/29 20:34:10 [DEBUG] agent: Skipping remote check "serfHealth" since it is managed automatically
consul_1                        |     2018/10/29 20:34:10 [DEBUG] agent: Service "core-demo-api1-sidecar-proxy" in sync
consul_1                        |     2018/10/29 20:34:10 [DEBUG] agent: Service "core-demo-api2" in sync
consul_1                        |     2018/10/29 20:34:10 [DEBUG] agent: Service "core-demo-api2-sidecar-proxy" in sync
consul_1                        |     2018/10/29 20:34:10 [DEBUG] agent: Service "core-demo-api1" in sync
consul_1                        |     2018/10/29 20:34:10 [DEBUG] agent: Check "service:core-demo-api2-sidecar-proxy:2" in sync
consul_1                        |     2018/10/29 20:34:10 [DEBUG] agent: Check "service:core-demo-api1-sidecar-proxy:1" in sync
consul_1                        |     2018/10/29 20:34:10 [DEBUG] agent: Check "service:core-demo-api1-sidecar-proxy:2" in sync
consul_1                        |     2018/10/29 20:34:10 [DEBUG] agent: Check "service:core-demo-api2-sidecar-proxy:1" in sync
consul_1                        |     2018/10/29 20:34:10 [DEBUG] agent: Node info in sync
consul_1                        |     2018/10/29 20:34:11 [WARN] agent: Check "service:core-demo-api2-sidecar-proxy:1" socket connection failed: dial tcp: lookup core-demo-api2-sidecar-proxy on 127.0.0.11:53: no such host
consul_1                        |     2018/10/29 20:34:14 [WARN] agent: Check "service:core-demo-api1-sidecar-proxy:1" socket connection failed: dial tcp: lookup core-demo-api1-sidecar-proxy on 127.0.0.11:53: no such host
consul_1                        |     2018/10/29 20:34:21 [WARN] agent: Check "service:core-demo-api2-sidecar-proxy:1" socket connection failed: dial tcp: lookup core-demo-api2-sidecar-proxy on 127.0.0.11:53: no such host
consul_1                        |     2018/10/29 20:34:24 [WARN] agent: Check "service:core-demo-api1-sidecar-proxy:1" socket connection failed: dial tcp: lookup core-demo-api1-sidecar-proxy on 127.0.0.11:53: no such host
consul_1                        |     2018/10/29 20:34:31 [DEBUG] http: Request GET /v1/catalog/datacenters (337.4µs) from=172.21.0.1:46480
consul_1                        |     2018/10/29 20:34:31 [DEBUG] http: Request GET /v1/catalog/datacenters (372.7µs) from=172.21.0.1:46480
consul_1                        |     2018/10/29 20:34:31 [DEBUG] http: Request GET /v1/catalog/datacenters (184.2µs) from=172.21.0.1:46480
consul_1                        |     2018/10/29 20:34:31 [DEBUG] http: Request GET /v1/internal/ui/services?dc=local (387.5µs) from=172.21.0.1:46480
consul_1                        |     2018/10/29 20:34:31 [WARN] agent: Check "service:core-demo-api2-sidecar-proxy:1" socket connection failed: dial tcp: lookup core-demo-api2-sidecar-proxy on 127.0.0.11:53: no such host
consul_1                        |     2018/10/29 20:34:34 [WARN] agent: Check "service:core-demo-api1-sidecar-proxy:1" socket connection failed: dial tcp: lookup core-demo-api1-sidecar-proxy on 127.0.0.11:53: no such host
consul_1                        |     2018/10/29 20:34:39 [DEBUG] http: Request GET /v1/agent/services (273.6µs) from=127.0.0.1:34188
consul_1                        |     2018/10/29 20:34:41 [DEBUG] http: Request GET /v1/agent/services (169.2µs) from=172.21.0.5:50354
consul_1                        |     2018/10/29 20:34:41 [DEBUG] http: Request GET /v1/agent/services (203.6µs) from=172.21.0.6:39096
core-demo-api2-sidecar-proxy_1  | [2018-10-29 20:34:41.401][6][info][main] source/server/server.cc:202] initializing epoch 0 (hot restart version=disabled)
core-demo-api2-sidecar-proxy_1  | [2018-10-29 20:34:41.402][6][info][main] source/server/server.cc:204] statically linked extensions:
core-demo-api2-sidecar-proxy_1  | [2018-10-29 20:34:41.402][6][info][main] source/server/server.cc:206]   access_loggers: envoy.file_access_log,envoy.http_grpc_access_log
core-demo-api2-sidecar-proxy_1  | [2018-10-29 20:34:41.402][6][info][main] source/server/server.cc:209]   filters.http: envoy.buffer,envoy.cors,envoy.ext_authz,envoy.fault,envoy.filters.http.header_to_metadata,envoy.filters.http.jwt_authn,envoy.filters.http.rbac,envoy.grpc_http1_bridge,envoy.grpc_json_transcoder,envoy.grpc_web,envoy.gzip,envoy.health_check,envoy.http_dynamo_filter,envoy.ip_tagging,envoy.lua,envoy.rate_limit,envoy.router,envoy.squash
core-demo-api2-sidecar-proxy_1  | [2018-10-29 20:34:41.402][6][info][main] source/server/server.cc:212]   filters.listener: envoy.listener.original_dst,envoy.listener.proxy_protocol,envoy.listener.tls_inspector
core-demo-api2-sidecar-proxy_1  | [2018-10-29 20:34:41.402][6][info][main] source/server/server.cc:215]   filters.network: envoy.client_ssl_auth,envoy.echo,envoy.ext_authz,envoy.filters.network.rbac,envoy.filters.network.sni_cluster,envoy.filters.network.thrift_proxy,envoy.http_connection_manager,envoy.mongo_proxy,envoy.ratelimit,envoy.redis_proxy,envoy.tcp_proxy
core-demo-api2-sidecar-proxy_1  | [2018-10-29 20:34:41.402][6][info][main] source/server/server.cc:217]   stat_sinks: envoy.dog_statsd,envoy.metrics_service,envoy.stat_sinks.hystrix,envoy.statsd
core-demo-api2-sidecar-proxy_1  | [2018-10-29 20:34:41.402][6][info][main] source/server/server.cc:219]   tracers: envoy.dynamic.ot,envoy.lightstep,envoy.zipkin
core-demo-api2-sidecar-proxy_1  | [2018-10-29 20:34:41.402][6][info][main] source/server/server.cc:222]   transport_sockets.downstream: envoy.transport_sockets.capture,raw_buffer,tls
core-demo-api2-sidecar-proxy_1  | [2018-10-29 20:34:41.402][6][info][main] source/server/server.cc:225]   transport_sockets.upstream: envoy.transport_sockets.capture,raw_buffer,tls
core-demo-api2-sidecar-proxy_1  | [2018-10-29 20:34:41.406][6][info][main] source/server/server.cc:264] admin address: 0.0.0.0:19000
core-demo-api2-sidecar-proxy_1  | [2018-10-29 20:34:41.406][6][info][config] source/server/configuration_impl.cc:50] loading 0 static secret(s)
core-demo-api2-sidecar-proxy_1  | [2018-10-29 20:34:41.406][6][info][config] source/server/configuration_impl.cc:56] loading 1 cluster(s)
core-demo-api1-sidecar-proxy_1  | [2018-10-29 20:34:41.416][6][info][main] source/server/server.cc:202] initializing epoch 0 (hot restart version=disabled)
core-demo-api1-sidecar-proxy_1  | [2018-10-29 20:34:41.416][6][info][main] source/server/server.cc:204] statically linked extensions:
core-demo-api1-sidecar-proxy_1  | [2018-10-29 20:34:41.416][6][info][main] source/server/server.cc:206]   access_loggers: envoy.file_access_log,envoy.http_grpc_access_log
core-demo-api1-sidecar-proxy_1  | [2018-10-29 20:34:41.416][6][info][main] source/server/server.cc:209]   filters.http: envoy.buffer,envoy.cors,envoy.ext_authz,envoy.fault,envoy.filters.http.header_to_metadata,envoy.filters.http.jwt_authn,envoy.filters.http.rbac,envoy.grpc_http1_bridge,envoy.grpc_json_transcoder,envoy.grpc_web,envoy.gzip,envoy.health_check,envoy.http_dynamo_filter,envoy.ip_tagging,envoy.lua,envoy.rate_limit,envoy.router,envoy.squash
core-demo-api1-sidecar-proxy_1  | [2018-10-29 20:34:41.416][6][info][main] source/server/server.cc:212]   filters.listener: envoy.listener.original_dst,envoy.listener.proxy_protocol,envoy.listener.tls_inspector
core-demo-api1-sidecar-proxy_1  | [2018-10-29 20:34:41.416][6][info][main] source/server/server.cc:215]   filters.network: envoy.client_ssl_auth,envoy.echo,envoy.ext_authz,envoy.filters.network.rbac,envoy.filters.network.sni_cluster,envoy.filters.network.thrift_proxy,envoy.http_connection_manager,envoy.mongo_proxy,envoy.ratelimit,envoy.redis_proxy,envoy.tcp_proxy
core-demo-api1-sidecar-proxy_1  | [2018-10-29 20:34:41.416][6][info][main] source/server/server.cc:217]   stat_sinks: envoy.dog_statsd,envoy.metrics_service,envoy.stat_sinks.hystrix,envoy.statsd
core-demo-api1-sidecar-proxy_1  | [2018-10-29 20:34:41.416][6][info][main] source/server/server.cc:219]   tracers: envoy.dynamic.ot,envoy.lightstep,envoy.zipkin
core-demo-api1-sidecar-proxy_1  | [2018-10-29 20:34:41.416][6][info][main] source/server/server.cc:222]   transport_sockets.downstream: envoy.transport_sockets.capture,raw_buffer,tls
core-demo-api1-sidecar-proxy_1  | [2018-10-29 20:34:41.416][6][info][main] source/server/server.cc:225]   transport_sockets.upstream: envoy.transport_sockets.capture,raw_buffer,tls
core-demo-api1-sidecar-proxy_1  | [2018-10-29 20:34:41.419][6][info][main] source/server/server.cc:264] admin address: 0.0.0.0:19000
core-demo-api1-sidecar-proxy_1  | [2018-10-29 20:34:41.420][6][info][config] source/server/configuration_impl.cc:50] loading 0 static secret(s)
core-demo-api1-sidecar-proxy_1  | [2018-10-29 20:34:41.420][6][info][config] source/server/configuration_impl.cc:56] loading 1 cluster(s)
core-demo-api2-sidecar-proxy_1  | [2018-10-29 20:34:41.412][6][info][upstream] source/common/upstream/cluster_manager_impl.cc:131] cm init: initializing cds
core-demo-api2-sidecar-proxy_1  | [2018-10-29 20:34:41.413][6][info][config] source/server/configuration_impl.cc:61] loading 0 listener(s)
core-demo-api2-sidecar-proxy_1  | [2018-10-29 20:34:41.413][6][info][config] source/server/configuration_impl.cc:95] loading tracing configuration
core-demo-api2-sidecar-proxy_1  | [2018-10-29 20:34:41.413][6][info][config] source/server/configuration_impl.cc:113] loading stats sink configuration
core-demo-api2-sidecar-proxy_1  | [2018-10-29 20:34:41.414][6][info][main] source/server/server.cc:452] starting main dispatch loop
core-demo-api1-sidecar-proxy_1  | [2018-10-29 20:34:41.423][6][info][upstream] source/common/upstream/cluster_manager_impl.cc:131] cm init: initializing cds
core-demo-api1-sidecar-proxy_1  | [2018-10-29 20:34:41.426][6][info][config] source/server/configuration_impl.cc:61] loading 0 listener(s)
core-demo-api1-sidecar-proxy_1  | [2018-10-29 20:34:41.426][6][info][config] source/server/configuration_impl.cc:95] loading tracing configuration
core-demo-api1-sidecar-proxy_1  | [2018-10-29 20:34:41.426][6][info][config] source/server/configuration_impl.cc:113] loading stats sink configuration
core-demo-api1-sidecar-proxy_1  | [2018-10-29 20:34:41.426][6][info][main] source/server/server.cc:452] starting main dispatch loop
core-demo-api2-sidecar-proxy_1  | [2018-10-29 20:34:41.430][6][info][upstream] source/common/upstream/cluster_manager_impl.cc:494] add/update cluster local_app during init
core-demo-api1-sidecar-proxy_1  | [2018-10-29 20:34:41.435][6][info][upstream] source/common/upstream/cluster_manager_impl.cc:494] add/update cluster local_app during init
core-demo-api2-sidecar-proxy_1  | [2018-10-29 20:34:41.436][6][info][upstream] source/common/upstream/cluster_manager_impl.cc:494] add/update cluster service:code-demo-api1 during init
core-demo-api2-sidecar-proxy_1  | [2018-10-29 20:34:41.436][6][info][upstream] source/common/upstream/cluster_manager_impl.cc:111] cm init: initializing secondary clusters
core-demo-api1-sidecar-proxy_1  | [2018-10-29 20:34:41.437][6][info][upstream] source/common/upstream/cluster_manager_impl.cc:494] add/update cluster service:code-demo-api2 during init
core-demo-api1-sidecar-proxy_1  | [2018-10-29 20:34:41.437][6][info][upstream] source/common/upstream/cluster_manager_impl.cc:111] cm init: initializing secondary clusters
consul_1                        |     2018/10/29 20:34:41 [DEBUG] agent: Check "service:core-demo-api2-sidecar-proxy:1" is passing
consul_1                        |     2018/10/29 20:34:41 [DEBUG] agent: Service "core-demo-api1" in sync
consul_1                        |     2018/10/29 20:34:41 [DEBUG] agent: Service "core-demo-api1-sidecar-proxy" in sync
consul_1                        |     2018/10/29 20:34:41 [DEBUG] agent: Service "core-demo-api2" in sync
consul_1                        |     2018/10/29 20:34:41 [DEBUG] agent: Service "core-demo-api2-sidecar-proxy" in sync
consul_1                        |     2018/10/29 20:34:41 [DEBUG] agent: Check "service:core-demo-api1-sidecar-proxy:1" in sync
consul_1                        |     2018/10/29 20:34:41 [DEBUG] agent: Check "service:core-demo-api1-sidecar-proxy:2" in sync
consul_1                        |     2018/10/29 20:34:41 [INFO] agent: Synced check "service:core-demo-api2-sidecar-proxy:1"
consul_1                        |     2018/10/29 20:34:41 [DEBUG] agent: Check "service:core-demo-api2-sidecar-proxy:2" in sync
consul_1                        |     2018/10/29 20:34:41 [DEBUG] agent: Node info in sync
consul_1                        |     2018/10/29 20:34:44 [DEBUG] agent: Check "service:core-demo-api1-sidecar-proxy:1" is passing
consul_1                        |     2018/10/29 20:34:44 [DEBUG] agent: Service "core-demo-api2" in sync
consul_1                        |     2018/10/29 20:34:44 [DEBUG] agent: Service "core-demo-api2-sidecar-proxy" in sync
consul_1                        |     2018/10/29 20:34:44 [DEBUG] agent: Service "core-demo-api1" in sync
consul_1                        |     2018/10/29 20:34:44 [DEBUG] agent: Service "core-demo-api1-sidecar-proxy" in sync
consul_1                        |     2018/10/29 20:34:44 [INFO] agent: Synced check "service:core-demo-api1-sidecar-proxy:1"
consul_1                        |     2018/10/29 20:34:44 [DEBUG] agent: Check "service:core-demo-api1-sidecar-proxy:2" in sync
consul_1                        |     2018/10/29 20:34:44 [DEBUG] agent: Check "service:core-demo-api2-sidecar-proxy:1" in sync
consul_1                        |     2018/10/29 20:34:44 [DEBUG] agent: Check "service:core-demo-api2-sidecar-proxy:2" in sync
consul_1                        |     2018/10/29 20:34:44 [DEBUG] agent: Node info in sync
consul_1                        |     2018/10/29 20:34:45 [DEBUG] http: Request GET /v1/internal/ui/nodes?dc=local (465.7µs) from=172.21.0.1:46480
consul_1                        |     2018/10/29 20:34:46 [DEBUG] http: Request GET /v1/internal/ui/services?dc=local (152.8µs) from=172.21.0.1:46480
consul_1                        |     2018/10/29 20:34:51 [DEBUG] agent: Check "service:core-demo-api2-sidecar-proxy:1" is passing
consul_1                        |     2018/10/29 20:34:54 [DEBUG] agent: Check "service:core-demo-api1-sidecar-proxy:1" is passing
consul_1                        |     2018/10/29 20:35:01 [DEBUG] agent: Check "service:core-demo-api2-sidecar-proxy:1" is passing
consul_1                        |     2018/10/29 20:35:04 [DEBUG] agent: Check "service:core-demo-api1-sidecar-proxy:1" is passing
consul_1                        |     2018/10/29 20:35:09 [DEBUG] http: Request GET /v1/agent/services (95.5µs) from=127.0.0.1:34296
consul_1                        |     2018/10/29 20:35:09 [DEBUG] consul: Skipping self join check for "docker" since the cluster is too small
consul_1                        |     2018/10/29 20:35:11 [DEBUG] agent: Check "service:core-demo-api2-sidecar-proxy:1" is passing
consul_1                        |     2018/10/29 20:35:14 [DEBUG] agent: Check "service:core-demo-api1-sidecar-proxy:1" is passing
consul_1                        |     2018/10/29 20:35:21 [DEBUG] agent: Check "service:core-demo-api2-sidecar-proxy:1" is passing
consul_1                        |     2018/10/29 20:35:24 [DEBUG] agent: Check "service:core-demo-api1-sidecar-proxy:1" is passing
consul_1                        |     2018/10/29 20:35:31 [DEBUG] agent: Check "service:core-demo-api2-sidecar-proxy:1" is passing
consul_1                        |     2018/10/29 20:35:34 [DEBUG] agent: Check "service:core-demo-api1-sidecar-proxy:1" is passing
consul_1                        |     2018/10/29 20:35:39 [DEBUG] http: Request GET /v1/agent/services (147.6µs) from=127.0.0.1:34392
consul_1                        |     2018/10/29 20:35:41 [DEBUG] agent: Check "service:core-demo-api2-sidecar-proxy:1" is passing
consul_1                        |     2018/10/29 20:35:44 [DEBUG] agent: Check "service:core-demo-api1-sidecar-proxy:1" is passing
consul_1                        |     2018/10/29 20:35:51 [DEBUG] agent: Check "service:core-demo-api2-sidecar-proxy:1" is passing
consul_1                        |     2018/10/29 20:35:54 [DEBUG] agent: Check "service:core-demo-api1-sidecar-proxy:1" is passing
consul_1                        |     2018/10/29 20:36:01 [DEBUG] agent: Check "service:core-demo-api2-sidecar-proxy:1" is passing
consul_1                        |     2018/10/29 20:36:04 [DEBUG] agent: Check "service:core-demo-api1-sidecar-proxy:1" is passing
consul_1                        |     2018/10/29 20:36:09 [DEBUG] http: Request GET /v1/agent/services (128.1µs) from=127.0.0.1:34486
consul_1                        |     2018/10/29 20:36:09 [DEBUG] manager: Rebalanced 1 servers, next active server is docker.local (Addr: tcp/127.0.0.1:8300) (DC: local)
consul_1                        |     2018/10/29 20:36:09 [DEBUG] consul: Skipping self join check for "docker" since the cluster is too small
consul_1                        |     2018/10/29 20:36:10 [DEBUG] agent: Skipping remote check "serfHealth" since it is managed automatically
consul_1                        |     2018/10/29 20:36:10 [DEBUG] agent: Service "core-demo-api1" in sync
consul_1                        |     2018/10/29 20:36:10 [DEBUG] agent: Service "core-demo-api1-sidecar-proxy" in sync
consul_1                        |     2018/10/29 20:36:10 [DEBUG] agent: Service "core-demo-api2" in sync
consul_1                        |     2018/10/29 20:36:10 [DEBUG] agent: Service "core-demo-api2-sidecar-proxy" in sync
consul_1                        |     2018/10/29 20:36:10 [DEBUG] agent: Check "service:core-demo-api1-sidecar-proxy:1" in sync
consul_1                        |     2018/10/29 20:36:10 [DEBUG] agent: Check "service:core-demo-api1-sidecar-proxy:2" in sync
consul_1                        |     2018/10/29 20:36:10 [DEBUG] agent: Check "service:core-demo-api2-sidecar-proxy:1" in sync
consul_1                        |     2018/10/29 20:36:10 [DEBUG] agent: Check "service:core-demo-api2-sidecar-proxy:2" in sync
consul_1                        |     2018/10/29 20:36:10 [DEBUG] agent: Node info in sync

Can you elaborate on that behavior? Why is the gRPC setting required when not specifying a docker network? I seem to have missed that bit in the docs, or perhaps it isn't well described.

@rboyer
Member

rboyer commented Oct 29, 2018

In a non-containerized setup, a consul agent (running in lightweight client mode) and the applications using that agent communicate over a localhost connection using the HTTP API. This avoids the problem of having to discover your service discovery system: you assume you can simply communicate over a prearranged port (:8500).

As of v1.3, consul also listens on a gRPC port (default :8502) when Connect is enabled. This port speaks the xDS protocol specifically for use by the envoy instances. Envoy itself knows nothing of consul, so it cannot use the consul HTTP API.

The consul connect envoy subcommand briefly uses the consul HTTP API on startup before exec-ing the envoy binary, which from then on communicates with the running consul agent strictly over gRPC, speaking only xDS. So direct access to both ports is necessary for the sidecar to be set up and run.

The getting started guide uses host networking for simplicity so that the consul agent and all of the envoy instances are free to communicate with each other directly over localhost.

If they are not co-located then the defaults cannot be used and the addresses have to be configured explicitly.
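
Concretely, for a setup like this one where the agent is reachable as consul, both addresses end up pointing away from localhost (a sketch using the flags already shown above):

consul connect envoy -sidecar-for core-demo-api1 \
  -admin-bind 0.0.0.0:19000 \
  -http-addr http://consul:8500 \
  -grpc-addr consul:8502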

@mmisztal1980
Author

OK, this makes a ton of sense now.

I've obtained another config dump; the proxy for core-demo-api1 now seems to be aware of core-demo-api2:

config_dump
{
    "configs": [
        {
            "@type": "type.googleapis.com/envoy.admin.v2alpha.BootstrapConfigDump",
            "bootstrap": {
                "node": {
                    "id": "core-demo-api1-sidecar-proxy",
                    "cluster": "core-demo-api1-sidecar-proxy",
                    "build_version": "5d25f466c3410c0dfa735d7d4358beb76b2da507/1.8.0/Clean/RELEASE"
                },
                "static_resources": {
                    "clusters": [
                        {
                            "name": "local_agent",
                            "connect_timeout": "1s",
                            "hosts": [
                                {
                                    "socket_address": {
                                        "address": "172.18.0.2",
                                        "port_value": 8502
                                    }
                                }
                            ],
                            "http2_protocol_options": {}
                        }
                    ]
                },
                "dynamic_resources": {
                    "lds_config": {
                        "ads": {}
                    },
                    "cds_config": {
                        "ads": {}
                    },
                    "ads_config": {
                        "api_type": "GRPC",
                        "grpc_services": [
                            {
                                "envoy_grpc": {
                                    "cluster_name": "local_agent"
                                },
                                "initial_metadata": [
                                    {
                                        "key": "x-consul-token"
                                    }
                                ]
                            }
                        ]
                    }
                },
                "admin": {
                    "access_log_path": "/dev/null",
                    "address": {
                        "socket_address": {
                            "address": "0.0.0.0",
                            "port_value": 19000
                        }
                    }
                }
            },
            "last_updated": "2018-10-29T20:43:11.078Z"
        },
        {
            "@type": "type.googleapis.com/envoy.admin.v2alpha.ClustersConfigDump",
            "version_info": "00000001",
            "static_clusters": [
                {
                    "cluster": {
                        "name": "local_agent",
                        "connect_timeout": "1s",
                        "hosts": [
                            {
                                "socket_address": {
                                    "address": "172.18.0.2",
                                    "port_value": 8502
                                }
                            }
                        ],
                        "http2_protocol_options": {}
                    },
                    "last_updated": "2018-10-29T20:43:11.080Z"
                }
            ],
            "dynamic_active_clusters": [
                {
                    "version_info": "00000001",
                    "cluster": {
                        "name": "local_app",
                        "connect_timeout": "5s",
                        "hosts": [
                            {
                                "socket_address": {
                                    "address": "127.0.0.1",
                                    "port_value": 80
                                }
                            }
                        ]
                    },
                    "last_updated": "2018-10-29T20:43:11.083Z"
                },
                {
                    "version_info": "00000001",
                    "cluster": {
                        "name": "service:core-demo-api2",
                        "type": "EDS",
                        "eds_cluster_config": {
                            "eds_config": {
                                "ads": {}
                            }
                        },
                        "connect_timeout": "5s",
                        "tls_context": {
                            "common_tls_context": {
                                "tls_params": {},
                                "tls_certificates": [
                                    {
                                        "certificate_chain": {
                                            "inline_string": "-----BEGIN CERTIFICATE-----\nMIICoDCCAkWgAwIBAgIBDzAKBggqhkjOPQQDAjAWMRQwEgYDVQQDEwtDb25zdWwg\nQ0EgNzAeFw0xODEwMjkyMDQxMzlaFw0xODExMDEyMDQxMzlaMBkxFzAVBgNVBAMT\nDmNvcmUtZGVtby1hcGkxMFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEddEmcU3J\n93aVNkjVBacdp4YsfWVQQbDBAV2Vp6+x91L9ZenMfNYlka0PjMUnPlTLsXA+RygP\njU0KEezoMRzK96OCAX8wggF7MA4GA1UdDwEB/wQEAwIDuDAdBgNVHSUEFjAUBggr\nBgEFBQcDAgYIKwYBBQUHAwEwDAYDVR0TAQH/BAIwADBoBgNVHQ4EYQRfYWY6NDE6\nNWU6Yzc6NDA6Yzc6OGU6MGE6MTk6MTk6MDI6ZDI6NDM6NDk6YjE6MDA6ZTc6YjA6\nN2E6MGM6MTQ6ZmQ6Mzg6MDU6ZjA6ZjQ6MTI6ODI6OTg6OTA6Y2Q6MTQwagYDVR0j\nBGMwYYBfYWY6NDE6NWU6Yzc6NDA6Yzc6OGU6MGE6MTk6MTk6MDI6ZDI6NDM6NDk6\nYjE6MDA6ZTc6YjA6N2E6MGM6MTQ6ZmQ6Mzg6MDU6ZjA6ZjQ6MTI6ODI6OTg6OTA6\nY2Q6MTQwZgYDVR0RBF8wXYZbc3BpZmZlOi8vOTg2YTI2YTAtMmY1ZS1jNTc2LWJh\nNTMtNmJhYWUyOTM1OGQ1LmNvbnN1bC9ucy9kZWZhdWx0L2RjL2xvY2FsL3N2Yy9j\nb3JlLWRlbW8tYXBpMTAKBggqhkjOPQQDAgNJADBGAiEA9ogH7GSXunQknJsqPeW3\n8yAVgSeifpcgj1x4CefQ9b4CIQDFUOZNrknZHdP5XtpnUI12mojZFOfOKZZTLU03\n/+7yEg==\n-----END CERTIFICATE-----\n"
                                        },
                                        "private_key": {
                                            "inline_string": "-----BEGIN EC PRIVATE KEY-----\nMHcCAQEEICLItQyoIMOvBuXlFnPy7NHdOyCZvKyvs9XwoMLyVJ+GoAoGCCqGSM49\nAwEHoUQDQgAEddEmcU3J93aVNkjVBacdp4YsfWVQQbDBAV2Vp6+x91L9ZenMfNYl\nka0PjMUnPlTLsXA+RygPjU0KEezoMRzK9w==\n-----END EC PRIVATE KEY-----\n"
                                        }
                                    }
                                ],
                                "validation_context": {
                                    "trusted_ca": {
                                        "inline_string": "-----BEGIN CERTIFICATE-----\nMIICWTCCAf+gAwIBAgIBBzAKBggqhkjOPQQDAjAWMRQwEgYDVQQDEwtDb25zdWwg\nQ0EgNzAeFw0xODEwMjkyMDQyMzlaFw0yODEwMjYyMDQyMzlaMBYxFDASBgNVBAMT\nC0NvbnN1bCBDQSA3MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAER2QBoP+HINzV\nJQn/rgRFVBGTgSrX9+qd9lK9U6y+dDvpT5PbPyO7kZSU0GCmAT/NjGoqIGtgW9pp\nHuOF8CxKgKOCATwwggE4MA4GA1UdDwEB/wQEAwIBhjAPBgNVHRMBAf8EBTADAQH/\nMGgGA1UdDgRhBF9hZjo0MTo1ZTpjNzo0MDpjNzo4ZTowYToxOToxOTowMjpkMjo0\nMzo0OTpiMTowMDplNzpiMDo3YTowYzoxNDpmZDozODowNTpmMDpmNDoxMjo4Mjo5\nODo5MDpjZDoxNDBqBgNVHSMEYzBhgF9hZjo0MTo1ZTpjNzo0MDpjNzo4ZTowYTox\nOToxOTowMjpkMjo0Mzo0OTpiMTowMDplNzpiMDo3YTowYzoxNDpmZDozODowNTpm\nMDpmNDoxMjo4Mjo5ODo5MDpjZDoxNDA/BgNVHREEODA2hjRzcGlmZmU6Ly85ODZh\nMjZhMC0yZjVlLWM1NzYtYmE1My02YmFhZTI5MzU4ZDUuY29uc3VsMAoGCCqGSM49\nBAMCA0gAMEUCIAE01fG9L9G90KtEfZIzVFrCrSEYnysQp5lZiyugWfBHAiEAh4CS\njy2F8MAE0gHy1WT4tLWV5PzxK7Wx4uboMbsST5Y=\n-----END CERTIFICATE-----\n"
                                    }
                                }
                            }
                        }
                    },
                    "last_updated": "2018-10-29T20:43:11.084Z"
                }
            ]
        },
        {
            "@type": "type.googleapis.com/envoy.admin.v2alpha.ListenersConfigDump"
        }
    ]
}

I'm still getting 404s when calling core-demo-api2 from core-demo-api1 through the core-demo-api1-sidecar-proxy though.

Should I use:

  • http://core-demo-api1-sidecar-proxy:19000/core-demo-api2/api/values, or
  • http://core-demo-api1-sidecar-proxy:19000/api/values while setting the Host header to core-demo-api2,

or something completely different?

@banks
Member

banks commented Oct 30, 2018

@mmisztal1980 your client apps should talk to whichever local port the upstream is listening on. Unless you set up something I missed, hostnames like core-demo-api1-sidecar-proxy aren't available.

For example your service definition for core-demo-api1 has the following upstream definition:

                            {
                                "destination_name": "code-demo-api2",
                                "local_bind_port": 80
                            }

This is saying "Please configure the proxy to listen on localhost:80 and proxy those connections to the code-demo-api2 service".

So the proxy will try to do just that but will (probably) collide since your actual service is already listening on port 80.

The idea here is that, for each upstream, you pick some port on which to expose that service over loopback. The port number is arbitrary; the only thing that cares about it is your application.

For example if you changed that definition to:

                            {
                                "destination_name": "code-demo-api2",
                                "local_bind_port": 8080
                            }

Then the proxy would listen on localhost:8080 (technically 127.0.0.1:8080) and your app would be able to connect just using http://127.0.0.1:8080/api/values.

Note that this is only layer 4 (TCP/TLS) proxying so there are no HTTP paths or routes in the mix. L7 support will come later.
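
A quick sanity check, assuming the app and its sidecar share a loopback interface and curl is available there, would be to hit the upstream listener directly:

curl -v http://127.0.0.1:8080/api/values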

Hope you get that working!

@mmisztal1980
Author

mmisztal1980 commented Oct 30, 2018

@banks Thanks for clearing that up; now I understand the concept.

I've reconfigured the core-demo-api1 registration:

core-demo-api1 registration (proxy port for core-demo-api2 changed to 19100)
 {
            "name": "core-demo-api1",
            "port": 80,
            "connect": {
                "sidecar_service": {
                    "port": 19000,
                    "proxy": {                        
                        "upstreams": [
                            {
                                "destination_name": "code-demo-api2",
                                "local_bind_port": 19100
                            }
                        ]
                    },
                    "checks": [
                        {
                            "name": "Connect Sidecar Listening",
                            "tcp": "core-demo-api1-sidecar-proxy:19000",
                            "interval": "10s"
                        },
                        {
                            "name": "Connect Sidecar Aliasing core-demo-api1",
                            "alias_service": "core-demo-api1"
                        }
                    ]
                }
            }
        }

I've tried to execute a call against my service's proxy, but my client keeps throwing exceptions:

[screenshot of the exception thrown by the client]

I've shelled into the proxy container to examine whether the port is being listened on:

 netstat -ano
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       Timer
tcp        0      0 127.0.0.11:32977        0.0.0.0:*               LISTEN      off (0.00/0/0)
tcp        0      0 0.0.0.0:19000           0.0.0.0:*               LISTEN      off (0.00/0/0)
tcp        0      0 172.19.0.5:49484        91.189.88.161:80        TIME_WAIT   timewait (0.00/0/0)
tcp        0      0 172.19.0.5:44220        172.19.0.3:8502         ESTABLISHED off (0.00/0/0)
udp        0      0 127.0.0.11:55487        0.0.0.0:*                           off (0.00/0/0)
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags       Type       State         I-Node   Path

I do not see port 19100 being listed :/

BTW, you've mentioned that local_bind_port will listen on the localhost interface. I'd prefer 0.0.0.0 in my scenario (multiple docker containers not running in a k8s pod (yet)) - how can I achieve that?

@mmisztal1980
Author

I've just re-read the paragraph on Sidecar Service Registration.

The upstreams use local_bind_port, while the Sidecar Service Defaults make no mention of that parameter; instead something else is listed under proxy: local_service_port.

Is one transformed into the other? I think the docs are somewhat inconsistent here and don't really offer guidance on how to solve my question above :/

@banks
Member

banks commented Oct 30, 2018

@mmisztal1980 can you clarify how you are running this again?

Port 19000 happens to be the port we chose for the Envoy admin API (which you can't just disable), so if you are using Envoy here it's not a great choice for your own stuff, and that might explain why you are seeing things listening on 19000 but not getting the response you expect.

Based on your netstat output, I suspect you are not seeing any proxy running at all. How are you starting the proxy? Is this in Kube?

Can you include the output of the consul connect envoy commands?

BTW, you've mentioned that local_bind_port will listen on the localhost interface. I'd prefer 0.0.0.0 in my scenario (multiple docker containers not running in a k8s pod (yet)) - how can I achieve that?

There is a local_bind_address too, but note that anything non-local is insecure in prod (fine for just testing, etc.). Anything that can talk to the "private" listeners of the proxy can assume the identity of, and all access granted to, that service, which is why it's typical to expose this only over loopback.
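
In the registration above that would presumably look something like this (with the caveat about non-loopback binds in mind):

                            {
                                "destination_name": "core-demo-api2",
                                "local_bind_address": "0.0.0.0",
                                "local_bind_port": 19100
                            }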

If you are trying to get this to work with Docker Compose, I recommend using shared network namespaces at least between the app and sidecar containers rather than just exposing them over the docker bridge. This is easy in docker: start the app first, then start the proxy container using `--network "container:core-demo-api2-container-name"` (or the equivalent in Compose). Then the app and the proxy can talk over localhost just like in a kube pod (that is all Kube is doing under the hood).

@rboyer
Member

rboyer commented Oct 30, 2018

If the docker-compose file above is still roughly what you're using, I noticed another small plumbing issue.

Because envoy will be listening on 127.0.0.1 (loopback) for exclusive outbound traffic access, each sidecar needs to share a network namespace with its app so the app can connect. The way you have them configured above, each of your 5 containers (consul, app1, sidecar1, app2, sidecar2) gets its own networking stack, complete with a local IP address and its own isolated 127.0.0.1 address.

For example, instead of:

services:
  core-demo-api1:
    image: core-demo-api1
    ...
  core-demo-api1-sidecar-proxy:
    image: consul-envoy
    command: "-sidecar-for core-demo-api1 -admin-bind 0.0.0.0:19000 -http-addr http://consul:8500"
    ...

You should have something like:

services:
  core-demo-api1:
    image: core-demo-api1
    ...
  core-demo-api1-sidecar-proxy:
    image: consul-envoy
    command: "-sidecar-for core-demo-api1 -admin-bind 0.0.0.0:19000 -http-addr http://consul:8500"
    network_mode: "service:core-demo-api1"
    ...

And similar plumbing for core-demo-api2-sidecar-proxy.
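
i.e., presumably something like:

services:
  core-demo-api2:
    image: core-demo-api2
    ...
  core-demo-api2-sidecar-proxy:
    image: consul-envoy
    command: "-sidecar-for core-demo-api2 -admin-bind 0.0.0.0:19000 -http-addr http://consul:8500"
    network_mode: "service:core-demo-api2"
    ...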

@rboyer
Member

rboyer commented Oct 30, 2018

I'm also interested in knowing more about the lifespan of this POC. Is it mostly for kicking the tires of Connect in conjunction with your development stack or are you also planning on using something like docker compose to deploy a variation of the POC directly to a docker swarm cluster?

@mmisztal1980
Author

mmisztal1980 commented Oct 30, 2018

My intention is to do a poc on k8s, however, .NET Solutions developed under Visual Studio 2017 have a docker-compose integration, so I wanted to get that up and running 1st. That way I get do debug/test the solution locally before I deploy it to an actual k8s cluster.

In the long run, at my current company, we are working on decomposing a monolith and migrating it to a new hosting platform while retaining 100% uptime. Consul connect it v. interesting because it can provide a service mesh, and a consistent experience when services communicate with each other, regardless of the hosting platform. That's pretty much the genesis for the PoC.

@rboyer I hope this answers your question?

On a side note, this is also a brilliant opportunity to learn.

@mmisztal1980
Copy link
Author

Adding network_mode has caused the health check between consul and the proxies to fail :/

@mmisztal1980
Copy link
Author

mmisztal1980 commented Nov 3, 2018

@rboyer @banks I've added network_mode: service:{service-name} to both proxies. As far as I understand, that ought to make them reuse their counterparts' network stacks - so each service container and its counterpart proxy should be able to communicate both ways via localhost.

I've modified my REST call inside core-demo-api1 to call http://127.0.0.1:19100 - the port I've declared above for core-demo-api1-sidecar-proxy to route to core-demo-api2. I'm getting connection refused, even though, when attaching to the proxy container, I can see that 127.0.0.1:19100 is being listened on.

HTTP Call from core-demo-api1 to core-demo-api2 via core-demo-api1-sidecar-proxy
var response = await httpClient.GetAsync("http://127.0.0.1:19100/api/values");
Exception message
core-demo-api1_1                | fail: Microsoft.AspNetCore.Server.Kestrel[13]
core-demo-api1_1                |       Connection id "0HLI1KK6RIP1V", Request id "0HLI1KK6RIP1V:00000001": An unhandled exception was thrown by the application.
core-demo-api1_1                | System.Net.Http.HttpRequestException: Connection refused ---> System.Net.Sockets.SocketException: Connection refused
core-demo-api1_1                |    at System.Net.Http.ConnectHelper.ConnectAsync(String host, Int32 port, CancellationToken cancellationToken)
core-demo-api1_1                |    --- End of inner exception stack trace ---
core-demo-api1_1                |    at System.Net.Http.ConnectHelper.ConnectAsync(String host, Int32 port, CancellationToken cancellationToken)
core-demo-api1_1                |    at System.Threading.Tasks.ValueTask`1.get_Result()
core-demo-api1_1                |    at System.Net.Http.HttpConnectionPool.CreateConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
core-demo-api1_1                |    at System.Threading.Tasks.ValueTask`1.get_Result()
core-demo-api1_1                |    at System.Net.Http.HttpConnectionPool.WaitForCreatedConnectionAsync(ValueTask`1 creationTask)
core-demo-api1_1                |    at System.Threading.Tasks.ValueTask`1.get_Result()
core-demo-api1_1                |    at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
core-demo-api1_1                |    at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
core-demo-api1_1                |    at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
core-demo-api1_1                |    at Core.Demo.Api1.Controllers.ValuesController.Get() in /src/demo/Core.Demo.Api1/Controllers/ValuesController.cs:line 31
core-demo-api1_1                |    at lambda_method(Closure , Object )
core-demo-api1_1                |    at Microsoft.Extensions.Internal.ObjectMethodExecutorAwaitable.Awaiter.GetResult()
core-demo-api1_1                |    at Microsoft.AspNetCore.Mvc.Internal.ActionMethodExecutor.AwaitableObjectResultExecutor.Execute(IActionResultTypeMapper mapper, ObjectMethodExecutor executor, Object controller, Object[] arguments)
core-demo-api1_1                |    at System.Threading.Tasks.ValueTask`1.get_Result()
core-demo-api1_1                |    at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.InvokeActionMethodAsync()
core-demo-api1_1                |    at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.InvokeNextActionFilterAsync()
core-demo-api1_1                |    at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.Rethrow(ActionExecutedContext context)
core-demo-api1_1                |    at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
core-demo-api1_1                |    at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.InvokeInnerFilterAsync()
core-demo-api1_1                |    at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.InvokeNextResourceFilter()
core-demo-api1_1                |    at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.Rethrow(ResourceExecutedContext context)
core-demo-api1_1                |    at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
core-demo-api1_1                |    at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.InvokeFilterPipelineAsync()
core-demo-api1_1                |    at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.InvokeAsync()
core-demo-api1_1                |    at Microsoft.AspNetCore.Builder.RouterMiddleware.Invoke(HttpContext httpContext)
core-demo-api1_1                |    at Core.Api.Middleware.LivenessProbeMiddleware.InvokeAsync(HttpContext context) in /src/src/Core.Api/Middleware/LivenessProbeMiddleware.cs:line 23
core-demo-api1_1                |    at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequests[TContext](IHttpApplication`1 application)
docker-compose.yml
version: '2.4'

services:
  consul:
    image: consul:1.3.0
    command: "agent -dev -server -bootstrap-expect 1 -ui -client 0.0.0.0 -datacenter local -node docker -config-file /etc/consul/services.json"
    volumes:
      - ./demo/consul/services.json:/etc/consul/services.json
  core-demo-api1:
    image: core-demo-api1
    build:
      context: .
      dockerfile: demo/Core.Demo.Api1/Dockerfile
  core-demo-api1-sidecar-proxy:
    image: consul-envoy
    command: "-sidecar-for core-demo-api1 -admin-bind 0.0.0.0:19000 -http-addr http://consul:8500 -grpc-addr consul:8502"
    network_mode: "service:core-demo-api1"
    build:
      context: ./docker/consul-envoy
      dockerfile: ./Dockerfile
    depends_on: 
      consul:
        condition: service_healthy

  core-demo-api2:
    image: core-demo-api2
    build:
      context: .
      dockerfile: demo/Core.Demo.Api2/Dockerfile
  core-demo-api2-sidecar-proxy:
    image: consul-envoy
    command: "-sidecar-for core-demo-api2 -admin-bind 0.0.0.0:19000 -http-addr http://consul:8500 -grpc-addr consul:8502"
    network_mode: "service:core-demo-api2"
    build:
      context: ./docker/consul-envoy
      dockerfile: ./Dockerfile
    depends_on: 
      consul:
        condition: service_healthy

docker-compose.override.yml

version: '2.4'

services:
  consul:
    ports:
      - "8400:8400"
      - "8500:8500"
      - "8600:8600"
      - "8600:8600/udp"
    healthcheck:
      test: ["CMD-SHELL", "curl --silent --fail localhost:8500/v1/agent/services || exit 1"]
      interval: 30s
      timeout: 30s
      retries: 3
  core-demo-api1:
    environment:
      - ASPNETCORE_ENVIRONMENT=Compose
      - ASPNETCORE_URLS=http://+:80
      - HOSTNAME=core-demo-api1
    ports:
      - "55127:80"
    volumes: []
  core-demo-api2:
    environment:
      - ASPNETCORE_ENVIRONMENT=Compose
      - ASPNETCORE_URLS=http://+:80
      - HOSTNAME=core-demo-api2
    ports:
      - "55185:80"
    volumes: []
Consul JSON configuration (/etc/consul/services.json)
{
    "connect": {
        "enabled": true
    },
    "services": [
        {
            "name": "core-demo-api1",
            "port": 80,
            "connect": {
                "sidecar_service": {
                    "proxy": {  
                        "local_service_address": "127.0.0.1",
                        "local_service_port": 80,                      
                        "upstreams": [
                            {
                                "destination_name": "code-demo-api2",
                                "local_bind_port": 19100
                             }
                        ]
                    },
                    "checks": [
                        {
                            "name": "Connect Sidecar Aliasing core-demo-api1",
                            "alias_service": "core-demo-api1"
                        }
                    ]
                }
            }
        },
        {
            "name": "core-demo-api2",
            "port": 80,
            "connect": {
                "sidecar_service": {
                    "proxy": {        
                        "local_service_address": "127.0.0.1",
                        "local_service_port": 80,                
                        "upstreams": [
                            {
                                "destination_name": "code-demo-api1",
                                "local_bind_port": 80
                            }
                        ]
                    },
                    "checks": [     
                        {
                            "name": "Connect Sidecar Aliasing core-demo-api2",
                            "alias_service": "core-demo-api2"
                        }
                    ]
                }
            }
        }
    ]
}

@banks
Copy link
Member

banks commented Nov 5, 2018

Hey @mmisztal1980, as far as I understand, network_mode should behave as you expected, so I can't see an obvious reason why this isn't working. My guess would be that Envoy is failing to get configured correctly and so isn't setting up the upstream listener.

Can you try adding -- -l DEBUG to the end of the proxy command string? Everything after the -- is passed through to Envoy and this puts it into verbose logging mode. Then check the output of the Envoy process - it's possible it's failing to listen or not getting config delivered for some reason.
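For example, in your compose file it would look something like this (sketch):

  core-demo-api1-sidecar-proxy:
    image: consul-envoy
    command: "-sidecar-for core-demo-api1 -admin-bind 0.0.0.0:19000 -http-addr http://consul:8500 -grpc-addr consul:8502 -- -l DEBUG"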

@mmisztal1980
Copy link
Author

mmisztal1980 commented Nov 5, 2018

Hi @banks ,

I've added the debug option (only on core-demo-api1-sidecar-proxy) as you've suggested, here's the output.

What caught my eye: core-demo-api1-sidecar-proxy_1 | [2018-11-05 17:17:07.038][8][info][config] source/server/configuration_impl.cc:61] loading 0 listener(s) . I guess this implies a config issue?

output
~/Projects/@cloud-technologies/core-pkg   feature/154 ●  docker-compose up
Creating network "core-pkg_default" with the default driver
Creating core-pkg_core-demo-api1_1 ... done
Creating core-pkg_consul_1         ... done
Creating core-pkg_core-demo-api2_1 ... done
Creating core-pkg_core-demo-api1-sidecar-proxy_1 ... done
Creating core-pkg_core-demo-api2-sidecar-proxy_1 ... done
Attaching to core-pkg_core-demo-api1_1, core-pkg_core-demo-api2_1, core-pkg_consul_1, core-pkg_core-demo-api1-sidecar-proxy_1, core-pkg_core-demo-api2-sidecar-proxy_1
core-demo-api1_1                | warn: Microsoft.AspNetCore.DataProtection.KeyManagement.XmlKeyManager[35]
core-demo-api1_1                |       No XML encryptor configured. Key {3b2a2dd9-bd56-487c-a657-7167a5f43c9d} may be persisted to storage in unencrypted form.
core-demo-api2_1                | warn: Microsoft.AspNetCore.DataProtection.KeyManagement.XmlKeyManager[35]
core-demo-api2_1                |       No XML encryptor configured. Key {e6147221-8d69-4138-9f14-39f3e421f03d} may be persisted to storage in unencrypted form.
core-demo-api1_1                | [17:16:36 WRN] No XML encryptor configured. Key {3b2a2dd9-bd56-487c-a657-7167a5f43c9d} may be persisted to storage in unencrypted form.
consul_1                        | BootstrapExpect is set to 1; this is the same as Bootstrap mode.
consul_1                        | bootstrap = true: do not enable unless necessary
core-demo-api2_1                | [17:16:36 WRN] No XML encryptor configured. Key {e6147221-8d69-4138-9f14-39f3e421f03d} may be persisted to storage in unencrypted form.
core-demo-api1-sidecar-proxy_1  | [2018-11-05 17:17:07.021][8][info][main] source/server/server.cc:202] initializing epoch 0 (hot restart version=disabled)
core-demo-api1_1                | Hosting environment: Compose
core-demo-api1_1                | Content root path: /app
core-demo-api1_1                | Now listening on: http://[::]:80
core-demo-api1_1                | Application started. Press Ctrl+C to shut down.
consul_1                        | ==> Starting Consul agent...
core-demo-api1-sidecar-proxy_1  | [2018-11-05 17:17:07.021][8][info][main] source/server/server.cc:204] statically linked extensions:
consul_1                        | ==> Consul agent running!
consul_1                        |            Version: 'v1.3.0'
consul_1                        |            Node ID: '2c130a6b-e96c-213e-6786-3b6f84134db8'
consul_1                        |          Node name: 'docker'
consul_1                        |         Datacenter: 'local' (Segment: '<all>')
consul_1                        |             Server: true (Bootstrap: true)
consul_1                        |        Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, gRPC: 8502, DNS: 8600)
consul_1                        |       Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
consul_1                        |            Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false
consul_1                        | 
consul_1                        | ==> Log data will now stream in as it occurs:
consul_1                        | 
consul_1                        |     2018/11/05 17:16:36 [DEBUG] agent: Using random ID "2c130a6b-e96c-213e-6786-3b6f84134db8" as node ID
consul_1                        |     2018/11/05 17:16:36 [INFO] raft: Initial configuration (index=1): [{Suffrage:Voter ID:2c130a6b-e96c-213e-6786-3b6f84134db8 Address:127.0.0.1:8300}]
consul_1                        |     2018/11/05 17:16:36 [INFO] raft: Node at 127.0.0.1:8300 [Follower] entering Follower state (Leader: "")
consul_1                        |     2018/11/05 17:16:36 [INFO] serf: EventMemberJoin: docker.local 127.0.0.1
consul_1                        |     2018/11/05 17:16:36 [INFO] serf: EventMemberJoin: docker 127.0.0.1
consul_1                        |     2018/11/05 17:16:36 [INFO] agent: Started DNS server 0.0.0.0:8600 (udp)
core-demo-api2_1                | Hosting environment: Compose
consul_1                        |     2018/11/05 17:16:36 [INFO] consul: Adding LAN server docker (Addr: tcp/127.0.0.1:8300) (DC: local)
consul_1                        |     2018/11/05 17:16:36 [INFO] consul: Handled member-join event for server "docker.local" in area "wan"
consul_1                        |     2018/11/05 17:16:36 [DEBUG] agent/proxy: managed Connect proxy manager started
consul_1                        |     2018/11/05 17:16:36 [INFO] agent: Started DNS server 0.0.0.0:8600 (tcp)
consul_1                        |     2018/11/05 17:16:36 [INFO] agent: Started HTTP server on [::]:8500 (tcp)
consul_1                        |     2018/11/05 17:16:36 [INFO] agent: started state syncer
consul_1                        |     2018/11/05 17:16:36 [INFO] agent: Started gRPC server on [::]:8502 (tcp)
core-demo-api2_1                | Content root path: /app
core-demo-api1-sidecar-proxy_1  | [2018-11-05 17:17:07.022][8][info][main] source/server/server.cc:206]   access_loggers: envoy.file_access_log,envoy.http_grpc_access_log
core-demo-api1-sidecar-proxy_1  | [2018-11-05 17:17:07.022][8][info][main] source/server/server.cc:209]   filters.http: envoy.buffer,envoy.cors,envoy.ext_authz,envoy.fault,envoy.filters.http.header_to_metadata,envoy.filters.http.jwt_authn,envoy.filters.http.rbac,envoy.grpc_http1_bridge,envoy.grpc_json_transcoder,envoy.grpc_web,envoy.gzip,envoy.health_check,envoy.http_dynamo_filter,envoy.ip_tagging,envoy.lua,envoy.rate_limit,envoy.router,envoy.squash
core-demo-api1-sidecar-proxy_1  | [2018-11-05 17:17:07.022][8][info][main] source/server/server.cc:212]   filters.listener: envoy.listener.original_dst,envoy.listener.proxy_protocol,envoy.listener.tls_inspector
core-demo-api1-sidecar-proxy_1  | [2018-11-05 17:17:07.022][8][info][main] source/server/server.cc:215]   filters.network: envoy.client_ssl_auth,envoy.echo,envoy.ext_authz,envoy.filters.network.rbac,envoy.filters.network.sni_cluster,envoy.filters.network.thrift_proxy,envoy.http_connection_manager,envoy.mongo_proxy,envoy.ratelimit,envoy.redis_proxy,envoy.tcp_proxy
core-demo-api1-sidecar-proxy_1  | [2018-11-05 17:17:07.022][8][info][main] source/server/server.cc:217]   stat_sinks: envoy.dog_statsd,envoy.metrics_service,envoy.stat_sinks.hystrix,envoy.statsd
core-demo-api1-sidecar-proxy_1  | [2018-11-05 17:17:07.022][8][info][main] source/server/server.cc:219]   tracers: envoy.dynamic.ot,envoy.lightstep,envoy.zipkin
core-demo-api1-sidecar-proxy_1  | [2018-11-05 17:17:07.022][8][info][main] source/server/server.cc:222]   transport_sockets.downstream: envoy.transport_sockets.capture,raw_buffer,tls
consul_1                        |     2018/11/05 17:16:36 [WARN] raft: Heartbeat timeout from "" reached, starting election
consul_1                        |     2018/11/05 17:16:36 [INFO] raft: Node at 127.0.0.1:8300 [Candidate] entering Candidate state in term 2
consul_1                        |     2018/11/05 17:16:36 [DEBUG] raft: Votes needed: 1
consul_1                        |     2018/11/05 17:16:36 [DEBUG] raft: Vote granted from 2c130a6b-e96c-213e-6786-3b6f84134db8 in term 2. Tally: 1
consul_1                        |     2018/11/05 17:16:36 [INFO] raft: Election won. Tally: 1
consul_1                        |     2018/11/05 17:16:36 [INFO] raft: Node at 127.0.0.1:8300 [Leader] entering Leader state
consul_1                        |     2018/11/05 17:16:36 [INFO] consul: cluster leadership acquired
core-demo-api2_1                | Now listening on: http://[::]:80
core-demo-api2_1                | Application started. Press Ctrl+C to shut down.
core-demo-api1-sidecar-proxy_1  | [2018-11-05 17:17:07.022][8][info][main] source/server/server.cc:225]   transport_sockets.upstream: envoy.transport_sockets.capture,raw_buffer,tls
consul_1                        |     2018/11/05 17:16:36 [INFO] consul: New leader elected: docker
consul_1                        |     2018/11/05 17:16:36 [INFO] connect: initialized CA with provider "consul"
consul_1                        |     2018/11/05 17:16:36 [DEBUG] consul: Skipping self join check for "docker" since the cluster is too small
consul_1                        |     2018/11/05 17:16:36 [INFO] consul: member 'docker' joined, marking health alive
consul_1                        |     2018/11/05 17:16:36 [DEBUG] agent: Skipping remote check "serfHealth" since it is managed automatically
consul_1                        |     2018/11/05 17:16:36 [INFO] agent: Synced service "core-demo-api1"
consul_1                        |     2018/11/05 17:16:36 [INFO] agent: Synced service "core-demo-api1-sidecar-proxy"
consul_1                        |     2018/11/05 17:16:36 [INFO] agent: Synced service "core-demo-api2"
consul_1                        |     2018/11/05 17:16:36 [INFO] agent: Synced service "core-demo-api2-sidecar-proxy"
consul_1                        |     2018/11/05 17:16:36 [DEBUG] agent: Check "service:core-demo-api1-sidecar-proxy" in sync
consul_1                        |     2018/11/05 17:16:36 [DEBUG] agent: Check "service:core-demo-api2-sidecar-proxy" in sync
consul_1                        |     2018/11/05 17:16:36 [DEBUG] agent: Node info in sync
consul_1                        |     2018/11/05 17:16:36 [DEBUG] agent: Service "core-demo-api1" in sync
consul_1                        |     2018/11/05 17:16:36 [DEBUG] agent: Service "core-demo-api1-sidecar-proxy" in sync
consul_1                        |     2018/11/05 17:16:36 [DEBUG] agent: Service "core-demo-api2" in sync
consul_1                        |     2018/11/05 17:16:36 [DEBUG] agent: Service "core-demo-api2-sidecar-proxy" in sync
consul_1                        |     2018/11/05 17:16:36 [DEBUG] agent: Check "service:core-demo-api1-sidecar-proxy" in sync
consul_1                        |     2018/11/05 17:16:36 [DEBUG] agent: Check "service:core-demo-api2-sidecar-proxy" in sync
consul_1                        |     2018/11/05 17:16:36 [DEBUG] agent: Node info in sync
consul_1                        |     2018/11/05 17:16:38 [DEBUG] agent: Skipping remote check "serfHealth" since it is managed automatically
consul_1                        |     2018/11/05 17:16:38 [DEBUG] agent: Service "core-demo-api1-sidecar-proxy" in sync
consul_1                        |     2018/11/05 17:16:38 [DEBUG] agent: Service "core-demo-api2" in sync
consul_1                        |     2018/11/05 17:16:38 [DEBUG] agent: Service "core-demo-api2-sidecar-proxy" in sync
consul_1                        |     2018/11/05 17:16:38 [DEBUG] agent: Service "core-demo-api1" in sync
consul_1                        |     2018/11/05 17:16:38 [DEBUG] agent: Check "service:core-demo-api1-sidecar-proxy" in sync
consul_1                        |     2018/11/05 17:16:38 [DEBUG] agent: Check "service:core-demo-api2-sidecar-proxy" in sync
consul_1                        |     2018/11/05 17:16:38 [DEBUG] agent: Node info in sync
consul_1                        |     2018/11/05 17:17:06 [DEBUG] http: Request GET /v1/agent/services (881.2µs) from=127.0.0.1:49874
consul_1                        |     2018/11/05 17:17:07 [DEBUG] http: Request GET /v1/agent/services (2.4678ms) from=172.27.0.3:35604
core-demo-api1-sidecar-proxy_1  | [2018-11-05 17:17:07.030][8][info][main] source/server/server.cc:264] admin address: 0.0.0.0:19000
core-demo-api1-sidecar-proxy_1  | [2018-11-05 17:17:07.031][8][info][config] source/server/configuration_impl.cc:50] loading 0 static secret(s)
core-demo-api1-sidecar-proxy_1  | [2018-11-05 17:17:07.031][8][info][config] source/server/configuration_impl.cc:56] loading 1 cluster(s)
core-demo-api1-sidecar-proxy_1  | [2018-11-05 17:17:07.032][8][info][upstream] source/common/upstream/cluster_manager_impl.cc:131] cm init: initializing cds
core-demo-api1-sidecar-proxy_1  | [2018-11-05 17:17:07.038][8][info][config] source/server/configuration_impl.cc:61] loading 0 listener(s)
core-demo-api1-sidecar-proxy_1  | [2018-11-05 17:17:07.038][8][info][config] source/server/configuration_impl.cc:95] loading tracing configuration
core-demo-api1-sidecar-proxy_1  | [2018-11-05 17:17:07.038][8][info][config] source/server/configuration_impl.cc:113] loading stats sink configuration
core-demo-api1-sidecar-proxy_1  | [2018-11-05 17:17:07.038][8][info][main] source/server/server.cc:452] starting main dispatch loop
core-demo-api1-sidecar-proxy_1  | [2018-11-05 17:17:07.041][8][info][upstream] source/common/upstream/cluster_manager_impl.cc:494] add/update cluster local_app during init
core-demo-api1-sidecar-proxy_1  | [2018-11-05 17:17:07.044][8][info][upstream] source/common/upstream/cluster_manager_impl.cc:494] add/update cluster service:code-demo-api2 during init
core-demo-api1-sidecar-proxy_1  | [2018-11-05 17:17:07.044][8][info][upstream] source/common/upstream/cluster_manager_impl.cc:111] cm init: initializing secondary clusters
consul_1                        |     2018/11/05 17:17:07 [DEBUG] http: Request GET /v1/agent/services (497.5µs) from=172.27.0.4:40410
core-demo-api2-sidecar-proxy_1  | [2018-11-05 17:17:07.109][8][info][main] source/server/server.cc:202] initializing epoch 0 (hot restart version=disabled)
core-demo-api2-sidecar-proxy_1  | [2018-11-05 17:17:07.109][8][info][main] source/server/server.cc:204] statically linked extensions:
core-demo-api2-sidecar-proxy_1  | [2018-11-05 17:17:07.109][8][info][main] source/server/server.cc:206]   access_loggers: envoy.file_access_log,envoy.http_grpc_access_log
core-demo-api2-sidecar-proxy_1  | [2018-11-05 17:17:07.109][8][info][main] source/server/server.cc:209]   filters.http: envoy.buffer,envoy.cors,envoy.ext_authz,envoy.fault,envoy.filters.http.header_to_metadata,envoy.filters.http.jwt_authn,envoy.filters.http.rbac,envoy.grpc_http1_bridge,envoy.grpc_json_transcoder,envoy.grpc_web,envoy.gzip,envoy.health_check,envoy.http_dynamo_filter,envoy.ip_tagging,envoy.lua,envoy.rate_limit,envoy.router,envoy.squash
core-demo-api2-sidecar-proxy_1  | [2018-11-05 17:17:07.109][8][info][main] source/server/server.cc:212]   filters.listener: envoy.listener.original_dst,envoy.listener.proxy_protocol,envoy.listener.tls_inspector
core-demo-api2-sidecar-proxy_1  | [2018-11-05 17:17:07.109][8][info][main] source/server/server.cc:215]   filters.network: envoy.client_ssl_auth,envoy.echo,envoy.ext_authz,envoy.filters.network.rbac,envoy.filters.network.sni_cluster,envoy.filters.network.thrift_proxy,envoy.http_connection_manager,envoy.mongo_proxy,envoy.ratelimit,envoy.redis_proxy,envoy.tcp_proxy
core-demo-api2-sidecar-proxy_1  | [2018-11-05 17:17:07.109][8][info][main] source/server/server.cc:217]   stat_sinks: envoy.dog_statsd,envoy.metrics_service,envoy.stat_sinks.hystrix,envoy.statsd
core-demo-api2-sidecar-proxy_1  | [2018-11-05 17:17:07.109][8][info][main] source/server/server.cc:219]   tracers: envoy.dynamic.ot,envoy.lightstep,envoy.zipkin
core-demo-api2-sidecar-proxy_1  | [2018-11-05 17:17:07.109][8][info][main] source/server/server.cc:222]   transport_sockets.downstream: envoy.transport_sockets.capture,raw_buffer,tls
core-demo-api2-sidecar-proxy_1  | [2018-11-05 17:17:07.109][8][info][main] source/server/server.cc:225]   transport_sockets.upstream: envoy.transport_sockets.capture,raw_buffer,tls
core-demo-api2-sidecar-proxy_1  | [2018-11-05 17:17:07.113][8][info][main] source/server/server.cc:264] admin address: 0.0.0.0:19000
core-demo-api2-sidecar-proxy_1  | [2018-11-05 17:17:07.114][8][info][config] source/server/configuration_impl.cc:50] loading 0 static secret(s)
core-demo-api2-sidecar-proxy_1  | [2018-11-05 17:17:07.114][8][info][config] source/server/configuration_impl.cc:56] loading 1 cluster(s)
core-demo-api2-sidecar-proxy_1  | [2018-11-05 17:17:07.115][8][info][upstream] source/common/upstream/cluster_manager_impl.cc:131] cm init: initializing cds
core-demo-api2-sidecar-proxy_1  | [2018-11-05 17:17:07.122][8][info][config] source/server/configuration_impl.cc:61] loading 0 listener(s)
core-demo-api2-sidecar-proxy_1  | [2018-11-05 17:17:07.122][8][info][config] source/server/configuration_impl.cc:95] loading tracing configuration
core-demo-api2-sidecar-proxy_1  | [2018-11-05 17:17:07.122][8][info][config] source/server/configuration_impl.cc:113] loading stats sink configuration
core-demo-api2-sidecar-proxy_1  | [2018-11-05 17:17:07.122][8][info][main] source/server/server.cc:452] starting main dispatch loop
core-demo-api2-sidecar-proxy_1  | [2018-11-05 17:17:07.124][8][info][upstream] source/common/upstream/cluster_manager_impl.cc:494] add/update cluster local_app during init
core-demo-api2-sidecar-proxy_1  | [2018-11-05 17:17:07.126][8][info][upstream] source/common/upstream/cluster_manager_impl.cc:494] add/update cluster service:code-demo-api1 during init
core-demo-api2-sidecar-proxy_1  | [2018-11-05 17:17:07.126][8][info][upstream] source/common/upstream/cluster_manager_impl.cc:111] cm init: initializing secondary clusters
consul_1                        |     2018/11/05 17:17:36 [DEBUG] consul: Skipping self join check for "docker" since the cluster is too small
consul_1                        |     2018/11/05 17:17:36 [DEBUG] http: Request GET /v1/agent/services (235µs) from=127.0.0.1:49964
consul_1                        |     2018/11/05 17:18:06 [DEBUG] http: Request GET /v1/agent/services (249.7µs) from=127.0.0.1:50046
consul_1                        |     2018/11/05 17:18:19 [DEBUG] agent: Skipping remote check "serfHealth" since it is managed automatically
consul_1                        |     2018/11/05 17:18:19 [DEBUG] agent: Service "core-demo-api1" in sync
consul_1                        |     2018/11/05 17:18:19 [DEBUG] agent: Service "core-demo-api1-sidecar-proxy" in sync
consul_1                        |     2018/11/05 17:18:19 [DEBUG] agent: Service "core-demo-api2" in sync
consul_1                        |     2018/11/05 17:18:19 [DEBUG] agent: Service "core-demo-api2-sidecar-proxy" in sync
consul_1                        |     2018/11/05 17:18:19 [DEBUG] agent: Check "service:core-demo-api1-sidecar-proxy" in sync
consul_1                        |     2018/11/05 17:18:19 [DEBUG] agent: Check "service:core-demo-api2-sidecar-proxy" in sync
consul_1                        |     2018/11/05 17:18:19 [DEBUG] agent: Node info in sync
consul_1                        |     2018/11/05 17:18:36 [DEBUG] manager: Rebalanced 1 servers, next active server is docker.local (Addr: tcp/127.0.0.1:8300) (DC: local)
consul_1                        |     2018/11/05 17:18:36 [DEBUG] consul: Skipping self join check for "docker" since the cluster is too small
consul_1                        |     2018/11/05 17:18:36 [DEBUG] http: Request GET /v1/agent/services (174.7µs) from=127.0.0.1:50130
consul_1                        |     2018/11/05 17:19:06 [DEBUG] http: Request GET /v1/agent/services (197.9µs) from=127.0.0.1:50210
consul_1                        |     2018/11/05 17:19:36 [DEBUG] consul: Skipping self join check for "docker" since the cluster is too small
consul_1                        |     2018/11/05 17:19:36 [DEBUG] http: Request GET /v1/agent/services (138µs) from=127.0.0.1:50290
consul_1                        |     2018/11/05 17:19:46 [DEBUG] agent: Skipping remote check "serfHealth" since it is managed automatically
consul_1                        |     2018/11/05 17:19:46 [DEBUG] agent: Service "core-demo-api1-sidecar-proxy" in sync
consul_1                        |     2018/11/05 17:19:46 [DEBUG] agent: Service "core-demo-api2" in sync
consul_1                        |     2018/11/05 17:19:46 [DEBUG] agent: Service "core-demo-api2-sidecar-proxy" in sync
consul_1                        |     2018/11/05 17:19:46 [DEBUG] agent: Service "core-demo-api1" in sync
consul_1                        |     2018/11/05 17:19:46 [DEBUG] agent: Check "service:core-demo-api1-sidecar-proxy" in sync
consul_1                        |     2018/11/05 17:19:46 [DEBUG] agent: Check "service:core-demo-api2-sidecar-proxy" in sync
consul_1                        |     2018/11/05 17:19:46 [DEBUG] agent: Node info in sync
consul_1                        |     2018/11/05 17:20:06 [DEBUG] http: Request GET /v1/agent/services (285.8µs) from=127.0.0.1:50370
consul_1                        |     2018/11/05 17:20:35 [DEBUG] consul: Skipping self join check for "docker" since the cluster is too small
consul_1                        |     2018/11/05 17:20:36 [DEBUG] http: Request GET /v1/agent/services (613.9µs) from=127.0.0.1:50450
consul_1                        |     2018/11/05 17:20:50 [DEBUG] agent: Skipping remote check "serfHealth" since it is managed automatically
consul_1                        |     2018/11/05 17:20:50 [DEBUG] agent: Service "core-demo-api1" in sync
consul_1                        |     2018/11/05 17:20:50 [DEBUG] agent: Service "core-demo-api1-sidecar-proxy" in sync
consul_1                        |     2018/11/05 17:20:50 [DEBUG] agent: Service "core-demo-api2" in sync
consul_1                        |     2018/11/05 17:20:50 [DEBUG] agent: Service "core-demo-api2-sidecar-proxy" in sync
consul_1                        |     2018/11/05 17:20:50 [DEBUG] agent: Check "service:core-demo-api1-sidecar-proxy" in sync
consul_1                        |     2018/11/05 17:20:50 [DEBUG] agent: Check "service:core-demo-api2-sidecar-proxy" in sync
consul_1                        |     2018/11/05 17:20:50 [DEBUG] agent: Node info in sync
consul_1                        |     2018/11/05 17:21:06 [DEBUG] http: Request GET /v1/agent/services (216.5µs) from=127.0.0.1:50530
consul_1                        |     2018/11/05 17:21:34 [DEBUG] manager: Rebalanced 1 servers, next active server is docker.local (Addr: tcp/127.0.0.1:8300) (DC: local)
consul_1                        |     2018/11/05 17:21:35 [DEBUG] consul: Skipping self join check for "docker" since the cluster is too small
consul_1                        |     2018/11/05 17:21:37 [DEBUG] http: Request GET /v1/agent/services (151.7µs) from=127.0.0.1:50614
consul_1                        |     2018/11/05 17:22:04 [DEBUG] agent: Skipping remote check "serfHealth" since it is managed automatically
consul_1                        |     2018/11/05 17:22:04 [DEBUG] agent: Service "core-demo-api1" in sync
consul_1                        |     2018/11/05 17:22:04 [DEBUG] agent: Service "core-demo-api1-sidecar-proxy" in sync
consul_1                        |     2018/11/05 17:22:04 [DEBUG] agent: Service "core-demo-api2" in sync
consul_1                        |     2018/11/05 17:22:04 [DEBUG] agent: Service "core-demo-api2-sidecar-proxy" in sync
consul_1                        |     2018/11/05 17:22:04 [DEBUG] agent: Check "service:core-demo-api1-sidecar-proxy" in sync
consul_1                        |     2018/11/05 17:22:04 [DEBUG] agent: Check "service:core-demo-api2-sidecar-proxy" in sync
consul_1                        |     2018/11/05 17:22:04 [DEBUG] agent: Node info in sync
consul_1                        |     2018/11/05 17:22:07 [DEBUG] http: Request GET /v1/agent/services (242.9µs) from=127.0.0.1:50694
consul_1                        |     2018/11/05 17:22:35 [DEBUG] consul: Skipping self join check for "docker" since the cluster is too small
consul_1                        |     2018/11/05 17:22:37 [DEBUG] http: Request GET /v1/agent/services (232.9µs) from=127.0.0.1:50774
consul_1                        |     2018/11/05 17:23:07 [DEBUG] http: Request GET /v1/agent/services (202.3µs) from=127.0.0.1:50854
consul_1                        |     2018/11/05 17:23:15 [DEBUG] agent: Skipping remote check "serfHealth" since it is managed automatically
consul_1                        |     2018/11/05 17:23:15 [DEBUG] agent: Service "core-demo-api2-sidecar-proxy" in sync
consul_1                        |     2018/11/05 17:23:15 [DEBUG] agent: Service "core-demo-api1" in sync
consul_1                        |     2018/11/05 17:23:15 [DEBUG] agent: Service "core-demo-api1-sidecar-proxy" in sync
consul_1                        |     2018/11/05 17:23:15 [DEBUG] agent: Service "core-demo-api2" in sync
consul_1                        |     2018/11/05 17:23:15 [DEBUG] agent: Check "service:core-demo-api1-sidecar-proxy" in sync
consul_1                        |     2018/11/05 17:23:15 [DEBUG] agent: Check "service:core-demo-api2-sidecar-proxy" in sync
consul_1                        |     2018/11/05 17:23:15 [DEBUG] agent: Node info in sync
consul_1                        |     2018/11/05 17:23:35 [DEBUG] consul: Skipping self join check for "docker" since the cluster is too small
consul_1                        |     2018/11/05 17:23:37 [DEBUG] http: Request GET /v1/agent/services (179.8µs) from=127.0.0.1:50934
consul_1                        |     2018/11/05 17:24:07 [DEBUG] http: Request GET /v1/agent/services (411.6µs) from=127.0.0.1:51014
consul_1                        |     2018/11/05 17:24:08 [DEBUG] manager: Rebalanced 1 servers, next active server is docker.local (Addr: tcp/127.0.0.1:8300) (DC: local)
consul_1                        |     2018/11/05 17:24:35 [DEBUG] consul: Skipping self join check for "docker" since the cluster is too small
consul_1                        |     2018/11/05 17:24:37 [DEBUG] http: Request GET /v1/agent/services (157.7µs) from=127.0.0.1:51094
consul_1                        |     2018/11/05 17:24:43 [DEBUG] agent: Skipping remote check "serfHealth" since it is managed automatically
consul_1                        |     2018/11/05 17:24:43 [DEBUG] agent: Service "core-demo-api1" in sync
consul_1                        |     2018/11/05 17:24:43 [DEBUG] agent: Service "core-demo-api1-sidecar-proxy" in sync
consul_1                        |     2018/11/05 17:24:43 [DEBUG] agent: Service "core-demo-api2" in sync
consul_1                        |     2018/11/05 17:24:43 [DEBUG] agent: Service "core-demo-api2-sidecar-proxy" in sync
consul_1                        |     2018/11/05 17:24:43 [DEBUG] agent: Check "service:core-demo-api2-sidecar-proxy" in sync
consul_1                        |     2018/11/05 17:24:43 [DEBUG] agent: Check "service:core-demo-api1-sidecar-proxy" in sync
consul_1                        |     2018/11/05 17:24:43 [DEBUG] agent: Node info in sync

@banks
Copy link
Member

banks commented Nov 8, 2018

Yep, that does look like a config issue. Sorry, I forgot the detail, but can you try one more time with -l trace - that will show the full discovery requests/responses from Envoy, which will help figure out where the misconfiguration is.

@mmisztal1980
Copy link
Author

Hi @banks , I've applied the setting you've suggested, here's the output.

https://gist.github.com/mmisztal1980/a0e36c0f1d1e277470cf318ceea64d04

@banks
Copy link
Member

banks commented Nov 9, 2018

@mmisztal1980 thanks.

I think I see the issue and it's potentially a bug combined with a config issue.

I think from that log what is happening is this:

  • core-demo-api1-sidecar-proxy (envoy1 from now on) is requesting Clusters and being delivered the upstream and the local_app clusters as expected (so all gRPC and agent config is good - it's talking to the agent fine)
  • It then tries to discover the current set of endpoints for the upstream service:core-demo-api2 and the agent apparently blocks on that request and never returns.
  • Envoy ADS is a bit smart - it won't even ask us for listener config until it has clusters and endpoints for those clusters to route traffic to, so in this case the listeners never even get requested and so never start.

So the question is why Consul is not delivering any results for service:core-demo-api2. I suspect it's because that service isn't yet registered and passing health checks, but here is the (possible) bug: if Consul doesn't find healthy instances, it should return an empty list so the rest of the proxy can continue to be set up. That wouldn't make this work for you, since the proxy still wouldn't know where to connect, but it would at least prevent one unhealthy upstream from delaying proxy bootstrap when all the other upstreams might be fine.

This should solve itself as soon as your upstream service comes up and is healthy though - can you confirm that it does (i.e. look in the Consul UI or query via DNS/HTTP to see that the second instance is available)? Since the service has no health checks it should be healthy immediately, and the proxy for it should be too (via its alias check).
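For example, something like this from any container on the compose network (adjust the address for wherever you run it from):

curl http://consul:8500/v1/health/connect/core-demo-api2?passing
dig @consul -p 8600 core-demo-api2.connect.consul SRV

should list the healthy Connect-capable endpoints for that upstream.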

So the mystery here is why that upstream service discovery isn't working correctly.

I'll try to reproduce with your docker compose file and config later to debug this more.

Thanks for your help digging on this!

@mmisztal1980
Copy link
Author

@banks The second service, core-demo-api2, is up and running and healthy.

@banks
Copy link
Member

banks commented Nov 9, 2018 via email

@mmisztal1980
Copy link
Author

mmisztal1980 commented Nov 9, 2018

Sure here it is,

curl -v consul:8500/v1/health/connect/core-demo-api1
# curl -v consul:8500/v1/health/connect/core-demo-api1
*   Trying 172.27.0.4...
* TCP_NODELAY set
* Connected to consul (172.27.0.4) port 8500 (#0)
> GET /v1/health/connect/core-demo-api1 HTTP/1.1
> Host: consul:8500
> User-Agent: curl/7.52.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Type: application/json
< Vary: Accept-Encoding
< X-Consul-Effective-Consistency: leader
< X-Consul-Index: 13
< X-Consul-Knownleader: true
< X-Consul-Lastcontact: 0
< Date: Fri, 09 Nov 2018 23:19:58 GMT
< Transfer-Encoding: chunked
< 
[
    {
        "Node": {
            "ID": "2c130a6b-e96c-213e-6786-3b6f84134db8",
            "Node": "docker",
            "Address": "127.0.0.1",
            "Datacenter": "local",
            "TaggedAddresses": {
                "lan": "127.0.0.1",
                "wan": "127.0.0.1"
            },
            "Meta": {
                "consul-network-segment": ""
            },
            "CreateIndex": 5,
            "ModifyIndex": 5
        },
        "Service": {
            "Kind": "connect-proxy",
            "ID": "core-demo-api1-sidecar-proxy",
            "Service": "core-demo-api1-sidecar-proxy",
            "Tags": [],
            "Address": "",
            "Meta": null,
            "Port": 21000,
            "Weights": {
                "Passing": 1,
                "Warning": 1
            },
            "EnableTagOverride": false,
            "ProxyDestination": "",
            "Proxy": {
                "DestinationServiceName": "core-demo-api1",
                "DestinationServiceID": "core-demo-api1",
                "LocalServiceAddress": "127.0.0.1",
                "LocalServicePort": 80,
                "Upstreams": [
                    {
                        "DestinationType": "service",
                        "DestinationName": "code-demo-api2",
                        "Datacenter": "",
                        "LocalBindPort": 19100,
                        "Config": null
                    }
                ]
            },
            "Connect": {},
            "CreateIndex": 6,
            "ModifyIndex": 6
        },
        "Checks": [
            {
                "Node": "docker",
                "CheckID": "serfHealth",
                "Name": "Serf Health Status",
                "Status": "passing",
                "Notes": "",
                "Output": "Agent alive and reachable",
                "ServiceID": "",
                "ServiceName": "",
                "ServiceTags": [],
                "Definition": {},
                "CreateIndex": 13,
                "ModifyIndex": 13
            },
            {
                "Node": "docker",
                "CheckID": "service:core-demo-api1-sidecar-proxy",
                "Name": "Connect Sidecar Aliasing core-demo-api1",
                "Status": "passing",
                "Notes": "",
                "Output": "No checks found.",
                "ServiceID": "core-demo-api1-sidecar-proxy",
                "ServiceName": "core-demo-api1-sidecar-proxy",
                "ServiceTags": [],
                "Definition": {},
                "CreateIndex": 6,
                "ModifyIndex": 6
            }
        ]
    }
]
* Curl_http_done: called premature == 0
* Connection #0 to host consul left intact
# 

curl -v consul:8500/v1/health/connect/core-demo-api2
# curl -v consul:8500/v1/health/connect/core-demo-api2
*   Trying 172.27.0.4...
* TCP_NODELAY set
* Connected to consul (172.27.0.4) port 8500 (#0)
> GET /v1/health/connect/core-demo-api2 HTTP/1.1
> Host: consul:8500
> User-Agent: curl/7.52.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Type: application/json
< Vary: Accept-Encoding
< X-Consul-Effective-Consistency: leader
< X-Consul-Index: 13
< X-Consul-Knownleader: true
< X-Consul-Lastcontact: 0
< Date: Fri, 09 Nov 2018 23:18:04 GMT
< Transfer-Encoding: chunked
< 
[
    {
        "Node": {
            "ID": "2c130a6b-e96c-213e-6786-3b6f84134db8",
            "Node": "docker",
            "Address": "127.0.0.1",
            "Datacenter": "local",
            "TaggedAddresses": {
                "lan": "127.0.0.1",
                "wan": "127.0.0.1"
            },
            "Meta": {
                "consul-network-segment": ""
            },
            "CreateIndex": 5,
            "ModifyIndex": 5
        },
        "Service": {
            "Kind": "connect-proxy",
            "ID": "core-demo-api2-sidecar-proxy",
            "Service": "core-demo-api2-sidecar-proxy",
            "Tags": [],
            "Address": "",
            "Meta": null,
            "Port": 21001,
            "Weights": {
                "Passing": 1,
                "Warning": 1
            },
            "EnableTagOverride": false,
            "ProxyDestination": "",
            "Proxy": {
                "DestinationServiceName": "core-demo-api2",
                "DestinationServiceID": "core-demo-api2",
                "LocalServiceAddress": "127.0.0.1",
                "LocalServicePort": 80,
                "Upstreams": [
                    {
                        "DestinationType": "service",
                        "DestinationName": "code-demo-api1",
                        "Datacenter": "",
                        "LocalBindPort": 80,
                        "Config": null
                    }
                ]
            },
            "Connect": {},
            "CreateIndex": 9,
            "ModifyIndex": 9
        },
        "Checks": [
            {
                "Node": "docker",
                "CheckID": "serfHealth",
                "Name": "Serf Health Status",
                "Status": "passing",
                "Notes": "",
                "Output": "Agent alive and reachable",
                "ServiceID": "",
                "ServiceName": "",
                "ServiceTags": [],
                "Definition": {},
                "CreateIndex": 13,
                "ModifyIndex": 13
            },
            {
                "Node": "docker",
                "CheckID": "service:core-demo-api2-sidecar-proxy",
                "Name": "Connect Sidecar Aliasing core-demo-api2",
                "Status": "passing",
                "Notes": "",
                "Output": "No checks found.",
                "ServiceID": "core-demo-api2-sidecar-proxy",
                "ServiceName": "core-demo-api2-sidecar-proxy",
                "ServiceTags": [],
                "Definition": {},
                "CreateIndex": 9,
                "ModifyIndex": 9
            }
        ]
    }
]
* Curl_http_done: called premature == 0
* Connection #0 to host consul left intact
# 

@banks
Copy link
Member

banks commented Nov 12, 2018

Hmm, I'm not sure why it causes the Envoy config failure it does, but one issue I see there is that your agent node is registering with 127.0.0.1 as its address.

That means the service discovery results for the service in one container are going to return a local IP, which seems wrong - you need the IP of the other container to connect out to it.

Typically you'd have a local Consul agent on each node, configured to bind to the public IP of that node, and then this would work as expected - services are advertised at their agent's bind (or advertise) address by default.

In this setup, where you don't have "local" agents on each "node" (i.e. each API container's network namespace), you would need to register the services with an explicit Address set to a docker IP that other containers can connect to.
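For example, something like this in services.json (a sketch - 172.27.0.5 is just a placeholder for the api1 container's actual bridge-network IP; the rest of the registration, connect block included, stays as before):

{
    "name": "core-demo-api1",
    "address": "172.27.0.5",
    "port": 80
}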

Alternatively, you can get closer to a production simulation by starting an actual Consul client agent container for each "node" that also shares the api container's namespace. If you do that, the -http-addr etc. arguments shouldn't be needed, as the proxy can connect to its "local" agent just like a non-container setup on multiple hosts would.
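Roughly (an untested sketch - the sidecar registration would then move from the server's services.json into this agent's config, and gRPC has to be enabled on the client agent so Envoy can fetch its config from it):

  core-demo-api1-agent:
    image: consul:1.3.0
    command: "agent -retry-join consul -datacenter local"
    network_mode: "service:core-demo-api1"
  core-demo-api1-sidecar-proxy:
    image: consul-envoy
    command: "-sidecar-for core-demo-api1 -admin-bind 0.0.0.0:19000"
    network_mode: "service:core-demo-api1"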

When I get a chance I still want to reproduce locally so I can figure out why Envoy hangs part way through config in this case. But let me know if that IP config helps.

@pearkes pearkes added the needs-investigation The issue described is detailed and complex. label Nov 28, 2018
@mmisztal1980
Copy link
Author

Hi @banks , it's been a while. I was wondering if you folks have made any progress investigating this, as I'll be giving it another spin soon (tm)

@banks
Copy link
Member

banks commented Feb 25, 2019

I'm pretty sure I know why the hanging bug happens but it may only be part of the story here.

The bug is due to this line:

consul/agent/xds/server.go

Lines 349 to 352 in c2a30c5

if resources == nil || len(resources) == 0 {
	// Nothing to send yet
	return nil
}

It's basically assuming that if we haven't received a response yet (since this is all async internally), so that part of the config is empty, then we shouldn't bother sending it to the proxy. The problem is that in a case where there are legitimately no instances available (not registered yet, or failing health checks) we end up not sending the endpoints at all, and Envoy hangs waiting for them.

I think that's an easy fix, but based on your curl output above I'm not sure it's the only issue going on with your setup.
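To illustrate the shape of the fix (a toy sketch in Go, not the actual Consul patch): the point is to send an explicit empty list rather than returning early, so Envoy learns "zero healthy instances" instead of waiting forever.

package main

import "fmt"

// sendResources mimics the handler above in a very simplified, hypothetical form.
// A nil slice means "no answer yet"; sending an explicit empty slice means
// "answered: zero instances", which lets the proxy carry on requesting listeners.
func sendResources(resources []string, send func([]string)) {
	if resources == nil {
		resources = []string{} // send the empty set instead of skipping the send
	}
	send(resources)
}

func main() {
	sendResources(nil, func(r []string) { fmt.Printf("sent %d resources\n", len(r)) })
}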

banks added a commit that referenced this issue Feb 25, 2019
As noted in #4868 we can sometimes cause Envoy to hang if one or more upstream has no instances available since Envoy won't continue processing xDS for listeners until it has all the endpoints resolved they'd be proxying to.

I _assume_ that this is correct and that by getting an explicit empty result Envoy will continue to resolve the config and just fail with a 503 if that upstream is connected.

This needs testing though and also ensuring that it doesn't cause any other side-effects.
@stale
Copy link

stale bot commented Oct 21, 2019

Hey there,
We wanted to check in on this request since it has been inactive for at least 60 days.
If you think this is still an important issue in the latest version of Consul
or its documentation please reply with a comment here which will cause it to stay open for investigation.
If there is still no activity on this issue for 30 more days, we will go ahead and close it.

Feel free to check out the community forum as well!
Thank you!

@stale stale bot added the waiting-reply Waiting on response from Original Poster or another individual in the thread label Oct 21, 2019
@stale
Copy link

stale bot commented Nov 20, 2019

Hey there, This issue has been automatically closed because there hasn't been any activity for at least 90 days. If you are still experiencing problems, or still have questions, feel free to open a new one 👍

@stale stale bot closed this as completed Nov 20, 2019
@ghost
Copy link

ghost commented Jan 25, 2020

Hey there,

This issue has been automatically locked because it is closed and there hasn't been any activity for at least 30 days.

If you are still experiencing problems, or still have questions, feel free to open a new one 👍.

@ghost ghost locked and limited conversation to collaborators Jan 25, 2020
@ghost ghost removed the waiting-reply Waiting on response from Original Poster or another individual in the thread label Jan 25, 2020