Consul Connect: Fails to register routes for Envoy? #4868
The getting started guide has the proxies and the consul agent run within the same network namespace via docker's host-networking flag. In your example your sidecar proxies and consul agent do not share a network namespace, so the default gRPC server address will need to be explicitly configured, like you did with the HTTP API address. The gRPC server serves the xDS protocol for Envoy. In your example you would need to specify -grpc-addr consul:8502.
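(In this compose setup that means flags along these lines when launching the sidecar; a sketch, with the same addresses that appear in the compose file later in this thread:)

consul connect envoy -sidecar-for core-demo-api1 \
  -http-addr http://consul:8500 \
  -grpc-addr consul:8502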
I picked up on the misconfig because I saw that the
Hi @rboyer, thanks for the hint, that seems to have helped.
stdout
Can you elaborate on that behavior? Why is the gRPC setting required when not specifying a docker network? I seem to have missed that bit in the docs, or it's not well described.
In a non-containerized setup a consul agent (running in lightweight client mode) and applications using that consul agent communicate over a localhost connection using the HTTP API. This avoids the problem of discovering your service discovery system by assuming you can simply communicate over a prearranged port (8500 by default). As of v1.3, consul also listens on a gRPC port (defaults to 8502) which serves the xDS protocol for Envoy. The getting started guide uses host networking for simplicity so that the consul agent and all of the envoy instances are free to communicate with each other directly over localhost. If these are not co-located then the defaults cannot be used and will have to be explicitly configured.
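(For reference, the agent's gRPC/xDS listener is controlled via its ports configuration; a minimal sketch, with 8502 being the conventional port that -dev mode enables automatically:)

{
  "ports": {
    "grpc": 8502
  }
}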
Ok, this makes a ton of sense now. I've obtained another config dump; the proxy for core-demo-api1 seems to be aware of core-demo-api2 now:
config_dump
{
"configs": [
{
"@type": "type.googleapis.com/envoy.admin.v2alpha.BootstrapConfigDump",
"bootstrap": {
"node": {
"id": "core-demo-api1-sidecar-proxy",
"cluster": "core-demo-api1-sidecar-proxy",
"build_version": "5d25f466c3410c0dfa735d7d4358beb76b2da507/1.8.0/Clean/RELEASE"
},
"static_resources": {
"clusters": [
{
"name": "local_agent",
"connect_timeout": "1s",
"hosts": [
{
"socket_address": {
"address": "172.18.0.2",
"port_value": 8502
}
}
],
"http2_protocol_options": {}
}
]
},
"dynamic_resources": {
"lds_config": {
"ads": {}
},
"cds_config": {
"ads": {}
},
"ads_config": {
"api_type": "GRPC",
"grpc_services": [
{
"envoy_grpc": {
"cluster_name": "local_agent"
},
"initial_metadata": [
{
"key": "x-consul-token"
}
]
}
]
}
},
"admin": {
"access_log_path": "/dev/null",
"address": {
"socket_address": {
"address": "0.0.0.0",
"port_value": 19000
}
}
}
},
"last_updated": "2018-10-29T20:43:11.078Z"
},
{
"@type": "type.googleapis.com/envoy.admin.v2alpha.ClustersConfigDump",
"version_info": "00000001",
"static_clusters": [
{
"cluster": {
"name": "local_agent",
"connect_timeout": "1s",
"hosts": [
{
"socket_address": {
"address": "172.18.0.2",
"port_value": 8502
}
}
],
"http2_protocol_options": {}
},
"last_updated": "2018-10-29T20:43:11.080Z"
}
],
"dynamic_active_clusters": [
{
"version_info": "00000001",
"cluster": {
"name": "local_app",
"connect_timeout": "5s",
"hosts": [
{
"socket_address": {
"address": "127.0.0.1",
"port_value": 80
}
}
]
},
"last_updated": "2018-10-29T20:43:11.083Z"
},
{
"version_info": "00000001",
"cluster": {
"name": "service:core-demo-api2",
"type": "EDS",
"eds_cluster_config": {
"eds_config": {
"ads": {}
}
},
"connect_timeout": "5s",
"tls_context": {
"common_tls_context": {
"tls_params": {},
"tls_certificates": [
{
"certificate_chain": {
"inline_string": "-----BEGIN CERTIFICATE-----\nMIICoDCCAkWgAwIBAgIBDzAKBggqhkjOPQQDAjAWMRQwEgYDVQQDEwtDb25zdWwg\nQ0EgNzAeFw0xODEwMjkyMDQxMzlaFw0xODExMDEyMDQxMzlaMBkxFzAVBgNVBAMT\nDmNvcmUtZGVtby1hcGkxMFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEddEmcU3J\n93aVNkjVBacdp4YsfWVQQbDBAV2Vp6+x91L9ZenMfNYlka0PjMUnPlTLsXA+RygP\njU0KEezoMRzK96OCAX8wggF7MA4GA1UdDwEB/wQEAwIDuDAdBgNVHSUEFjAUBggr\nBgEFBQcDAgYIKwYBBQUHAwEwDAYDVR0TAQH/BAIwADBoBgNVHQ4EYQRfYWY6NDE6\nNWU6Yzc6NDA6Yzc6OGU6MGE6MTk6MTk6MDI6ZDI6NDM6NDk6YjE6MDA6ZTc6YjA6\nN2E6MGM6MTQ6ZmQ6Mzg6MDU6ZjA6ZjQ6MTI6ODI6OTg6OTA6Y2Q6MTQwagYDVR0j\nBGMwYYBfYWY6NDE6NWU6Yzc6NDA6Yzc6OGU6MGE6MTk6MTk6MDI6ZDI6NDM6NDk6\nYjE6MDA6ZTc6YjA6N2E6MGM6MTQ6ZmQ6Mzg6MDU6ZjA6ZjQ6MTI6ODI6OTg6OTA6\nY2Q6MTQwZgYDVR0RBF8wXYZbc3BpZmZlOi8vOTg2YTI2YTAtMmY1ZS1jNTc2LWJh\nNTMtNmJhYWUyOTM1OGQ1LmNvbnN1bC9ucy9kZWZhdWx0L2RjL2xvY2FsL3N2Yy9j\nb3JlLWRlbW8tYXBpMTAKBggqhkjOPQQDAgNJADBGAiEA9ogH7GSXunQknJsqPeW3\n8yAVgSeifpcgj1x4CefQ9b4CIQDFUOZNrknZHdP5XtpnUI12mojZFOfOKZZTLU03\n/+7yEg==\n-----END CERTIFICATE-----\n"
},
"private_key": {
"inline_string": "-----BEGIN EC PRIVATE KEY-----\nMHcCAQEEICLItQyoIMOvBuXlFnPy7NHdOyCZvKyvs9XwoMLyVJ+GoAoGCCqGSM49\nAwEHoUQDQgAEddEmcU3J93aVNkjVBacdp4YsfWVQQbDBAV2Vp6+x91L9ZenMfNYl\nka0PjMUnPlTLsXA+RygPjU0KEezoMRzK9w==\n-----END EC PRIVATE KEY-----\n"
}
}
],
"validation_context": {
"trusted_ca": {
"inline_string": "-----BEGIN CERTIFICATE-----\nMIICWTCCAf+gAwIBAgIBBzAKBggqhkjOPQQDAjAWMRQwEgYDVQQDEwtDb25zdWwg\nQ0EgNzAeFw0xODEwMjkyMDQyMzlaFw0yODEwMjYyMDQyMzlaMBYxFDASBgNVBAMT\nC0NvbnN1bCBDQSA3MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAER2QBoP+HINzV\nJQn/rgRFVBGTgSrX9+qd9lK9U6y+dDvpT5PbPyO7kZSU0GCmAT/NjGoqIGtgW9pp\nHuOF8CxKgKOCATwwggE4MA4GA1UdDwEB/wQEAwIBhjAPBgNVHRMBAf8EBTADAQH/\nMGgGA1UdDgRhBF9hZjo0MTo1ZTpjNzo0MDpjNzo4ZTowYToxOToxOTowMjpkMjo0\nMzo0OTpiMTowMDplNzpiMDo3YTowYzoxNDpmZDozODowNTpmMDpmNDoxMjo4Mjo5\nODo5MDpjZDoxNDBqBgNVHSMEYzBhgF9hZjo0MTo1ZTpjNzo0MDpjNzo4ZTowYTox\nOToxOTowMjpkMjo0Mzo0OTpiMTowMDplNzpiMDo3YTowYzoxNDpmZDozODowNTpm\nMDpmNDoxMjo4Mjo5ODo5MDpjZDoxNDA/BgNVHREEODA2hjRzcGlmZmU6Ly85ODZh\nMjZhMC0yZjVlLWM1NzYtYmE1My02YmFhZTI5MzU4ZDUuY29uc3VsMAoGCCqGSM49\nBAMCA0gAMEUCIAE01fG9L9G90KtEfZIzVFrCrSEYnysQp5lZiyugWfBHAiEAh4CS\njy2F8MAE0gHy1WT4tLWV5PzxK7Wx4uboMbsST5Y=\n-----END CERTIFICATE-----\n"
}
}
}
}
},
"last_updated": "2018-10-29T20:43:11.084Z"
}
]
},
{
"@type": "type.googleapis.com/envoy.admin.v2alpha.ListenersConfigDump"
}
]
}
I'm still getting 404s when calling core-demo-api2 from core-demo-api1 through the core-demo-api1-sidecar-proxy though. Should I use:
@mmisztal1980 your client apps should talk to whichever local port the upstream is listening on - unless you set up something I missed, hostnames like core-demo-api1-sidecar-proxy shouldn't be needed. For example, your service definition for core-demo-api1 includes an upstream definition:
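(The definition in question, reconstructed as a sketch from the services.json later in this thread; note the upstream binds local port 80 at this stage:)

{
  "name": "core-demo-api1",
  "port": 80,
  "connect": {
    "sidecar_service": {
      "proxy": {
        "upstreams": [
          {
            "destination_name": "code-demo-api2",
            "local_bind_port": 80
          }
        ]
      }
    }
  }
}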
This is saying "please configure the proxy to listen on local port 80 and forward connections to that upstream". So the proxy will try to do just that but will (probably) collide since your actual service is already listening on port 80. The idea here is that you pick some port for each upstream that you want to expose the service on over loopback. The port number is arbitrary and the only thing that cares about it is your application. For example if you changed that definition to:
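(A sketch of the reconfigured upstream; 19100 matches the value @mmisztal1980 adopts below, but any free local port would do:)

"upstreams": [
  {
    "destination_name": "code-demo-api2",
    "local_bind_port": 19100
  }
]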
Then the proxy would listen on the new loopback port and forward connections through to the upstream, while your app keeps port 80 to itself. Note that this is only layer 4 (TCP/TLS) proxying so there are no HTTP paths or routes in the mix. L7 support will come later. Hope you get that working!
@banks Thanks for clearing that up, now I understand the concept. I've reconfigured the core-demo-api1 registration:
core-demo-api1 registration (proxy port for core-demo-api2 changed to 19100)
{
"name": "core-demo-api1",
"port": 80,
"connect": {
"sidecar_service": {
"port": 19000,
"proxy": {
"upstreams": [
{
"destination_name": "code-demo-api2",
"local_bind_port": 19100
}
]
},
"checks": [
{
"name": "Connect Sidecar Listening",
"tcp": "core-demo-api1-sidecar-proxy:19000",
"interval": "10s"
},
{
"name": "Connect Sidecar Aliasing core-demo-api1",
"alias_service": "core-demo-api1"
}
]
}
}
}
I've tried to execute a call against my service's proxy, but my client keeps throwing exceptions. I've signed onto the proxy container to examine whether or not the port is being listened on:
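(A check along these lines; the exact command used is illustrative:)

netstat -tln | grep 19100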
I do not see port 19100 in the output.
I've just re-read the paragraph on Sidecar Service Registration. The upstreams use a different field name there than the one I see elsewhere in the docs. Is one transformed into the other? I think the docs are somewhat inconsistent here and don't really offer guidance on how to solve my above question :/
@mmisztal1980 can you clarify how you are running this again? Port 19000 happens to be the port we chose for the Envoy admin API (which you can't just disable), so if you are using Envoy here then it's not a great choice for your own stuff and might explain why you are seeing things listening on 19000 but not getting the response you expect. So I suspect you are not seeing any proxy running at all based on your netstat output. How are you starting the proxy? Is this in Kube? Can you include the output of the
There is a
If you are trying to get this to work with docker compose I recommend using shared network namespaces, at least between the app and sidecar containers, rather than just exposing them over the docker bridge. This is easy in docker: start up the app first, then start the proxy container using --network "container:core-demo-api2-container-name" or the equivalent in Compose. Then the app and the proxy can talk over localhost just like in a kube pod (that is all Kube is doing under the hood).
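(A plain-docker sketch of that; container and image names are illustrative:)

# start the app container first so its network namespace exists
docker run -d --name core-demo-api2-container-name core-demo-api2
# then attach the sidecar to the app container's network namespace
docker run -d --network "container:core-demo-api2-container-name" consul-envoy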
If the docker-compose file above is still roughly what you're using, I noticed another small plumbing issue. Because envoy will be listening on 127.0.0.1 (loopback) for exclusive outbound traffic access, the sidecars need to share the same network namespace with your app so it can connect. The way you have them configured above, each of your 5 containers (consul, app1, sidecar1, app2, sidecar2) gets its own networking stack, complete with a local IP address and a personal isolated 127.0.0.1 address. For example, instead of:
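(A sketch of the original shape, with the sidecar in its own network namespace on the docker bridge:)

core-demo-api1-sidecar-proxy:
  image: consul-envoy
  command: "-sidecar-for core-demo-api1 -http-addr http://consul:8500 -grpc-addr consul:8502"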
You should have something like:
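(A sketch; the network_mode line is the key addition and matches the compose file later in this thread:)

core-demo-api1-sidecar-proxy:
  image: consul-envoy
  command: "-sidecar-for core-demo-api1 -http-addr http://consul:8500 -grpc-addr consul:8502"
  network_mode: "service:core-demo-api1"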
And similar plumbing for the core-demo-api2 pair.
I'm also interested in knowing more about the lifespan of this POC. Is it mostly for kicking the tires of Connect in conjunction with your development stack, or are you also planning on using something like docker compose to deploy a variation of the POC directly to a docker swarm cluster?
My intention is to do a PoC on k8s; however, .NET solutions developed under Visual Studio 2017 have a docker-compose integration, so I wanted to get that up and running first. That way I get to debug/test the solution locally before I deploy it to an actual k8s cluster. In the long run, at my current company, we are working on decomposing a monolith and migrating it to a new hosting platform while retaining 100% uptime. Consul Connect is very interesting because it can provide a service mesh, and a consistent experience when services communicate with each other, regardless of the hosting platform. That's pretty much the genesis for the PoC. @rboyer I hope this satisfies your question? On a side note, this is also a brilliant opportunity to learn.
adding
@rboyer @banks I've added the network_mode settings and modified my REST call inside core-demo-api1.
HTTP call from core-demo-api1 to core-demo-api2 via core-demo-api1-sidecar-proxy
var response = await httpClient.GetAsync("http://127.0.0.1:19100/api/values");
Exception message
docker-compose.yml
version: '2.4'
services:
consul:
image: consul:1.3.0
command: "agent -dev -server -bootstrap-expect 1 -ui -client 0.0.0.0 -datacenter local -node docker -config-file /etc/consul/services.json"
volumes:
- ./demo/consul/services.json:/etc/consul/services.json
core-demo-api1:
image: core-demo-api1
build:
context: .
dockerfile: demo/Core.Demo.Api1/Dockerfile
core-demo-api1-sidecar-proxy:
image: consul-envoy
command: "-sidecar-for core-demo-api1 -admin-bind 0.0.0.0:19000 -http-addr http://consul:8500 -grpc-addr consul:8502"
network_mode: "service:core-demo-api1"
build:
context: ./docker/consul-envoy
dockerfile: ./Dockerfile
depends_on:
consul:
condition: service_healthy
core-demo-api2:
image: core-demo-api2
build:
context: .
dockerfile: demo/Core.Demo.Api2/Dockerfile
core-demo-api2-sidecar-proxy:
image: consul-envoy
command: "-sidecar-for core-demo-api2 -admin-bind 0.0.0.0:19000 -http-addr http://consul:8500 -grpc-addr consul:8502"
network_mode: "service:core-demo-api2"
build:
context: ./docker/consul-envoy
dockerfile: ./Dockerfile
depends_on:
consul:
        condition: service_healthy
docker-compose.override.yml
version: '2.4'
services:
consul:
ports:
- "8400:8400"
- "8500:8500"
- "8600:8600"
- "8600:8600/udp"
healthcheck:
test: ["CMD-SHELL", "curl --silent --fail localhost:8500/v1/agent/services || exit 1"]
interval: 30s
timeout: 30s
retries: 3
core-demo-api1:
environment:
- ASPNETCORE_ENVIRONMENT=Compose
- ASPNETCORE_URLS=http://+:80
- HOSTNAME=core-demo-api1
ports:
- "55127:80"
volumes: []
core-demo-api2:
environment:
- ASPNETCORE_ENVIRONMENT=Compose
- ASPNETCORE_URLS=http://+:80
- HOSTNAME=core-demo-api2
ports:
- "55185:80"
    volumes: []
Consul JSON configuration (/etc/consul/services.json)
{
"connect": {
"enabled": true
},
"services": [
{
"name": "core-demo-api1",
"port": 80,
"connect": {
"sidecar_service": {
"proxy": {
"local_service_address": "127.0.0.1",
"local_service_port": 80,
"upstreams": [
{
"destination_name": "code-demo-api2",
"local_bind_port": 19100
}
]
},
"checks": [
{
"name": "Connect Sidecar Aliasing core-demo-api1",
"alias_service": "core-demo-api1"
}
]
}
}
},
{
"name": "core-demo-api2",
"port": 80,
"connect": {
"sidecar_service": {
"proxy": {
"local_service_address": "127.0.0.1",
"local_service_port": 80,
"upstreams": [
{
"destination_name": "code-demo-api1",
"local_bind_port": 80
}
]
},
"checks": [
{
"name": "Connect Sidecar Aliasing core-demo-api2",
"alias_service": "core-demo-api2"
}
]
}
}
}
]
}
Hey @mmisztal1980, as far as I understand that should work now. Can you try adding the debug option to the sidecar proxy?
Hi @banks, I've added the debug option (only on core-demo-api1-sidecar-proxy) as you've suggested; here's the output. What caught my eye:
output
Yep, it does look like a config issue. Sorry, I forgot the detail, but can you try one more time with
Hi @banks, I've applied the setting you've suggested; here's the output: https://gist.github.com/mmisztal1980/a0e36c0f1d1e277470cf318ceea64d04
@mmisztal1980 thanks. I think I see the issue and it's potentially a bug combined with a config issue. I think from that log what is happening is this:
So the question is why is Consul not delivering any results for that upstream. This should solve itself as soon as your upstream service comes up and is healthy though - can you confirm that it does (i.e. look in the Consul UI or query via DNS/HTTP to see that the second instance is available)? Since the service has no health checks it should be healthy immediately, and the proxy for it should be too (by its alias check). So the mystery here is why is that upstream service discovery not working correctly? I'll try to reproduce with your docker compose file and config later to debug this more. Thanks for your help digging on this!
@banks The 2nd core-demo-api2 service is up and running and healthy.
Out of interest, what do you get if you curl consul like this (replace with actual host/ip etc)?
curl -v consul:8500/v1/health/connect/core-demo-api2
and
curl -v consul:8500/v1/catalog/connect/core-demo-api2
Sure, here it is:
curl -v consul:8500/v1/health/connect/core-demo-api1
curl -v consul:8500/v1/health/connect/core-demo-api2
Hmm, not sure why it causes the Envoy config failure it does, but one issue I see there is that your agent nodes are registering with a local address. That means the service discovery results for the service in one container are going to return a local IP, which seems wrong - you need the IP of the other container to connect out to it. Typically you'd have a local Consul agent on each node that is configured to bind to the public IP of that node, and then this would work as expected - services by default are advertised at their agent's bind (or advertise) address. In this setup where you don't have "local" agents on each "node" (i.e. each api container network namespace) you would need to register the services with an explicit address. Alternatively you can get closer to a production simulation by starting an actual consul agent container for each "node" that also shares the api container's namespace. If you do that, the arguments for http-addr etc. shouldn't be needed, as the proxy can connect to its "local" agent just like a non-container setup on multiple hosts would do. When I get a chance I still want to reproduce locally so I can figure out why Envoy hangs part way through config in this case. But let me know if that IP config helps.
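(A sketch of such an explicit registration in services.json, assuming the compose service name resolves from the other containers; an explicit IP works too:)

{
  "services": [
    {
      "name": "core-demo-api2",
      "address": "core-demo-api2",
      "port": 80
    }
  ]
}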
Hi @banks, it's been a while. I was wondering if you folks have made any progress investigating this, as I'll be giving it another spin soon(tm).
I'm pretty sure I know why the hanging bug happens but it may only be part of the story here. The bug is due to this line: Lines 349 to 352 in c2a30c5
Basically, it assumes that if we didn't get a response yet (since this is all async internally) and part of the config is therefore empty, then we shouldn't bother sending it to the proxy. The problem is that in a case where there are legitimately no instances available (not registered yet, or failing health checks) we end up not sending the endpoints at all, which Envoy hangs waiting for. I think that's an easy fix, but based on your curl output above I'm not really sure if it's the only issue going on with your setup.
As noted in #4868 we can sometimes cause Envoy to hang if one or more upstreams have no instances available, since Envoy won't continue processing xDS for listeners until it has resolved all the endpoints they'd be proxying to. I _assume_ that this is correct and that by getting an explicit empty result Envoy will continue to resolve the config and just fail with a 503 if that upstream is connected. This needs testing though, and also ensuring that it doesn't cause any other side-effects.
Hey there, Feel free to check out the community forum as well!
Hey there, This issue has been automatically closed because there hasn't been any activity for at least 90 days. If you are still experiencing problems, or still have questions, feel free to open a new one 👍
Hey there, This issue has been automatically locked because it is closed and there hasn't been any activity for at least 30 days. If you are still experiencing problems, or still have questions, feel free to open a new one 👍.
Overview of the Issue
I've created a PoC environment for 2x .NET Core 2.1 Services communicating via Consul-Connect.
The entire setup relies on a consul server instance, which uses a services.json file to perform the registrations in a 'static' way. If I understand the process correctly, the sidecar proxies should retrieve their configuration from the consul-server after starting up.
Once the consul server container is healthy, 2x sidecar-proxies start. At this point the entire setup is healthy:
When attempting to have core-demo-api1 call core-demo-api2, I'm getting a 404 response.
I've exposed core-demo-api1-sidecar-proxy's port 19000 and obtained a config dump, in which I do not see any routes defined to core-demo-api2; I believe this is the root cause of the communication issue between the 2 services. I believe I've followed the available documentation to the letter, so my situation may be a potential bug.
Reproduction Steps
consul-envoy/Dockerfile
services.json
docker-compose.yml
docker-compose.override.yml
Consul info for both Client and Server
Client info
(!) Executed inside one of the sidecar-proxy containers (!)
envoy.yaml
Server info
Operating system and Environment details
Windows 10 Enterprise x64 running Docker for Windows (MobyLinuxVM)
Log Fragments
TBD