Service checks fail when exposing two paths for the same port in a "consul connect"-enabled service (with transparent_proxy) #25262

Yokutto commented Mar 2, 2025

Nomad version (client)

Nomad v1.9.6
BuildDate 2025-02-11T18:55:10Z
Revision 7f8b449+CHANGES

Nomad version (server)

Nomad v1.9.5
BuildDate 2025-01-14T18:35:12Z
Revision 0b7bb8b+CHANGES

Issue

In Nomad, to make checks work with Consul Connect, you need to use the expose stanza (or set expose = true in checks). From my tests, this doesn't actually fully expose the endpoint; traffic still goes through the Envoy sidecar, but it bypasses the mutual TLS and namespace isolation requirements. This lets other services (like the Consul health checker) reach the endpoint without being part of the Consul service mesh.
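
For clarity, here's a minimal sketch of that check-level shorthand (the service name, port, and path here are illustrative, not taken from my job file below); with expose = true, Nomad generates the equivalent proxy.expose.path entry for the sidecar automatically:

service {
  name = "example-api"   # illustrative name
  port = "5678"

  connect {
    sidecar_service {}
  }

  check {
    type     = "http"
    path     = "/health"
    interval = "10s"
    timeout  = "3s"
    expose   = true      # shorthand: Nomad injects the matching proxy.expose.path entry
  }
}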

I've been using this mechanism to get Consul health checks working with my services, and according to the documentation, you can expose two different paths (even with different protocols) on the same port in a Consul Connect–enabled service (with transparent_proxy).

That's when the issue occurs: on the Envoy sidecar task, the envoy_bootstrap hook fails and throws an error without any details:

Task hook failed: envoy_bootstrap: error creating bootstrap configuration for Connect proxy sidecar: exit status 1; see: <https://developer.hashicorp.com/nomad/s/envoy-bootstrap-error>

Note that this only happens when you're exposing two different paths using proxy.expose.path as shown in my job file below.

It's weird because the issue seems to occur more often when the service is updating or recycling replicas. I was able to reproduce it when creating a job, but it feels inconsistent, almost like it might be a race condition.

Reproduction steps

  1. Run the job with the definition listed at the bottom of this issue:
    $ nomad job run healthcheck.nomad.hcl

Expected Result

I expect Envoy to bootstrap correctly and expose both paths on the same port, as described in the documentation.

Actual Result

Envoy fails to bootstrap due to an error in its configuration. I don't have access to the generated configuration right now, so I can't provide further details.

Job file

job "healthcheck" {
  datacenters = ["dc1"]

  group "hashicorp" {
    network {
      mode = "bridge"

      port "healthcheck-http" {
        to = 5678
      }
    }

    service {
      connect {
        sidecar_service {
          proxy {
            transparent_proxy {}

            expose {
              path {
                path            = "/health"
                protocol        = "http"
                local_path_port = 5678
                listener_port   = "healthcheck-http"
              }

              path {
                path            = "/ready"
                protocol        = "http"
                local_path_port = 5678
                listener_port   = "healthcheck-http"
              }
            }
          }
        }
      }

      # Liveness
      check {
        type     = "http"
        path     = "/health"
        port     = "healthcheck-http"
        interval = "1s"
        timeout  = "3s"

        check_restart {
          limit = 3
          grace = "5s"
        }
      }

      # Readiness
      check {
        type     = "http"
        path     = "/ready"
        port     = "healthcheck-http"
        interval = "5s"
        timeout  = "3s"
      }
    }

    task "echo" {
      driver = "docker"

      config {
        image = "docker.io/hashicorp/http-echo:latest"
        args = [
          "-text=Hi!"
        ]
      }
    }
  }
}

Extras

I've reproduced this problem even with a "naked" Consul, without any complex service configurations or additional settings. I'm also not sure whether this issue should be reported to the Nomad or Consul repository, as it appears to be more of a Consul Connect issue than a Nomad one.
