Backport of Use strict DNS for mesh gateways with hostnames into release/1.17.x #19396

hc-github-team-consul-core · 2023-10-26T20:08:07Z

Backport

This PR is auto-generated from #19268 to be assessed for backporting due to the inclusion of the label backport/1.17.

The below text is copied from the body of the original PR.

Description

This fixes #17557. In an attempt to support mesh gateways fronted by AWS load balancers, a code path for peered mesh gateways was introduced in #14917 that leverages Envoy clusters backed by the LOGICAL_DNS cluster discovery type. Problematically, when replicas of mesh gateways exist in a peered connection, the dialing peer will hit this code path and attempt to add multiple endpoints for the targeted mesh gateways. Envoy, however, doesn't support multiple endpoints on a LOGICAL_DNS cluster, so it rejects the cluster configuration it receives from Consul over xDS and logs errors.

In Kubernetes, when a mesh gateway then restarts, it never finishes initializing and is never marked as healthy, so its pod restarts continually and the gateway becomes unusable.

Because this requires using hostnames rather than IP addresses for the WAN addresses registered for mesh gateways, it most likely impacts Consul users on AWS, where hostnames are used for LoadBalancer services and thus registered for LoadBalancer type mesh gateways. This will also affect users who manually (or with annotations) register mesh gateways with multiple FQDNs.

Note that this appears to only affect the dialing cluster in a peered connection. The accepting clusters use a different code path that only ever uses a single mesh gateway target and doesn't attempt to load-balance between multiple mesh gateways.
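The constraint described above can be modeled in a few lines of Go. This is only a sketch for illustration: `DiscoveryType`, `validateCluster`, and the endpoint slice are hypothetical names, not Consul's or Envoy's actual code. The only thing taken from the source is the error text Envoy reports for a LOGICAL_DNS cluster with more than one endpoint.

```go
package main

import (
	"errors"
	"fmt"
)

// DiscoveryType is a hypothetical stand-in for Envoy's cluster discovery type.
type DiscoveryType string

const (
	LogicalDNS DiscoveryType = "LOGICAL_DNS"
	StrictDNS  DiscoveryType = "STRICT_DNS"
)

// validateCluster models the rule behind the error Envoy reports: a
// LOGICAL_DNS cluster must carry exactly one endpoint. It is an
// illustration of the rule, not Envoy's implementation.
func validateCluster(t DiscoveryType, endpoints []string) error {
	if t == LogicalDNS && len(endpoints) != 1 {
		return errors.New("LOGICAL_DNS clusters must have a single locality_lb_endpoint and a single lb_endpoint")
	}
	return nil
}

func main() {
	// Two mesh gateway replicas registered under the same hostname.
	gateways := []string{"gateway.nanosleep.cloud:443", "gateway.nanosleep.cloud:443"}

	fmt.Println(validateCluster(LogicalDNS, gateways)) // rejected in this model
	fmt.Println(validateCluster(StrictDNS, gateways))  // accepted in this model
}
```

In this model, two replicas behind one hostname are enough to trigger the rejection, which is why the dialing side is affected as soon as the accepting cluster runs more than one mesh gateway.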

Testing & Reproduction steps

I was able to recreate this pretty easily outside of AWS by pinning the FQDN of the mesh gateways in the accepting cluster via something like:

meshGateway:
  enabled: true
  replicas: 2
  wanAddress:
    source: "Static"
    static: "gateway.nanosleep.cloud"

which gives this for my dialing cluster:

curl https://${DC2_CONSUL}/v1/peerings ... | jq
...
"PeerServerAddresses": [
  "gateway.nanosleep.cloud:443",
  "gateway.nanosleep.cloud:443"
],
...

The dialing cluster's Consul logs then show:

2023-10-17T22:37:29.016Z [ERROR] agent.envoy.xds.mesh_gateway: got error response from envoy proxy: service_id=default/default/consul-consul-mesh-gateway-6ff745887b-5c5s2 typeUrl=type.googleapis.com/envoy.config.cluster.v3.Cluster xdsVersion=v3 nonce=00000006 error="rpc error: code = Internal desc = Error adding/updating cluster(s) server.dc1.peering.303380e1-f1a6-fb04-4ca6-c562e4951539.consul: LOGICAL_DNS clusters must have a single locality_lb_endpoint and a single lb_endpoint"

and the dialing cluster's mesh gateway logs show:

2023-10-17T22:36:20.838Z+00:00 [warning] envoy.config(14) gRPC config for type.googleapis.com/envoy.config.cluster.v3.Cluster rejected: Error adding/updating cluster(s) server.dc1.peering.303380e1-f1a6-fb04-4ca6-c562e4951539.consul: LOGICAL_DNS clusters must have a single locality_lb_endpoint and a single lb_endpoint

Swapping to STRICT_DNS allows the mesh gateway to finish configuration and boot properly.
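As a rough way to check whether a peering matches the reproduction above, the sketch below looks for more than one hostname-based entry in `PeerServerAddresses`. The `peering` struct is a hypothetical, partial view that mirrors only the field shown in the output above, and `hostnameAddresses` and `affected` are illustrative helpers, not part of Consul. Treat it as a heuristic rather than the exact condition Consul evaluates.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
)

// peering is a hypothetical, partial view of a peering object. It mirrors
// only the PeerServerAddresses field shown in the output above.
type peering struct {
	PeerServerAddresses []string
}

// hostnameAddresses returns the addresses whose host part is not an IP literal.
func hostnameAddresses(addrs []string) []string {
	var out []string
	for _, a := range addrs {
		host, _, err := net.SplitHostPort(a)
		if err != nil {
			host = a // no port present; treat the whole string as the host
		}
		if net.ParseIP(host) == nil {
			out = append(out, a)
		}
	}
	return out
}

// affected reports whether more than one hostname-based address is present,
// which is the situation reproduced above.
func affected(p peering) bool {
	return len(hostnameAddresses(p.PeerServerAddresses)) > 1
}

func main() {
	raw := `{"PeerServerAddresses": ["gateway.nanosleep.cloud:443", "gateway.nanosleep.cloud:443"]}`

	var p peering
	if err := json.Unmarshal([]byte(raw), &p); err != nil {
		panic(err)
	}
	fmt.Println(affected(p))
}
```

Peerings that list only IP addresses, or a single hostname, are not flagged by this check.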

Links

Strict DNS in Envoy.

PR Checklist

updated test coverage
external facing docs updated
appropriate backport labels added
not a security concern

Overview of commits

e9eabcb - 013de0b

github-team-consul-core-pr-approver

Auto approved Consul Bot automated PR

Andrew Stucki added 2 commits October 18, 2023 14:25

backport of commit e9eabcb

2b12be1

backport of commit 013de0b

369da8e

hc-github-team-consul-core force-pushed the backport/net-4786/mesh-strict-dns/secondly-pleasant-thrush branch from 2b12be1 to d3e927a October 26, 2023 20:08

hc-github-team-consul-core assigned andrewstucki Oct 26, 2023

hc-github-team-consul-core force-pushed the backport/net-4786/mesh-strict-dns/secondly-pleasant-thrush branch from 1a0bbf1 to 369da8e October 26, 2023 20:08

hc-github-team-consul-core enabled auto-merge (squash) October 26, 2023 20:08

hc-github-team-consul-core requested a review from andrewstucki October 26, 2023 20:08

github-actions bot added the theme/envoy/xds Related to Envoy support label Oct 26, 2023

github-team-consul-core-pr-approver approved these changes Oct 26, 2023

hc-github-team-consul-core merged commit b7055a0 into release/1.17.x Oct 26, 2023
86 checks passed

hc-github-team-consul-core deleted the backport/net-4786/mesh-strict-dns/secondly-pleasant-thrush branch October 26, 2023 20:28
