`publishNotReadyAddresses` only for headless not main service #817

der-eismann · 2022-12-07T15:18:44Z

Describe the bug
While migrating our Vault deployment to Helm charts, I got a bit confused by your use of `publishNotReadyAddresses` in the services. Basically you have two choices:

- enabled (default): Your vault.default.svc.cluster.local (or vault-active) service can have unready pods attached, which means requests to the service can fail. This is bad because some integrations, like spring-cloud-vault, still have no retry mechanism and will fail.
- disabled: Cluster formation will probably fail because unready pods are not reachable via the vault-internal headless service.

IMHO only the headless service should have publishNotReadyAddresses enabled, all other services should have it disabled. So I'd either want it to be configurable per service or fix it for some of the services.
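
As a rough illustration of the split proposed above, the following is a minimal sketch of two Service manifests, not the chart's actual templates. The service name vault-internal comes from this issue and the ports 8200 and 8201 are the ones mentioned later in the thread; the name `vault` for the main service, the selector label, and the port names are assumptions made for the example.

```yaml
# Headless service used for peer discovery during cluster formation.
# Unready pods stay resolvable so that peers can find each other.
apiVersion: v1
kind: Service
metadata:
  name: vault-internal
spec:
  clusterIP: None                      # headless: DNS returns pod IPs directly
  publishNotReadyAddresses: true
  selector:
    app.kubernetes.io/name: vault      # assumed label, adjust to your deployment
  ports:
    - name: api                        # assumed port name
      port: 8200
      targetPort: 8200
    - name: cluster                    # assumed port name
      port: 8201
      targetPort: 8201
---
# Client-facing service: only pods that pass their readiness probe
# receive traffic.
apiVersion: v1
kind: Service
metadata:
  name: vault                          # assumed name for the main service
spec:
  publishNotReadyAddresses: false      # false is also the Kubernetes default
  selector:
    app.kubernetes.io/name: vault      # assumed label, adjust to your deployment
  ports:
    - name: api                        # assumed port name
      port: 8200
      targetPort: 8200
```

With this layout, clients that lack retries only ever see ready pods, while cluster formation over the headless service is unaffected.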

To Reproduce
Steps to reproduce the behavior:

1. Install the chart in HA mode.
2. Send requests to the default vault k8s service.
3. When one or more pods are unready, requests can be routed to an unready pod and return a server error or time out.

Expected behavior
Requests to the default vault k8s service should only go to ready pods without it affecting the cluster formation via the headless service.

Environment

- Kubernetes version: 1.23.13
- Distribution or cloud vendor (OpenShift, EKS, GKE, AKS, etc.): self-hosted
- Other configuration options or runtime services (istio, etc.): istio
- vault-helm version: 0.23.0

Chart values:

server:
  enabled: true
  service:
    publishNotReadyAddresses: false


Matthias247 · 2023-08-03T21:06:58Z

This seems to be addressed by #902 and the 0.25 release?

tomhjp · 2023-08-04T12:46:52Z

Thanks for pointing this out - that's correct! Thanks for the detailed issue report too @der-eismann!

rgarcia89 · 2024-02-12T21:30:11Z

I would like to add a finding to this topic, since it seems that `publishNotReadyAddresses` is set to true for the vault-internal service because of this issue.

I am running a Kubernetes cluster with a default deny-all network policy, allowing only whitelisted connections. For the Vault cluster consisting of 3 pods, I have enabled communication between them on ports 8200 and 8201. Everything is functioning correctly so far. However, I log every detected denial, and this is where an issue arises, because the vault-internal service also publishes non-ready pods. When a pod enters the terminating state, it is not promptly removed from the headless service. Consequently, the other Vault pods keep trying to establish connections with it. This persists until the pod is finally terminated, and it results in logged denials because the target pod is not removed from the service quickly enough. Normally, that removal would happen as soon as a pod enters the terminating state.
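
For readers unfamiliar with this kind of setup, the allow rule described above might look roughly like the following NetworkPolicy. This is only a sketch under assumptions: the policy name and the pod label are hypothetical, and only the two ports come from the comment. It covers ingress only; under a deny-all policy that also blocks egress, a matching egress rule would be needed as well.

```yaml
# Allow Vault pods to reach each other on the API and cluster ports.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: vault-peer-traffic             # hypothetical name
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: vault    # assumed label, adjust to your deployment
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: vault
      ports:
        - protocol: TCP
          port: 8200
        - protocol: TCP
          port: 8201
```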

der-eismann added the bug label Dec 7, 2022

tomhjp closed this as completed Aug 4, 2023
