Vault health check fails with "unsupported path" with namespace set #13710

Closed
t-davies opened this issue Jul 12, 2022 · 7 comments

Comments

@t-davies
Contributor

t-davies commented Jul 12, 2022

Nomad version

Output from nomad version
Nomad v1.3.1 (2b054e38e91af964d1235faa98c286ca3f527e56)

Operating system and Environment details

Amazon Linux 2, amd64, EC2

Issue

  • Calls to /v1/sys/health should not be made with the X-Vault-Namespace header set. Doing so results in a 404 response from Vault; see "sys: don't set X-Vault-Namespace header for root-only paths" (vault#14934).
  • When a Vault namespace is configured for Nomad, the health check call is made with that namespace set. This results in a 404 response, at least from newer versions of Vault.
  • As a result, Nomad considers the Vault connection unhealthy.

Reproduction steps

  • Run Vault cluster at version 1.11.0+ent
  • Run Nomad cluster at version 1.3.1
  • Configure Nomad with a Vault namespace
  • Wait for Nomad to attempt a Vault health check
  • Observe that the health check fails with this error:
  | Error making API request.
  |
  | Namespace: xxxx
  | URL: GET https://vault.xxxxxxxxxxxxxx/v1/sys/health?drsecondarycode=299&performancestandbycode=299&sealedcode=299&standbycode=299&uninitcode=299
  | Code: 404. Errors:
  |
  | * unsupported path
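
One way to test the header hypothesis by hand is to send the same health request twice, once without and once with X-Vault-Namespace, and compare the status codes. The following is only a sketch using the Go standard library, not Nomad's code. The helper names healthURL and probeHealth are made up for illustration, and reading the address and namespace from VAULT_ADDR and VAULT_NAMESPACE is an assumption about how you would want to run it. The query parameters are the ones visible in the URL above.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"os"
	"time"
)

// healthURL builds the sys/health URL with the same status-code
// overrides that appear in the logged request.
func healthURL(addr string) string {
	q := url.Values{}
	for _, k := range []string{
		"drsecondarycode", "performancestandbycode",
		"sealedcode", "standbycode", "uninitcode",
	} {
		q.Set(k, "299")
	}
	// Encode sorts keys, which matches the order seen in the log.
	return addr + "/v1/sys/health?" + q.Encode()
}

// probeHealth (hypothetical helper) sends one GET and returns the HTTP
// status code. An empty namespace means the header is omitted entirely.
func probeHealth(c *http.Client, addr, namespace string) (int, error) {
	req, err := http.NewRequest(http.MethodGet, healthURL(addr), nil)
	if err != nil {
		return 0, err
	}
	if namespace != "" {
		req.Header.Set("X-Vault-Namespace", namespace)
	}
	resp, err := c.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	addr := os.Getenv("VAULT_ADDR")
	if addr == "" {
		fmt.Println("set VAULT_ADDR (and optionally VAULT_NAMESPACE) to run the probe")
		return
	}
	c := &http.Client{Timeout: 5 * time.Second}
	// First probe without the header, then with the configured namespace.
	for _, ns := range []string{"", os.Getenv("VAULT_NAMESPACE")} {
		code, err := probeHealth(c, addr, ns)
		fmt.Printf("namespace=%q status=%d err=%v\n", ns, code, err)
	}
}
```

If both probes return the same status, the namespace header is probably not what is breaking the health check.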

Expected Result

  • Health check is made without X-Vault-Namespace header and returns successfully.

Actual Result

  • Health check is made with X-Vault-Namespace header and fails.

Job file (if appropriate)

Nomad Server logs (if appropriate)

2022-07-12T09:01:45.853Z [WARN]  nomad.vault: failed to contact Vault API: retry=30s
  error=
  | Error making API request.
  | 
  | Namespace: xxxx
  | URL: GET https://vault.xxxxxxxxxxxxx/v1/sys/health?drsecondarycode=299&performancestandbycode=299&sealedcode=299&standbycode=299&uninitcode=299
  | Code: 404. Errors:
  | 
  | * unsupported path

Nomad Client logs (if appropriate)

Same error on clients.

@t-davies
Contributor Author

t-davies commented Jul 12, 2022

After reading some more, it seems like this should never happen: the clientSys client, which makes the calls to .Sys().Health(), is deliberately not namespaced to avoid this exact issue. So I think I may have been a bit naïve with the repro steps.

Nomad's mutual TLS certificates expired, as the intermediate cert - also stored in Vault - expired and they therefore couldn't renew. This was fixed and then we started seeing failing health checks.

We worked around the health check issue by spoofing a healthy response from Vault, after which it became apparent that Nomad's Vault token had expired since it was unable to keep up with the periodic renewals.

@mitchfriedman

Just bumped into the same bug - also, hi @t-davies :)

@t-davies
Contributor Author

Haha, hey @mitchfriedman - small world! 😃

@tgross
Member

tgross commented Jul 25, 2022

Hi @t-davies! As you've noted, Nomad uses two Vault clients: one for namespaced operations and one for non-namespaced operations (ref vault.go#L176-L186). Looking at the code, it sure looks to me like we're setting the namespace correctly at vault.go#L445-L457:

// Store the client, create/assign the /sys client
v.client = client
if v.config.Namespace != "" {
	v.logger.Debug("configuring Vault namespace", "namespace", v.config.Namespace)
	v.clientSys, err = vapi.NewClient(apiConf)
	if err != nil {
		v.logger.Error("failed to create Vault sys client and not retrying", "error", err)
		return err
	}
	client.SetNamespace(v.config.Namespace)
} else {
	v.clientSys = client
}
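
For readers who want to see the effect of that branch in isolation, here is a rough standard-library sketch of the same two-client idea. It is not the Vault API client: vaultCaller, newRequest and newCallers are hypothetical names invented for this example, and the address and namespace values are placeholders. The only thing it demonstrates is that the sys caller never carries X-Vault-Namespace, while the regular caller does.

```go
package main

import (
	"fmt"
	"net/http"
)

// vaultCaller is a hypothetical stand-in for a Vault API client. It only
// knows a base address and an optional namespace.
type vaultCaller struct {
	addr      string
	namespace string // empty means "send no X-Vault-Namespace header"
}

// newRequest builds a GET request and adds the namespace header only
// when a namespace is set on the caller.
func (c vaultCaller) newRequest(path string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, c.addr+path, nil)
	if err != nil {
		return nil, err
	}
	if c.namespace != "" {
		req.Header.Set("X-Vault-Namespace", c.namespace)
	}
	return req, nil
}

// newCallers mirrors the branch in the snippet above: the sys caller is
// never given the namespace. When namespace is empty, both callers are
// equivalent, like the else branch.
func newCallers(addr, namespace string) (client, clientSys vaultCaller) {
	clientSys = vaultCaller{addr: addr}
	client = vaultCaller{addr: addr, namespace: namespace}
	return client, clientSys
}

func main() {
	client, clientSys := newCallers("https://vault.example.com", "team-a")

	health, err := clientSys.newRequest("/v1/sys/health")
	if err != nil {
		fmt.Println("building health request:", err)
		return
	}
	lookup, err := client.newRequest("/v1/auth/token/lookup-self")
	if err != nil {
		fmt.Println("building lookup request:", err)
		return
	}

	fmt.Printf("health namespace header: %q\n", health.Header.Get("X-Vault-Namespace"))
	fmt.Printf("lookup namespace header: %q\n", lookup.Header.Get("X-Vault-Namespace"))
}
```

Keeping two separate values, rather than toggling the namespace on a shared one, means no request to a root-only path can pick up the header by accident.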

It's not clear to me the current state of the issue. Once you got the certificates and tokens fixed and Nomad servers restarted with those new configurations, is the problem persisting? Also, can you provide the vault configuration you're using?

@t-davies
Contributor Author

Thanks @tgross!

It's not clear to me the current state of the issue. Once you got the certificates and tokens fixed and Nomad servers restarted with those new configurations, is the problem persisting?

Sorry, yes - once certificates and tokens were sorted and we restarted the servers, everything became healthy again. Haven't seen this issue reoccur since. Not sure if we managed to get things into some sort of odd state given the other issues.

Here's the vault configuration we have:

[...]

vault {
  enabled   = true
  address   = "xxxx"
  namespace = "xxxx"
  
  allow_unauthenticated = false
  create_from_role      = "nomad-cluster"
}

[...]

@tgross
Member

tgross commented Jul 26, 2022

Ok, good to hear. I suspect what's happening here is that Vault is doing the fairly common thing of returning a 404 on the resource when we don't have access to it, which was because of the expired Vault token. I'm going to close this issue out, but certainly if you run into this again and it's not because of an expired token, let us know. Thanks!
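
If anyone lands here with the same symptom, a quick way to rule out an expired token first is to ask Vault about the token directly. The sketch below is an assumption-laden example, not part of Nomad: lookupSelf and describe are hypothetical helpers, and it assumes the token, address and namespace are available in VAULT_TOKEN, VAULT_ADDR and VAULT_NAMESPACE. It calls auth/token/lookup-self, for which Vault typically answers 403 when the token is expired or revoked.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"time"
)

// lookupSelf (hypothetical helper) asks Vault about the given token and
// returns the HTTP status code of the response.
func lookupSelf(c *http.Client, addr, token, namespace string) (int, error) {
	req, err := http.NewRequest(http.MethodGet, addr+"/v1/auth/token/lookup-self", nil)
	if err != nil {
		return 0, err
	}
	req.Header.Set("X-Vault-Token", token)
	if namespace != "" {
		// Tokens created in a namespace are looked up in that namespace.
		req.Header.Set("X-Vault-Namespace", namespace)
	}
	resp, err := c.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// describe turns the status code into a short hint for the operator.
func describe(status int) string {
	switch status {
	case http.StatusOK:
		return "token accepted"
	case http.StatusForbidden:
		return "token rejected"
	default:
		return fmt.Sprintf("unexpected status %d", status)
	}
}

func main() {
	addr, token := os.Getenv("VAULT_ADDR"), os.Getenv("VAULT_TOKEN")
	if addr == "" || token == "" {
		fmt.Println("set VAULT_ADDR and VAULT_TOKEN to run the check")
		return
	}
	c := &http.Client{Timeout: 5 * time.Second}
	status, err := lookupSelf(c, addr, token, os.Getenv("VAULT_NAMESPACE"))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(describe(status))
}
```

A "token rejected" result points at the token rather than the health endpoint, which matches what turned out to be the cause in this thread.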

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 24, 2022