-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Skip checking Vault health #8524
Conversation
Avoid checking if API is accessible, just make the API call and handle when it fails.
@@ -1239,7 +1229,7 @@ func (v *vaultClient) revokeDaemon() { | |||
case <-v.tomb.Dying(): | |||
return | |||
case now := <-ticker.C: | |||
if established, _ := v.ConnectionEstablished(); !established { | |||
if established, err := v.ConnectionEstablished(); !established || err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code here was introduced in 42f7bc8#diff-87a4c76eca3de54716c55ba6b8c06475 - where established
is true only when there is no error. So checking error wasn't strictly necessary. After many modifications, this file consistently checks for established && err == nil
, but we missed this spot.
For some reason, not so clear to me, #3957 chose to represent invalid tokens as established
with true but non-nil error. It seems like we missed this spot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Really appreciate the very fast work on this @notnoop ! |
Also - I tested this change out this morning and it completely fixes my issues! I am now dynamically retrieving secrets from Vault! |
Will this be included in next version of nomad? |
@adawalli yes, we are aiming to include this in 0.12.2 - but a bit slow in merging this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget the changelog entry. Probably make it about the linked issue more than the code change?
Nomad servers no longer rely on Vault's /sys/init API as hardened Vault installations may restrict access to that endpoint [GH-8524]
or something similar?
33604ab
to
a3b4f06
Compare
continue OUTER | ||
} | ||
initStatus = true | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@schmichael Can I get a second review please? I have restored the health check because of the way CreateToken handles connection errors:
Lines 951 to 956 in a3b4f06
// Check if we have established a connection with Vault | |
if established, err := v.ConnectionEstablished(); !established && err == nil { | |
return nil, structs.NewRecoverableError(fmt.Errorf("Connection to Vault has not been established"), true) | |
} else if err != nil { | |
return nil, err | |
} |
In the above code, we aim to detect if the server failed to make any API contact, vault will return a recoverable error, but if /sys/init (or health now) succeeds, then it marks the token failures as permenant.
I don't buy into this distinction much, but to be conservative in a point release, I brought back the health check. However, I made reversed the order of the check - only check health if parseToken failed. It should be functionally the same as before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we instead make errors from parseSelfToken recoverable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't buy into the distinction but I'm afraid of making of any unintended side-effects considering it seems it was explicitly designed this way and this file has bitten as before. I'm more inclined to keep the change conservative for 0.12.X and make more ambitious refactor in a major release.
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
To check if token is valid, we should simply check if the API valid, without needing to check if Vault is healthy/initialized. If it's not, we'll get a proper error.
Back in the day, we used to distinguish between errors related to Vault not initializing (where we'd retry) and errors due to token being invalid (where treated such errors as permanent). However, that changed in #3957 . In this world of retries, it doesn't really make sense to do the pre-check.
Fixes #8512