-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Un-break templates when using vault stanza change_mode noop #11783
Conversation
Templates in nomad jobs make use of the vault token defined in the vault stanza when issuing credentials like client certificates. When using change_mode "noop" in the vault stanza, consul-template is not informed in case a vault token is re-issued (which can happen from time to time for various reasons, as described in https://www.nomadproject.io/docs/job-specification/vault). As a result, consul-template will keep using the old vault token to renew credentials and - once the token expired - stop renewing credentials. The symptom of this problem is a vault_token file that is newer than the issued credential (e.g., TLS certificate) in a job's /secrets directory. This change corrects this, so that h.updater.updatedVaultToken(token) is called, which will inform stakeholders about the new token and make sure, the new token is used by consul-template. Example job template fragment: vault { policies = ["nomad-job-policy"] change_mode = "noop" } template { data = <<-EOH {{ with secret "pki_int/issue/nomad-job" "common_name=myjob.service.consul" "ttl=90m" "alt_names=localhost" "ip_sans=127.0.0.1"}} {{ .Data.certificate }} {{ .Data.private_key }} {{ .Data.issuing_ca }} {{ end }} EOH destination = "${NOMAD_SECRETS_DIR}/myjob.crt" change_mode = "noop" } This fix does not alter the meaning of the three change modes of vault - "noop" - Take no action - "restart" - Restart the job - "signal" - send a signal to the task as the switch statement following line 232 contains the necessary logic. It is assumed that "take no action" was never meant to mean "don't tell consul-template about the new vault token". Successfully tested in a staging cluster consisting of multiple nomad client nodes.
@DerekStrickland is there any additional information required at this point? |
@grembo No additional information is needed at this time. We just need to review this internally and think through any implications or side-effects it might have. I also need to take a look at what tests failed and why. We are very grateful that you took time to research this and submit a PR. I mean super grateful! Thanks so much for taking time to do this. I'll bring this up with the team and get you feedback as soon as I can. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @grembo! In this scenario how does the workload actually get informed about the new secret? Is this just a case where the workload is starting a process from scratch regularly to pick up the change? (ex. a CGI script or something)
In any case, the behavior you're looking for here certainly seems to be implied by the docs and by this block in the loop:
if updatedToken {
switch h.vaultStanza.ChangeMode {
...
case structs.VaultChangeModeNoop:
fallthrough
Why have the fallthrough if it's never meant to be called with that value, right? But after the fallthrough we're calling the updatedVaultToken
method which looks like this:
func (tr *TaskRunner) updatedVaultToken(token string) {
// Update the task runner and environment
tr.setVaultToken(token)
// Trigger update hooks with the new Vault token
tr.triggerUpdateHooks()
}
That last line to trigger the update hooks is worrisome, as it means we'd be potentially triggering task updates to Consul registered services and script checks as well. An example corner case here is if a client has restarted with new environment variables, the task environment for interpolation may be different, but this previously wouldn't normally trigger an update. With this change a service could be changed even for change_mode = "noop"
Hi @tgross, Thank you for responding so quickly.
In this specific scenario the workload isn't interested in the vault token at all (ideally, it wouldn't even show up in /secrets), but only in the resulting TLS certificate. In order to issue and renew it, a vault stanza has to be present. For the workload I tested with, I actually pick up the new certificate simply by watching the file (myjob.crt) as I can't use signals in a meaningful way and restarting isn't an option either. If I'm not mistaken, the same logic applies also for other change_modes in the template stanza (e.g., signal) though, as long as the vault stanza contains change_mode noop.
This is exactly what made me opt for removing the if statement instead of adding code in another place :)
This is a valid point, but sounds like it is something users of I can't tell if anybody out there depends on vault change_mode's exact current behavior (e.g., just create vault tokens in /secrets and pick those up and restarting nomad client with an altered environment in the process). If this is a major concern, another change_mode could be introduced to address the situation (e.g., named "propagate") and documentation for change_mode noop could be adapted to discourage using it in combination with noop, maybe ultimately deprecating the combination, as it is inherently unreliable. Update: All of this assuming that the call to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks for the thorough reply @grembo. I'm convinced.
LGTM 👍 We'll get this merged and ship it in the upcoming scheduled release.
@tgross Perfect, thank you! |
Templates in nomad jobs make use of the vault token defined in the vault stanza when issuing credentials like client certificates. When using change_mode "noop" in the vault stanza, consul-template is not informed in case a vault token is re-issued (which can happen from time to time for various reasons, as described in https://www.nomadproject.io/docs/job-specification/vault). As a result, consul-template will keep using the old vault token to renew credentials and - once the token expired - stop renewing credentials. The symptom of this problem is a vault_token file that is newer than the issued credential (e.g., TLS certificate) in a job's /secrets directory. This change corrects this, so that h.updater.updatedVaultToken(token) is called, which will inform stakeholders about the new token and make sure, the new token is used by consul-template. Example job template fragment: vault { policies = ["nomad-job-policy"] change_mode = "noop" } template { data = <<-EOH {{ with secret "pki_int/issue/nomad-job" "common_name=myjob.service.consul" "ttl=90m" "alt_names=localhost" "ip_sans=127.0.0.1"}} {{ .Data.certificate }} {{ .Data.private_key }} {{ .Data.issuing_ca }} {{ end }} EOH destination = "${NOMAD_SECRETS_DIR}/myjob.crt" change_mode = "noop" } This fix does not alter the meaning of the three change modes of vault - "noop" - Take no action - "restart" - Restart the job - "signal" - send a signal to the task as the switch statement following line 232 contains the necessary logic. It is assumed that "take no action" was never meant to mean "don't tell consul-template about the new vault token". Successfully tested in a staging cluster consisting of multiple nomad client nodes.
Templates in nomad jobs make use of the vault token defined in the vault stanza when issuing credentials like client certificates. When using change_mode "noop" in the vault stanza, consul-template is not informed in case a vault token is re-issued (which can happen from time to time for various reasons, as described in https://www.nomadproject.io/docs/job-specification/vault). As a result, consul-template will keep using the old vault token to renew credentials and - once the token expired - stop renewing credentials. The symptom of this problem is a vault_token file that is newer than the issued credential (e.g., TLS certificate) in a job's /secrets directory. This change corrects this, so that h.updater.updatedVaultToken(token) is called, which will inform stakeholders about the new token and make sure, the new token is used by consul-template. Example job template fragment: vault { policies = ["nomad-job-policy"] change_mode = "noop" } template { data = <<-EOH {{ with secret "pki_int/issue/nomad-job" "common_name=myjob.service.consul" "ttl=90m" "alt_names=localhost" "ip_sans=127.0.0.1"}} {{ .Data.certificate }} {{ .Data.private_key }} {{ .Data.issuing_ca }} {{ end }} EOH destination = "${NOMAD_SECRETS_DIR}/myjob.crt" change_mode = "noop" } This fix does not alter the meaning of the three change modes of vault - "noop" - Take no action - "restart" - Restart the job - "signal" - send a signal to the task as the switch statement following line 232 contains the necessary logic. It is assumed that "take no action" was never meant to mean "don't tell consul-template about the new vault token". Successfully tested in a staging cluster consisting of multiple nomad client nodes.
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
Templates in nomad jobs make use of the vault token defined in
the vault stanza when issuing credentials like client certificates.
When using change_mode "noop" in the vault stanza, consul-template
is not informed in case a vault token is re-issued (which can
happen from time to time for various reasons, as described
in https://www.nomadproject.io/docs/job-specification/vault).
As a result, consul-template will keep using the old vault token
to renew credentials and - once the token expired - stop renewing
credentials. The symptom of this problem is a vault_token
file that is newer than the issued credential (e.g., TLS certificate)
in a job's /secrets directory.
This change corrects this, so that h.updater.updatedVaultToken(token)
is called, which will inform stakeholders about the new
token and make sure, the new token is used by consul-template.
Example job template fragment:
This fix does not alter the meaning of the three change modes of vault
as the switch statement following line 232 contains the necessary
logic.
It is assumed that "take no action" was never meant to mean "don't tell
consul-template about the new vault token".
Successfully tested in a staging cluster consisting of multiple
nomad client nodes.