nomad validate does not recognize VAULT_TOKEN #13062

Closed
EtienneBruines opened this issue May 19, 2022 · 5 comments · Fixed by #13070
Labels: stage/accepted (confirmed, and intend to work on; no timeline commitment though), theme/cli, type/bug

@EtienneBruines
Contributor

EtienneBruines commented May 19, 2022

Nomad version

Nomad v1.3.0 (52e95d64113e01be05d585d8b4c07f6f19efebbc)

Things worked fine on Nomad 1.2.6, but ever since upgrading to 1.3.0 we've had this issue. (Edit: I originally described this as intermittent. That was due to a data race where the Nomad tokens issued by Vault weren't valid yet, so nomad validate skipped the check completely. The nomad validate command actually ignores my VAULT_TOKEN value 100% of the time.)

Operating system and Environment details

Ubuntu 21.10 Impish Indri on amd64

Issue

Unable to validate jobs with a vault.policies value.

Reproduced on multiple different machines.

Reproduction steps

Use the example job file below.

VAULT_TOKEN="enter-root-token-here" nomad validate job.nomad

The problem is not limited to the Vault root token; it also occurs with other valid tokens. Rubbish tokens (e.g. typing abc as a token) give the same error.

The token is later picked up by nomad run, but nomad validate seems to completely ignore the value I provide.

Expected Result

Everything OK.

Actual Result

Job validation errors:
1 error occurred:
        * Vault used in the job but missing Vault token

Job file (if appropriate)

job "test1" {
  datacenters = ["dc1"]

  group "test2" {
    task "test3" {
      driver = "docker"

      config {
        image = "alpine:latest"
      }

      vault {
        policies = ["my-policy"]
      }
    }
  }
}

Nomad Server logs (if appropriate)

2022-05-19T08:03:50.906Z [TRACE] nomad.job: job mutate results: mutator=canonicalize warnings=[] error=<nil>
2022-05-19T08:03:50.907Z [TRACE] nomad.job: job mutate results: mutator=connect warnings=[] error=<nil>
2022-05-19T08:03:50.907Z [TRACE] nomad.job: job mutate results: mutator=expose-check warnings=[] error=<nil>
2022-05-19T08:03:50.907Z [TRACE] nomad.job: job mutate results: mutator=constraints warnings=[] error=<nil>
2022-05-19T08:03:50.907Z [TRACE] nomad.job: job validate results: validator=connect warnings=[] error=<nil>
2022-05-19T08:03:50.907Z [TRACE] nomad.job: job validate results: validator=expose-check warnings=[] error=<nil>
2022-05-19T08:03:50.907Z [TRACE] nomad.job: job validate results: validator=vault warnings=[] error="Vault used in the job but missing Vault token"
2022-05-19T08:03:50.907Z [TRACE] nomad.job: job validate results: validator=namespace-constraint-check warnings=[] error=<nil>
2022-05-19T08:03:50.907Z [TRACE] nomad.job: job validate results: validator=validate warnings=[] error=<nil>
2022-05-19T08:03:50.907Z [TRACE] nomad.job: job validate results: validator=memory_oversubscription warnings=[] error=<nil>
2022-05-19T08:03:50.907Z [DEBUG] http: request complete: method=PUT path=/v1/validate/job?region=dc1 duration=29.542647ms

Nomad Client logs (if appropriate)

Request made by nomad cli to the Nomad Server

{
    "Job": {
        "Region": null,
        "Namespace": "services",
        "ID": "test1",
        "Name": "test1",
        "Type": null,
        "Priority": null,
        "AllAtOnce": null,
        "Datacenters": [
            "ham1"
        ],
        "Constraints": null,
        "Affinities": null,
        "TaskGroups": [
            {
                "Name": "test2",
                "Count": null,
                "Constraints": null,
                "Affinities": null,
                "Tasks": [
                    {
                        "Name": "test3",
                        "Driver": "docker",
                        "User": "",
                        "Lifecycle": null,
                        "Config": {
                            "image": "alpine:latest"
                        },
                        "Constraints": null,
                        "Affinities": null,
                        "Env": null,
                        "Services": null,
                        "Resources": null,
                        "RestartPolicy": null,
                        "Meta": null,
                        "KillTimeout": null,
                        "LogConfig": null,
                        "Artifacts": null,
                        "Vault": {
                            "Policies": [
                                "my-policy"
                            ],
                            "Namespace": null,
                            "Env": true,
                            "ChangeMode": "restart",
                            "ChangeSignal": null
                        },
                        "Templates": null,
                        "DispatchPayload": null,
                        "VolumeMounts": null,
                        "Leader": false,
                        "ShutdownDelay": 0,
                        "KillSignal": "",
                        "Kind": "",
                        "ScalingPolicies": null
                    }
                ],
                "Spreads": null,
                "Volumes": null,
                "RestartPolicy": null,
                "ReschedulePolicy": null,
                "EphemeralDisk": null,
                "Update": null,
                "Migrate": null,
                "Networks": null,
                "Meta": null,
                "Services": null,
                "ShutdownDelay": null,
                "StopAfterClientDisconnect": null,
                "MaxClientDisconnect": null,
                "Scaling": null,
                "Consul": null
            }
        ],
        "Update": null,
        "Multiregion": null,
        "Spreads": null,
        "Periodic": null,
        "ParameterizedJob": null,
        "Reschedule": null,
        "Migrate": null,
        "Meta": null,
        "ConsulToken": null,
        "VaultToken": null,
        "Stop": null,
        "ParentID": null,
        "Dispatched": false,
        "DispatchIdempotencyToken": null,
        "Payload": null,
        "ConsulNamespace": null,
        "VaultNamespace": null,
        "NomadTokenID": null,
        "Status": null,
        "StatusDescription": null,
        "Stable": null,
        "Version": null,
        "SubmitTime": null,
        "CreateIndex": null,
        "ModifyIndex": null,
        "JobModifyIndex": null
    },
    "Region": "",
    "Namespace": "",
    "SecretID": ""
}
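The key detail in the request above is that "VaultToken" is null even though task test3 declares a Vault block, which matches the "missing Vault token" error from the server. As a rough sketch (not Nomad code), a captured request body could be checked for that combination like this. The struct models only the fields visible in the payload above; the type and function names are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// validateRequest is a hypothetical, trimmed-down view of the payload shown
// above. Only the fields needed for the check are modelled.
type validateRequest struct {
	Job struct {
		ID         string
		VaultToken *string
		TaskGroups []struct {
			Tasks []struct {
				Name  string
				Vault *struct {
					Policies []string
				}
			}
		}
	}
}

// usesVaultWithoutToken reports whether any task in the request has a Vault
// block while the job itself carries no Vault token.
func usesVaultWithoutToken(body []byte) (bool, error) {
	var req validateRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return false, err
	}
	if req.Job.VaultToken != nil && *req.Job.VaultToken != "" {
		return false, nil // a token was sent, nothing to flag
	}
	for _, tg := range req.Job.TaskGroups {
		for _, task := range tg.Tasks {
			if task.Vault != nil {
				return true, nil
			}
		}
	}
	return false, nil
}

func main() {
	// Shortened version of the captured request from this issue.
	body := []byte(`{"Job":{"ID":"test1","VaultToken":null,"TaskGroups":[{"Tasks":[{"Name":"test3","Vault":{"Policies":["my-policy"]}}]}]}}`)
	missing, err := usesVaultWithoutToken(body)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("Vault used without token:", missing)
}
```

Running the same check against a request captured from nomad run would be one way to confirm that the token is only missing on the validate and plan paths.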

Potentially related commits

7af0c3c

@EtienneBruines
Contributor Author

EtienneBruines commented May 19, 2022

nomad plan suffers from the same issue:

Actual result

$ nomad plan job.nomad 
Error during plan: Unexpected response code: 500 (rpc error: 1 error occurred:
        * Vault used in the job but missing Vault token

)

Nomad Server logs

2022-05-19T08:43:40.910Z [ERROR] http: request failed: method=PUT path="/v1/job/test1/plan?namespace=services&region=ham1"
  error=
  | rpc error: 1 error occurred:
  | 	* Vault used in the job but missing Vault token
  | 
   code=500
2022-05-19T08:43:40.910Z [DEBUG] http: request complete: method=PUT path="/v1/job/test1/plan?namespace=services&region=ham1" duration=1.377973ms

That makes me wonder: how safe is it to deploy a job file without running either validate or plan first? Could it mess with existing allocations, or would Nomad do some internal validation beforehand anyway?

@shoenig
Contributor

shoenig commented May 19, 2022

Thanks for the report @EtienneBruines, this is indeed a bug. Previously, Vault validation happened only during job run, where the Vault token was extracted from the environment variable. It was then refactored to work like our other validators, but that moved it into a code path where the Vault token is not extracted from the environment variable.
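To make the described gap concrete, here is a minimal sketch of the kind of step that job run performs and that the validate path was missing: resolve a Vault token from a flag or from VAULT_TOKEN, then attach it to the job before sending the request. This is an illustration only, not Nomad's actual code. The Job type, resolveVaultToken and attachVaultToken are hypothetical names, and the flag-before-environment precedence is an assumption.

```go
package main

import (
	"fmt"
	"os"
)

// Job is a hypothetical stand-in for the job payload the CLI sends.
// Only the field relevant to this issue is modelled.
type Job struct {
	ID         string
	VaultToken *string
}

// resolveVaultToken returns the flag value when one was given and otherwise
// falls back to the VAULT_TOKEN environment variable (assumed precedence).
// getenv is injected so the lookup can be exercised without touching the
// real environment.
func resolveVaultToken(flagValue string, getenv func(string) string) string {
	if flagValue != "" {
		return flagValue
	}
	return getenv("VAULT_TOKEN")
}

// attachVaultToken sets the token on the job only when one was found, so an
// empty lookup leaves VaultToken nil, as in the request shown in this issue.
func attachVaultToken(job *Job, token string) {
	if token != "" {
		job.VaultToken = &token
	}
}

func main() {
	job := &Job{ID: "test1"}
	attachVaultToken(job, resolveVaultToken("", os.Getenv))
	if job.VaultToken == nil {
		fmt.Println("no Vault token attached to job", job.ID)
		return
	}
	fmt.Println("Vault token attached to job", job.ID)
}
```

If every command that submits a job for server-side checks shared a helper like this, the server-side Vault validator would see the same token regardless of which command was used.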

@michalgm

This bug still exists for nomad plan in 1.3.2:

🌩 [iad1:savagecloud-monitoring] gmichalec@gmichalec:~/work/savagecloud-monitoring$ ./nomad version
Nomad v1.3.2+ent (c0cc8626b2284ff7cb8e2aaa2d7d6943bb376574)
🌩 [iad1:savagecloud-monitoring] gmichalec@gmichalec:~/work/savagecloud-monitoring$ ./nomad plan ./nomad.hcl.final 
Error during plan: Unexpected response code: 500 (1 error occurred:
	* Vault used in the job but missing Vault token

There is no -vault-token option for nomad plan.

Additionally, it seems that if I run validate on any system other than a Nomad leader node, I get this error:

🌩 [iad1:savagecloud-monitoring] gmichalec@iad1-hashi-1:~$ nomad job validate ./nomad.hcl.final
Job validation errors:
1 error occurred:
	* failed to lookup Vault token: Vault client not active

@shoenig
Contributor

shoenig commented Aug 1, 2022

Thanks for the follow-up @michalgm. I've opened #13940 and #13939 to get these fixed.

hc-github-team-nomad-core pushed a commit that referenced this issue Aug 12, 2022
This PR fixes a regression where the 'job plan' command would not respect
a Vault token if set via --vault-token or $VAULT_TOKEN.

Basically the same bug/fix as for the validate command in #13062

Fixes #13939
@lgfa29 lgfa29 modified the milestones: 1.3.x, 1.3.2 Aug 24, 2022
@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 22, 2022