Consul tokens not cleaned up if clients restart #20184

Open
lgfa29 opened this issue Mar 21, 2024 · 1 comment

lgfa29 commented Mar 21, 2024

Nomad version

Nomad v1.7.6
BuildDate 2024-03-12T07:27:36Z
Revision 594fedbfbc4f0e532b65e8a69b28ff9403eb822e

Issue

When using workload identities with Consul, the Consul ACL tokens for services and tasks are derived in an alloc runner Prerun hook:

    // tokens are a map of Consul cluster to identity name to Consul ACL token.
    tokens := map[string]map[string]*consulapi.ACLToken{}
    tg := job.LookupTaskGroup(h.alloc.TaskGroup)
    if tg == nil { // this is always a programming error
        return fmt.Errorf("alloc %v does not have a valid task group", h.alloc.Name)
    }

    var mErr *multierror.Error
    if err := h.prepareConsulTokensForServices(tg.Services, tg, tokens); err != nil {
        mErr = multierror.Append(mErr, err)
    }
    for _, task := range tg.Tasks {
        if err := h.prepareConsulTokensForServices(task.Services, tg, tokens); err != nil {
            mErr = multierror.Append(mErr, err)
        }
        if err := h.prepareConsulTokensForTask(task, tg, tokens); err != nil {
            mErr = multierror.Append(mErr, err)
        }
    }

    if err := mErr.ErrorOrNil(); err != nil {
        revokeErr := h.revokeTokens(tokens)
        mErr = multierror.Append(mErr, revokeErr)
        return mErr.ErrorOrNil()
    }

    // write the tokens to hookResources
    h.hookResources.SetConsulTokens(tokens)

But SetConsulTokens() only stores them in memory:

    // SetConsulTokens merges a given map of Consul cluster names to task
    // identities to Consul tokens with previously written data. This method is
    // called by the allocrunner consul hook.
    func (a *AllocHookResources) SetConsulTokens(m map[string]map[string]*consulapi.ACLToken) {
        a.mu.Lock()
        defer a.mu.Unlock()

        for k, v := range m {
            a.consulTokens[k] = v
        }
    }
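A minimal stand-in for AllocHookResources makes the problem concrete. The type and helper names below (hookResources, newHookResources, count) are hypothetical; SetConsulTokens copies the loop from the excerpt above. Because the tokens live only in a map on the struct, a freshly constructed struct after a restart has no record of them.

```go
package main

import (
	"fmt"
	"sync"
)

// ACLToken is a hypothetical stand-in for consulapi.ACLToken.
type ACLToken struct{ AccessorID string }

// hookResources is a simplified, hypothetical stand-in for AllocHookResources:
// the tokens exist only in this in-memory map.
type hookResources struct {
	mu           sync.Mutex
	consulTokens map[string]map[string]*ACLToken
}

func newHookResources() *hookResources {
	return &hookResources{consulTokens: map[string]map[string]*ACLToken{}}
}

// SetConsulTokens uses the same loop as the excerpt: it works per cluster,
// replacing each cluster's inner map rather than merging identities within it.
func (a *hookResources) SetConsulTokens(m map[string]map[string]*ACLToken) {
	a.mu.Lock()
	defer a.mu.Unlock()
	for k, v := range m {
		a.consulTokens[k] = v
	}
}

// count returns the total number of tokens currently held.
func (a *hookResources) count() int {
	a.mu.Lock()
	defer a.mu.Unlock()
	n := 0
	for _, byName := range a.consulTokens {
		n += len(byName)
	}
	return n
}

func main() {
	res := newHookResources()
	res.SetConsulTokens(map[string]map[string]*ACLToken{
		"default": {"service_redis": {AccessorID: "9b6faf9c-a3dc-7eb8-d4c1-a74d65d32ede"}},
	})
	fmt.Println(res.count()) // 1

	// Simulated client restart: a new struct is built and nothing was written
	// to disk, so the previously derived token is no longer known.
	res = newHookResources()
	fmt.Println(res.count()) // 0
}
```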

Since the tokens are not persisted to any kind of durable storage, restarting the client causes a new token to be derived, and the old one is left behind and never cleaned up.

Reproduction steps

  1. Start a Consul agent with ACLs enabled.

    # consul.hcl
    
    acl = {
      enabled                  = true
      default_policy           = "deny"
      enable_token_persistence = true
    }
    
    consul agent -dev -config ./consul.hcl
    
  2. Bootstrap the Consul ACL system.

    consul acl bootstrap
    
  3. Start a Nomad server with the following configuration.

    # server.hcl
    
    name      = "server1"
    data_dir  = "/tmp/nomad/server1"
    log_level = "DEBUG"
    
    ports {
      http = 4646
      rpc  = 4647
      serf = 4648
    }
    
    server {
      enabled          = 1
      bootstrap_expect = 1
    }
    
    consul {
      enabled = true
    
      service_identity {
        aud = ["consul.io"]
        ttl = "1h"
      }
    
      task_identity {
        aud = ["consul.io"]
        ttl = "1h"
      }
    }
    
    CONSUL_HTTP_TOKEN=... nomad agent -config ./server.hcl
    
  4. Start a Nomad client with the following configuration.

    # client.hcl
    
    name      = "client-1"
    data_dir  = "/tmp/nomad/client1"
    log_level = "DEBUG"
    
    ports {
      http = 5656
      rpc  = 5657
      serf = 5658
    }
    
    server {
      enabled = false
    }
    
    client {
      enabled = true
    
      server_join {
        retry_join = ["127.0.0.1"]
      }
    }
    
    consul {
      enabled = true
      
      service_identity {
        aud = ["consul.io"]
        ttl = "1h"
      }
      
      task_identity {
        aud = ["consul.io"]
        ttl = "1h"
      }
    }
    
    CONSUL_HTTP_TOKEN=... nomad agent -config ./client.hcl
    
  5. Configure the Consul JWT auth method for Nomad.

    CONSUL_HTTP_TOKEN=... nomad setup consul -y
    
  6. Register a job with a Consul service.

    # example.nomad.hcl
    
    job "example" {
      group "cache" {
        network {
          port "db" {
            to = 6379
          }
        }
      
        service {
          name = "redis"
          port = "db"
        }
      
        task "redis" {
          driver = "docker"
      
          config {
            image = "redis:7"
            ports = ["db"]
          }
        }
      }
    }
    
    nomad run example.nomad.hcl
    
  7. Verify an ACL token for the service was created.

    $ CONSUL_HTTP_TOKEN=... consul acl token list
    AccessorID:       f7650ef5-4bc0-149e-2cef-c67adda4236c
    SecretID:         35269d2b-9c1d-f735-21a5-0ee64fd56b9e
    Description:      Bootstrap Token (Global Management)
    Local:            false
    Create Time:      2024-03-21 15:33:52.606667 -0400 EDT
    Policies:
        00000000-0000-0000-0000-000000000001 - global-management
    
    AccessorID:       00000000-0000-0000-0000-000000000002
    SecretID:         anonymous
    Description:      Anonymous Token
    Local:            false
    Create Time:      2024-03-21 15:33:50.881215 -0400 EDT
    
    AccessorID:       9b6faf9c-a3dc-7eb8-d4c1-a74d65d32ede
    SecretID:         ea0a4767-fdc0-43b8-b65a-d5ad05721e93
    Description:      token created via login: {"requested_by":"nomad_service_redis"}
    Local:            true
    Auth Method:      nomad-workloads (Namespace: )
    Create Time:      2024-03-21 15:35:11.868371 -0400 EDT
    Service Identities:
        redis (Datacenters: all)

  8. Stop the Nomad client and start it again.

  9. Verify a new Consul ACL token was created.

    $ CONSUL_HTTP_TOKEN=... consul acl token list
    AccessorID:       f7650ef5-4bc0-149e-2cef-c67adda4236c
    SecretID:         35269d2b-9c1d-f735-21a5-0ee64fd56b9e
    Description:      Bootstrap Token (Global Management)
    Local:            false
    Create Time:      2024-03-21 15:33:52.606667 -0400 EDT
    Policies:
      00000000-0000-0000-0000-000000000001 - global-management
    
    AccessorID:       4d716a07-2d92-c808-82af-eba7e96b3068
    SecretID:         424a9ab6-9df9-8782-9c56-8ce13dfe0867
    Description:      token created via login: {"requested_by":"nomad_service_redis"}
    Local:            true
    Auth Method:      nomad-workloads (Namespace: )
    Create Time:      2024-03-21 15:37:50.426149 -0400 EDT
    Service Identities:
      redis (Datacenters: all)
    
    AccessorID:       00000000-0000-0000-0000-000000000002
    SecretID:         anonymous
    Description:      Anonymous Token
    Local:            false
    Create Time:      2024-03-21 15:33:50.881215 -0400 EDT
    
    AccessorID:       9b6faf9c-a3dc-7eb8-d4c1-a74d65d32ede
    SecretID:         ea0a4767-fdc0-43b8-b65a-d5ad05721e93
    Description:      token created via login: {"requested_by":"nomad_service_redis"}
    Local:            true
    Auth Method:      nomad-workloads (Namespace: )
    Create Time:      2024-03-21 15:35:11.868371 -0400 EDT
    Service Identities:
      redis (Datacenters: all)

  10. Stop the example job.

    nomad job stop example
    
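  11. Verify that the first ACL token (AccessorID 9b6faf9c-a3dc-7eb8-d4c1-a74d65d32ede) is left behind, while the token created after the restart was revoked.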

    $ CONSUL_HTTP_TOKEN=... consul acl token list
    AccessorID:       f7650ef5-4bc0-149e-2cef-c67adda4236c
    SecretID:         35269d2b-9c1d-f735-21a5-0ee64fd56b9e
    Description:      Bootstrap Token (Global Management)
    Local:            false
    Create Time:      2024-03-21 15:33:52.606667 -0400 EDT
    Policies:
      00000000-0000-0000-0000-000000000001 - global-management
    
    AccessorID:       00000000-0000-0000-0000-000000000002
    SecretID:         anonymous
    Description:      Anonymous Token
    Local:            false
    Create Time:      2024-03-21 15:33:50.881215 -0400 EDT
    
    AccessorID:       9b6faf9c-a3dc-7eb8-d4c1-a74d65d32ede
    SecretID:         ea0a4767-fdc0-43b8-b65a-d5ad05721e93
    Description:      token created via login: {"requested_by":"nomad_service_redis"}
    Local:            true
    Auth Method:      nomad-workloads (Namespace: )
    Create Time:      2024-03-21 15:35:11.868371 -0400 EDT
    Service Identities:
      redis (Datacenters: all)
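Checking for the leftover token in step 11 can be automated by looking for login tokens whose requester is the stopped service. The sketch below parses the pretty-printed consul acl token list output exactly as shown above; tokenEntry, parseTokenList, and loginTokensFor are hypothetical names, and the parser assumes each entry starts with an "AccessorID:" line, so it will break if the CLI changes its output layout. After the job is stopped, any remaining match is a leaked token.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// tokenEntry holds the fields this sketch reads from one token block.
type tokenEntry struct {
	AccessorID  string
	Description string
}

// parseTokenList splits pretty-printed `consul acl token list` output into
// entries, assuming each entry begins with an "AccessorID:" line.
func parseTokenList(out string) []tokenEntry {
	var entries []tokenEntry
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		// Split on the first colon only, so descriptions keep their own colons.
		key, val, ok := strings.Cut(strings.TrimSpace(sc.Text()), ":")
		if !ok {
			continue
		}
		val = strings.TrimSpace(val)
		switch key {
		case "AccessorID":
			entries = append(entries, tokenEntry{AccessorID: val})
		case "Description":
			if len(entries) > 0 {
				entries[len(entries)-1].Description = val
			}
		}
	}
	return entries
}

// loginTokensFor returns the AccessorIDs of tokens created via login on
// behalf of the given requester (e.g. "nomad_service_redis").
func loginTokensFor(entries []tokenEntry, requester string) []string {
	marker := fmt.Sprintf(`"requested_by":%q`, requester)
	var ids []string
	for _, e := range entries {
		if strings.HasPrefix(e.Description, "token created via login:") &&
			strings.Contains(e.Description, marker) {
			ids = append(ids, e.AccessorID)
		}
	}
	return ids
}

func main() {
	// Excerpt of the step 11 output, after the job was stopped.
	afterStop := `AccessorID:       00000000-0000-0000-0000-000000000002
Description:      Anonymous Token

AccessorID:       9b6faf9c-a3dc-7eb8-d4c1-a74d65d32ede
Description:      token created via login: {"requested_by":"nomad_service_redis"}
Local:            true`

	leaked := loginTokensFor(parseTokenList(afterStop), "nomad_service_redis")
	fmt.Println(leaked) // [9b6faf9c-a3dc-7eb8-d4c1-a74d65d32ede]
}
```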

Expected Result

The first ACL token is recovered when the client restarts, so no new token is created and the original one is cleaned up when the allocation stops.

Actual Result

A new ACL token is created, leaving the old one behind.

Mac2 commented Jan 7, 2025

Hi,
we noticed the exact same issue today on Nomad v1.9.3 and Consul v1.19.2 while migrating from the legacy Consul setup to workload identities.

Nomad v1.9.3
BuildDate 2024-11-11T16:35:41Z
Revision d92bf10

Consul v1.19.2
Revision 048f1936
Build Date 2024-08-27T16:06:44Z

Are there any updates on this issue?

thanks,
Mac

Projects
Status: Needs Roadmapping
Development

No branches or pull requests

2 participants