fingerprinting OVH nodes with incorrect CPU frequency after upgrading to 1.7.1 #19406

Closed
kevinschoonover opened this issue Dec 9, 2023 · 7 comments

Comments

@kevinschoonover
Contributor

Nomad version

> nomad version
Nomad v1.7.1
BuildDate 2023-12-08T18:11:21Z
Revision 608e719430038cdeb5fe108536d90cf88a8540e3

Operating system and Environment details

OVH VPS with the following configuration:

> uname -a
Linux vps-9e8a4a7f 5.10.0-26-cloud-amd64 #1 SMP Debian 5.10.197-1 (2023-09-29) x86_64 GNU/Linux

Issue

After upgrading to 1.7.1, the OVH nodes in my Nomad cluster report 0 MHz of fingerprinted CPU. The logs below show that Nomad detects all 8 CPU cores, just not their clock speed.
(Screenshot: missing_cpu)

I have another node, hosted at Hetzner, whose CPU frequency Nomad detects correctly. Downgrading to Nomad 1.6.4 and restarting resolves the problem.
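Nomad 1.7.1 finds the cores but not their clock speed on the OVH VPS, while the Hetzner node works. One way to compare the two hosts is to check which kernel interfaces publish a clock speed at all. The sketch below is a hypothetical helper (cpu_freq_sources is not a Nomad command), and it assumes, without confirmation from this thread, that the difference may lie in whether the VM exposes cpufreq data in sysfs in addition to the "cpu MHz" lines in /proc/cpuinfo.

```shell
# Hypothetical diagnostic helper, not part of Nomad: print the clock speed the
# kernel reports through two different interfaces.
#   $1  cpuinfo file  (default: /proc/cpuinfo)
#   $2  cpufreq dir   (default: /sys/devices/system/cpu/cpu0/cpufreq)
cpu_freq_sources() {
  cpuinfo="${1:-/proc/cpuinfo}"
  cpufreq_dir="${2:-/sys/devices/system/cpu/cpu0/cpufreq}"

  # First "cpu MHz" value in cpuinfo (x86 kernels print one per logical CPU).
  mhz=""
  if [ -r "$cpuinfo" ]; then
    mhz=$(awk -F: '/^cpu MHz/ { v = $2; gsub(/[ \t]/, "", v); print v; exit }' "$cpuinfo")
  fi
  echo "cpuinfo_mhz=${mhz:-none}"

  # cpufreq reports the maximum frequency in kHz; this directory is often
  # missing in virtual machines that do not expose frequency scaling.
  if [ -r "$cpufreq_dir/cpuinfo_max_freq" ]; then
    echo "cpufreq_max_khz=$(cat "$cpufreq_dir/cpuinfo_max_freq")"
  else
    echo "cpufreq_max_khz=none"
  fi
}
```

Run cpu_freq_sources with no arguments on each node and compare the output; the optional arguments only exist so the function can be pointed at copies of those files.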

Reproduction steps

Start a Nomad client on an OVH node and have it join the cluster.

Nomad Client logs (if appropriate)

Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]: ==> Nomad agent configuration:
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:        Advertise Addrs: HTTP: 100.101.109.40:4646
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:             Bind Addrs: HTTP: [100.101.109.40:4646 127.0.0.1:4646]
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:                 Client: true
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:              Log Level: DEBUG
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:                 Region: global (DC: ovh-us-west-or-fed)
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:                 Server: false
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:                Version: 1.7.1
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]: ==> Nomad agent started! Log data will stream in below:
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.465Z [WARN]  agent.plugin_loader: skipping external plugins since plugin_dir doesn't exist: plugin_dir=/var/nomad/plugins
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.476Z [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=/var/nomad/plugins
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.477Z [INFO]  agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.477Z [INFO]  agent: detected plugin: name=java type=driver plugin_version=0.1.0
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.477Z [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.477Z [INFO]  agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.477Z [INFO]  agent: detected plugin: name=exec type=driver plugin_version=0.1.0
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.478Z [INFO]  client: using state directory: state_dir=/var/nomad/client
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.479Z [INFO]  client: using alloc directory: alloc_dir=/var/nomad/alloc
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.479Z [INFO]  client: using dynamic ports: min=20000 max=32000 reserved=""
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.547Z [DEBUG] client.fingerprint_mgr: built-in fingerprints: fingerprinters=["arch", "bridge", "cgroup", "cni", "consul", "cpu", "host", "landlock", "memory", "network", "nomad", "plugins_cni", "signal", "storage", "vault", "env_digitalocean", "env_aws", "env_gce", "env_azure"]
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.547Z [DEBUG] client.fingerprint_mgr.cgroup: detected cgroups: version=2
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.547Z [DEBUG] client.fingerprint_mgr: CNI config dir is not set or does not exist, skipping: cni_config_dir=/opt/cni/config
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.559Z [INFO]  client.fingerprint_mgr.consul: consul agent is available: cluster=default
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.559Z [DEBUG] client.fingerprint_mgr: fingerprinting periodically: fingerprinter=consul initial_period=52.968832938s
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.560Z [DEBUG] client.fingerprint_mgr.cpu: detected CPU model: name="Intel Core Processor (Haswell, no TSX)"
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.560Z [DEBUG] client.fingerprint_mgr.cpu: detected CPU efficiency core count: cores=8
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.560Z [DEBUG] client.fingerprint_mgr.cpu: detected CPU performance core count: cores=0
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.560Z [DEBUG] client.fingerprint_mgr.cpu: detected CPU core count: cores=8
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.613Z [WARN]  client.fingerprint_mgr.landlock: failed to fingerprint kernel landlock feature: error="function not implemented"
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.617Z [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=tailscale0
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.617Z [DEBUG] client.fingerprint_mgr.network: unable to parse link speed: path=/sys/class/net/tailscale0/speed device=tailscale0
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.617Z [DEBUG] client.fingerprint_mgr.network: link speed could not be detected and no speed specified by user, falling back to default speed: interface=tailscale0 mbits=1000
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.617Z [DEBUG] client.fingerprint_mgr.network: detected interface IP: interface=tailscale0 IP=100.101.109.40
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.617Z [DEBUG] client.fingerprint_mgr.network: detected interface IP: interface=tailscale0 IP=fd7a:115c:a1e0:ab12:4843:cd96:6265:6d28
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.619Z [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=lo
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.619Z [DEBUG] client.fingerprint_mgr.network: unable to read link speed: path=/sys/class/net/lo/speed device=lo
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.619Z [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: interface=lo mbits=1000
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.621Z [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=ens3
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.622Z [DEBUG] client.fingerprint_mgr.network: unable to parse link speed: path=/sys/class/net/ens3/speed device=ens3
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.622Z [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: interface=ens3 mbits=1000
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.625Z [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=tailscale0
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.625Z [DEBUG] client.fingerprint_mgr.network: unable to parse link speed: path=/sys/class/net/tailscale0/speed device=tailscale0
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.625Z [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: interface=tailscale0 mbits=1000
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.628Z [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=docker0
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.629Z [DEBUG] client.fingerprint_mgr.network: unable to parse link speed: path=/sys/class/net/docker0/speed device=docker0
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.629Z [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: interface=docker0 mbits=1000
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.712Z [INFO]  client.fingerprint_mgr.vault: Vault is available: cluster=default
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.712Z [DEBUG] client.fingerprint_mgr: fingerprinting periodically: fingerprinter=vault initial_period=51.127719075s
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.113Z [DEBUG] client.fingerprint_mgr.env_aws: read an empty value: attribute=public-ipv4
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.129Z [DEBUG] client.fingerprint_mgr.env_aws: read an empty value: attribute=mac
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.238Z [DEBUG] client.fingerprint_mgr.env_aws: read an empty value: attribute=instance-life-cycle
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.302Z [DEBUG] client.fingerprint_mgr.env_gce: could not read value for attribute: attribute=machine-type resp_code=404
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.355Z [DEBUG] client.fingerprint_mgr.env_azure: could not read value for attribute: attribute=compute/azEnvironment resp_code=404
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.370Z [DEBUG] client.fingerprint_mgr.env_digitalocean: could not read value for attribute: attribute=region resp_code=404
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.370Z [DEBUG] client.fingerprint_mgr: detected fingerprints: node_attrs=["arch", "bridge", "consul", "cpu", "host", "network", "nomad", "plugins_cni", "signal", "storage", "vault", "env_aws"]
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.370Z [INFO]  client.proclib.cg2: initializing nomad cgroups: cores=0-7
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.371Z [DEBUG] client.proclib.cg2: top level partition root nomad.slice cgroup initialized
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.371Z [DEBUG] client.proclib.cg2: partition member nomad.slice/share cgroup initialized
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.371Z [DEBUG] client.proclib.cg2: partition member nomad.slice/reserve cgroup initialized
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.371Z [INFO]  client.plugin: starting plugin manager: plugin-type=csi
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.371Z [INFO]  client.plugin: starting plugin manager: plugin-type=driver
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.371Z [INFO]  client.plugin: starting plugin manager: plugin-type=device
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.371Z [DEBUG] client.device_mgr: exiting since there are no device plugins
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.371Z [DEBUG] client.plugin: waiting on plugin manager initial fingerprint: plugin-type=driver
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.371Z [DEBUG] client.plugin: waiting on plugin manager initial fingerprint: plugin-type=device
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.371Z [DEBUG] client.plugin: finished plugin manager initial fingerprint: plugin-type=device
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.372Z [DEBUG] client.driver_mgr: initial driver fingerprint: driver=java health=undetected description=""
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.372Z [DEBUG] client.driver_mgr: initial driver fingerprint: driver=raw_exec health=undetected description=disabled
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.372Z [DEBUG] client.driver_mgr: initial driver fingerprint: driver=exec health=healthy description=Healthy
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.372Z [DEBUG] client.driver_mgr.docker: using client connection initialized from environment: driver=docker
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.372Z [DEBUG] client.driver_mgr: initial driver fingerprint: driver=qemu health=undetected description=""
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.374Z [DEBUG] client.consul: bootstrap contacting Consul DCs: consul_dcs=["ovh-us-west-or-fed", "hetzner-hil"]
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.379Z [INFO]  client.consul: discovered following servers: servers=[100.90.116.76:4647, 100.83.171.64:4647, 100.91.63.45:4647]
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.379Z [DEBUG] client.server_mgr: new server list: new_servers=[100.83.171.64:4647, 100.90.116.76:4647, 100.91.63.45:4647] old_servers=[]
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.396Z [DEBUG] client.driver_mgr: initial driver fingerprint: driver=docker health=healthy description=Healthy
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.396Z [DEBUG] client.driver_mgr: detected drivers: drivers="map[healthy:[exec docker] undetected:[java raw_exec qemu]]"
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.396Z [DEBUG] client.plugin: finished plugin manager initial fingerprint: plugin-type=driver
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.397Z [INFO]  client: started client: node_id=2e503af6-c37a-510a-a6be-8fa6e96d88b5
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.399Z [DEBUG] client: updated allocations: index=246 total=0 pulled=0 filtered=0
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.399Z [DEBUG] client: allocation updates: added=0 removed=0 updated=0 ignored=0
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.399Z [DEBUG] client: allocation updates applied: added=0 removed=0 updated=0 ignored=0 errors=0
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.400Z [DEBUG] http: UI is enabled
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.401Z [DEBUG] http: UI is enabled
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.401Z [DEBUG] http: UI is enabled
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.407Z [INFO]  client: node registration complete
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.407Z [DEBUG] client: evaluations triggered by node registration: num_evals=1
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.415Z [DEBUG] consul.sync: sync complete: registered_services=1 deregistered_services=0 registered_checks=1 deregistered_checks=0
Dec 09 08:26:21 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:21.536Z [DEBUG] http: request complete: method=GET path=/v1/agent/health?type=client duration="599.388µs"
Dec 09 08:26:23 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:23.411Z [DEBUG] client: state changed, updating node and re-registering
Dec 09 08:26:23 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:23.417Z [INFO]  client: node registration complete
Dec 09 08:26:31 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:31.539Z [DEBUG] http: request complete: method=GET path=/v1/agent/health?type=client duration="638.198µs"
Dec 09 08:26:33 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:33.376Z [DEBUG] http: request complete: method=GET path=/v1/metrics?format=prometheus duration=3.908567ms
Dec 09 08:26:41 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:41.542Z [DEBUG] http: request complete: method=GET path=/v1/agent/health?type=client duration="663.875µs"
Dec 09 08:26:51 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:51.547Z [DEBUG] http: request complete: method=GET path=/v1/agent/health?type=client duration=2.317271ms
Dec 09 08:27:01 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:27:01.551Z [DEBUG] http: request complete: method=GET path=/v1/agent/health?type=client duration="998.587µs"
@quoing
Contributor

quoing commented Dec 9, 2023

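Same issue on a KVM VM with Docker, Nomad 1.7.1.

As a workaround, you can override the detected value in the client configuration. My VM has 2 CPUs at 3300 MHz each (as reported by cat /proc/cpuinfo), so the total is 2 × 3300 MHz = 6600 MHz: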

client {
  cpu_total_compute = 6600
  # ...
}

Then restart Nomad.
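The workaround arithmetic (number of CPUs × MHz per CPU) can be derived from /proc/cpuinfo instead of by hand. This is a minimal sketch, assuming an x86 guest whose /proc/cpuinfo prints one "cpu MHz" line per logical CPU; total_cpu_mhz is a hypothetical helper name, not a Nomad command. Note that the value a VM reports there is whatever the hypervisor exposes, which may not be the nominal clock speed.

```shell
# Hypothetical helper, not part of Nomad: sum the "cpu MHz" lines of a
# cpuinfo file (default /proc/cpuinfo) and print the total in MHz.
total_cpu_mhz() {
  # -F: splits "cpu MHz : 3300.000" at the colon; awk ignores the leading
  # blank when converting " 3300.000" to a number. Adding 0.5 before the
  # integer conversion rounds the total to the nearest MHz.
  awk -F: '/^cpu MHz/ { sum += $2 } END { printf "%d\n", sum + 0.5 }' "${1:-/proc/cpuinfo}"
}
```

On the 2 × 3300 MHz VM described above, total_cpu_mhz would print 6600, the value used for cpu_total_compute. Rounding is a design choice here because "cpu MHz" values are often fractional.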

@tgross
Member

tgross commented Dec 11, 2023

Potentially related: #19412

@lindleydev

lindleydev commented Dec 11, 2023

Somewhat related: after upgrading my Raspberry Pi cluster to Nomad 1.7.1, I was seeing CPU-related errors from the cpuparts_hook.

Dec 10 05:25:23 rasp-pi-2 nomad[1916183]:     2023-12-10T05:25:23.004Z [ERROR] client.alloc_runner: postrun failed: alloc_id=965cd7d7-f029-36d2-1a83-9e1e3db848f9 error="hook \"cpuparts_hook\" failed: open /sys/fs/cgroup/cpuset/nomad/share/cpuset.cpus: no such file or directory"
Dec 10 05:25:23 rasp-pi-2 nomad[1916183]:     2023-12-10T05:25:23.006Z [ERROR] client.alloc_runner: postrun failed: alloc_id=26a42c2f-d788-11e5-9ecb-d8aead7ca081 error="hook \"cpuparts_hook\" failed: open /sys/fs/cgroup/cpuset/nomad/share/cpuset.cpus: no such file or directory"

I fixed this by creating the directory Nomad was looking for.

It also happened in the pre-run hook:

Dec 10 05:26:25 rasp-pi-2 nomad[1916183]:     2023-12-10T05:26:25.228Z [ERROR] client.alloc_runner: prerun failed: alloc_id=e3c914a4-855e-0887-8085-c659ed9cd122 error="pre-run hook \"cpuparts_hook\" failed: open /sys/fs/cgroup/cpuset/nomad/share/cpuset.cpus: no such file or directory"

@shoenig
Contributor

shoenig commented Dec 11, 2023

@lindleydev, that's actually a separate problem. I suspect that in your case cgroups are mounted but the cpuset controller is not enabled. Previous versions of Nomad allowed such a configuration at the expense of not actually enforcing resource utilization, but in 1.7 the cpuset controller is mandatory. There's some discussion about this in #19176.
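To check the suspicion above on a given host, the two functions below may help. Both are hypothetical helpers, not Nomad commands, and neither reproduces Nomad's own checks. The first reads the "enabled" column of /proc/cgroups, which shows whether the kernel provides the cpuset controller at all (for example, it reads 0 if cpuset was turned off with cgroup_disable=cpuset on the kernel command line). The second looks for cpuset in the root cgroup.controllers file, which only exists on cgroup v2 hosts. The Raspberry Pi errors reference a cgroup v1 path (/sys/fs/cgroup/cpuset/...), so the first check is the relevant one there.

```shell
# Hypothetical helper: succeed if the kernel reports the cpuset controller
# as enabled. /proc/cgroups columns: subsys_name hierarchy num_cgroups enabled.
cpuset_enabled_in_kernel() {
  awk '$1 == "cpuset" { found = 1; on = ($4 == 1) }
       END { exit !(found && on) }' "${1:-/proc/cgroups}"
}

# Hypothetical helper (cgroup v2 only): succeed if cpuset is listed among the
# controllers available at the root of the unified hierarchy. On a v1 host the
# file does not exist, so this fails.
cpuset_in_v2_root() {
  grep -qw cpuset "${1:-/sys/fs/cgroup/cgroup.controllers}" 2>/dev/null
}
```

For example, running cpuset_enabled_in_kernel && echo enabled || echo disabled on the Pi shows whether the kernel side is the missing piece before looking at how the cpuset hierarchy is mounted.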

@Settler

Settler commented Dec 11, 2023

A possible cause of this issue is described here: #19412 (comment)

@tgross
Member

tgross commented Dec 13, 2023

Hey folks, just an update that the team is actively working on this issue. This issue and #19412 are effectively duplicates, so I'm going to close this issue as a dupe because there's been a bit more discussion over there.

tgross closed this as not planned on Dec 13, 2023

github-actions bot commented Jan 2, 2025

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked this as resolved and limited conversation to collaborators on Jan 2, 2025

6 participants