erroneous "cpu exhausted" message using qemu #302

ghost · 2015-10-19T01:48:21Z

Using current master, any qemu or docker task I attempt to schedule fails because the scheduler cannot find a node for placement. This may be because I'm attempting to run something paravirtualized (or I may simply have written the nomad file incorrectly); if so, it would be helpful if the allocator told me this. Tested on a 4-core node with 4 GB of memory. I've run docker images on this setup with no issue before; however, they are now failing as well. Command output:

AC02MK0LSFD58:~ rvm2015$ nomad run example.nomad
==> Monitoring evaluation "0ec1d56f-b31d-c1c7-aeb5-2ab272516b32"
    Evaluation triggered by job "qemu_centos7"
    Scheduling error for group "qemu_test" (failed to find a node for placement)
    Allocation "b4a83386-3d8c-a145-b079-8690a23b8612" status "failed" (0/1 nodes filtered)
      * Resources exhausted on 1 nodes
      * Dimension "cpu exhausted" exhausted on 1 nodes
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "0ec1d56f-b31d-c1c7-aeb5-2ab272516b32" finished with status "complete"

example.nomad

# There can only be a single job definition per file.
# Create a job with ID and Name 'qemu_centos7'
job "qemu_centos7" {
# Run the job in the global region, which is the default.
# region = "global"

# Specify the datacenters within the region this job can run in.
datacenters = ["dc1"]

# Service type jobs optimize for long-lived services. This is
# the default but we can change to batch for short-lived tasks.
# type = "service"

# Priority controls our access to resources and scheduling priority.
# This can be 1 to 100, inclusively, and defaults to 50.
# priority = 50

# Restrict our job to only linux. We can specify multiple
# constraints as needed.
constraint {
    attribute = "$attr.kernel.name"
    value = "linux"
}

# Configure the job to do rolling updates
update {
    # Stagger updates every 10 seconds
    stagger = "10s"

    # Update a single task at a time
    max_parallel = 1
}

# Create a 'qemu_test' group. Each task in the group will be
# scheduled onto the same machine.
group "qemu_test" {
    # Control the number of instances of this group.
    # Defaults to 1
    # count = 1

    # Define a task to run
    task "qemu_task" {
        # Use QEMU to run the task.
        driver = "qemu"

        image_source = "http://core.example.org/centos7.qcow2"
        checksum = "443ca3ac203fa0f90bbd739119b57384" 

        # We must specify the resources required for
        # this task to ensure it runs on a machine with
        # enough capacity.
        resources {
            cpu = 500 # 500 MHz
            memory = 256 # 256MB
            network {
                mbits = 10
            }
        }
    }
}
}


ghost · 2015-10-19T01:58:03Z

Rolling back to nomad 0.1.2 fixes docker and partially fixes qemu. Qemu now reports "failed to start: Missing source image Qemu driver"; however, this appears to come from a check on the source address of the qemu image in qemu.go (https://github.com/hashicorp/nomad/blob/7ab84c2862d8f8de75e9ac64ee71b8a0cd05c798/client/driver/qemu.go), and that address is correct for my environment.

AC02MK0LSFD58:~ rvm2015$ nomad alloc-status 9439010a-b52c-ad2a-c35d-ecf9560d10b0
ID                = 9439010a-b52c-ad2a-c35d-ecf9560d10b0
EvalID            = 580d761f-8ba8-1b73-e265-b7ea6987e539
Name              = qemu_centos7.qemu_test[0]
NodeID            = baaf41ac-074f-ff4e-eb0a-eed11d8bc246
JobID             = qemu_centos7
ClientStatus      = failed
ClientDescription = {"qemu_task":{"Status":"failed","Description":"failed to start: Missing source image Qemu driver"}}
NodesEvaluated    = 1
NodesFiltered     = 0
NodesExhausted    = 0
AllocationTime    = 27.002µs
CoalescedFailures = 0

==> Status
Allocation "9439010a-b52c-ad2a-c35d-ecf9560d10b0" status "failed" (0/1 nodes filtered)
   * Score "baaf41ac-074f-ff4e-eb0a-eed11d8bc246.binpack" = 4.229620
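
One way a check like this can fail even when the URL is valid: a minimal Go sketch, assuming the driver looks up `image_source` in a per-task configuration map and rejects a missing or empty value. `checkImageSource` and the map layout are hypothetical, not copied from qemu.go; only the error text is taken from the output above.

```go
package main

import (
	"errors"
	"fmt"
)

// checkImageSource is a hypothetical stand-in for the driver's validation.
// It assumes task settings arrive as a string map and that the image
// location is stored under the "image_source" key.
func checkImageSource(config map[string]string) error {
	source, ok := config["image_source"]
	if !ok || source == "" {
		// Error text mirrors the message reported in this thread.
		return errors.New("Missing source image Qemu driver")
	}
	return nil
}

func main() {
	// The key never reached the map: the check fails no matter
	// whether the URL itself is reachable.
	missing := map[string]string{}
	fmt.Println("missing key:", checkImageSource(missing))

	// The key is present: the check passes.
	present := map[string]string{
		"image_source": "http://core.example.org/centos7.qcow2",
	}
	fmt.Println("present key:", checkImageSource(present))
}
```

Under that assumption, the error says the setting never reached the driver, not that the address is wrong, so where `image_source` sits in the job file would be worth checking.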

ghost · 2015-10-19T02:18:03Z

I suspect the original issue is caused by a change in a dependency since 0.1.2 was released (most likely shirou/gopsutil). If I check out v0.1.2 and rebuild locally, I get the same error about exhausted cpus.

ghost · 2015-10-19T02:35:05Z

Yeah, it looks like the current master of gopsutil is broken: I get a nil object when testing the library. I've opened a bug report with the maintainer. It would be a good idea to adopt godep in the future to prevent these kinds of issues.
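
For readers following along, here is a minimal Go sketch of how an empty CPU list could surface as "cpu exhausted". It is not Nomad's or gopsutil's code: `cpuInfoStat`, `totalCompute`, and `fits` are hypothetical names, and the 4-core, 2400 MHz node is an assumed figure. The assumption is that the client derives CPU capacity by summing over the reported CPUs, so an empty result yields a capacity of 0 and even a 500 MHz request cannot be placed.

```go
package main

import "fmt"

// cpuInfoStat is a hypothetical stand-in for one entry of the per-CPU
// list a system-information library returns; only two fields are modeled.
type cpuInfoStat struct {
	Cores int32
	Mhz   float64
}

// totalCompute sums cores * MHz over all reported CPUs. For a nil or
// empty slice the loop body never runs, so the result is 0.
func totalCompute(infos []cpuInfoStat) float64 {
	var total float64
	for _, c := range infos {
		total += float64(c.Cores) * c.Mhz
	}
	return total
}

// fits reports whether a request of askMHz fits within capacityMHz and,
// if it does not, names the exhausted dimension.
func fits(capacityMHz, askMHz float64) (bool, string) {
	if askMHz > capacityMHz {
		return false, "cpu exhausted"
	}
	return true, ""
}

func main() {
	healthy := []cpuInfoStat{{Cores: 4, Mhz: 2400}} // assumed node size
	var broken []cpuInfoStat                        // what a broken library might return

	for _, infos := range [][]cpuInfoStat{healthy, broken} {
		capacity := totalCompute(infos)
		ok, dimension := fits(capacity, 500) // the job asks for cpu = 500
		fmt.Printf("capacity=%.0f MHz fits=%v dimension=%q\n", capacity, ok, dimension)
	}
}
```

A guard that refuses to register a node reporting zero capacity would make this kind of failure show up at client start rather than at scheduling time.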

achanda · 2015-10-19T04:55:31Z

The fix to gopsutil has been merged. You should be able to manually pull those changes and rebuild nomad. But yes, a way to reproduce builds deterministically is absolutely essential.

cbednarski · 2015-10-19T17:42:02Z

@rvm2015 Thanks for the detailed report. I think you're correct that we will need godep to prevent this type of issue. We're discussing this internally.

dadgar · 2016-01-06T23:43:37Z

Closing as this was from an upstream bug.

github-actions · 2022-12-27T02:13:29Z

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

ghost mentioned this issue Oct 19, 2015

nomad node-status <node id> should show capabilities and cpu/memory max/available #303

Closed

ghost mentioned this issue Oct 19, 2015

cpu.CPUInfo() returns empty []cpu.CPUInfoStat object on linux shirou/gopsutil#101

Closed

cbednarski added type/bug theme/core labels Oct 19, 2015

dadgar closed this as completed Jan 6, 2016

github-actions bot locked as resolved and limited conversation to collaborators Dec 27, 2022
