
(3.0-3.6) Compute Nodes Belonging To More Than One Partition Causes Compute To Overscale #5404

@davprat

Description


Bug description

If a cluster is created with a custom Slurm configuration that places static compute nodes in more than one partition, ParallelCluster will attempt to launch one EC2 instance for each partition a node belongs to. For example, a static node listed in two partitions is backed by two instances. Because multiple instances then back a single node, the cluster overscales and nodes are terminated.
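To check whether a custom configuration is exposed to this issue, one option is to scan the `PartitionName=` lines for nodes that appear in more than one partition. The sketch below assumes the partition definitions are available as plain `slurm.conf`-style text (for example, concatenated from any included files), uses exact key casing, and only expands hostlists with a single bracket group such as `prefix-[1-3,5]`. Special values such as `Nodes=ALL` are not handled. The helper names and the sample node names are hypothetical and not part of ParallelCluster.

```python
import re
from collections import defaultdict


def split_top_level(expr):
    """Split a Nodes= value on commas that are not inside [...] brackets."""
    parts, depth, current = [], 0, []
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts


def expand_hostlist(item):
    """Expand one hostlist item with at most one bracket group, e.g. 'n-[01-03,7]'."""
    m = re.fullmatch(r"(.*)\[([^\]]+)\](.*)", item)
    if not m:
        return [item]
    prefix, body, suffix = m.groups()
    names = []
    for rng in body.split(","):
        if "-" in rng:
            lo, hi = rng.split("-")
            width = len(lo)  # preserve zero padding such as 01-10
            for i in range(int(lo), int(hi) + 1):
                names.append(f"{prefix}{i:0{width}d}{suffix}")
        else:
            names.append(f"{prefix}{rng}{suffix}")
    return names


def nodes_in_multiple_partitions(conf_text):
    """Return {node: [partitions]} for every node listed in more than one partition."""
    membership = defaultdict(set)
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line.startswith("PartitionName="):
            continue
        fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
        partition = fields["PartitionName"]
        for item in split_top_level(fields.get("Nodes", "")):
            for node in expand_hostlist(item):
                membership[node].add(partition)
    return {node: sorted(parts) for node, parts in membership.items() if len(parts) > 1}


if __name__ == "__main__":
    conf = """
PartitionName=queue1 Nodes=queue1-st-c5xlarge-[1-2] Default=YES
PartitionName=queue2 Nodes=queue1-st-c5xlarge-2,queue2-st-t2micro-1
"""
    print(nodes_in_multiple_partitions(conf))
```

In the sample configuration, `queue1-st-c5xlarge-2` is listed in both `queue1` and `queue2`, so on an affected version it is the kind of static node that the description above says would be backed by two instances.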

Affected versions (OSes, schedulers)

  • ParallelCluster versions >= 3.0.0 and <= 3.6.0 on all OSs.
  • Only the Slurm scheduler is affected.
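The affected range above can be expressed as a simple check. This is a minimal sketch that assumes plain numeric `X.Y.Z` version strings (pre-release suffixes are not handled) and treats missing components as zero; the function name is hypothetical.

```python
def is_affected(version: str, scheduler: str) -> bool:
    """True if a ParallelCluster version/scheduler pair falls in the affected range."""
    parts = [int(x) for x in version.split(".")[:3]]
    parts += [0] * (3 - len(parts))  # treat "3.2" as 3.2.0
    in_range = (3, 0, 0) <= tuple(parts) <= (3, 6, 0)
    return in_range and scheduler.lower() == "slurm"


if __name__ == "__main__":
    print(is_affected("3.5.1", "slurm"))
```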

Mitigation

A detailed explanation of the problem and its mitigation is available in "(3.0.0-3.6.0) Compute Nodes Belonging To More Than One Partition Causes Compute To Overscale".
