(3.8.0 - 3.13.1) Update-Cluster, Update-Compute-Fleet may fail when Compute Resources use an expired Capacity Reservation #6870

Open
@hehe7318

The issue

The following operations may fail:

  • pcluster update-cluster
  • pcluster update-compute-fleet

with the error message:

Unable to parse configuration file. An error occurred when calling the DescribeCapacityReservations operation: The capacity reservation ID 'cr-xxxxx' was not found

This happens when all of the following conditions are met:

  1. The current cluster configuration includes a ComputeResources entry with a CapacityReservationId.
  2. The specified Capacity Reservation has expired or been cancelled.
  3. No InstanceType is specified within the same ComputeResources entry.

For example, the following configuration is affected:

Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: string
      ComputeResources:
        - Name: string
          MinCount: integer
          MaxCount: integer
          # InstanceType is missing
          CapacityReservationTarget:
            # The Capacity Reservation below is expired or cancelled
            CapacityReservationId: cr-01234567890abcdef
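
To check whether an existing configuration matches this pattern, a small script along the following lines may help. It is only a sketch: the function name `find_at_risk_compute_resources` and the sample dictionary are hypothetical, the configuration is assumed to be already parsed into a Python dict (for example with a YAML loader), and only a CapacityReservationTarget set at the ComputeResources level is inspected, as in the example above. It flags conditions 1 and 3; whether the reservation has actually expired still has to be checked separately.

```python
from typing import Any, Dict, List, Tuple


def find_at_risk_compute_resources(config: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Return (queue name, compute resource name, reservation ID) for every
    ComputeResources entry that sets CapacityReservationId but no InstanceType."""
    at_risk = []
    scheduling = config.get("Scheduling") or {}
    for queue in scheduling.get("SlurmQueues") or []:
        for resource in queue.get("ComputeResources") or []:
            target = resource.get("CapacityReservationTarget") or {}
            reservation_id = target.get("CapacityReservationId")
            # Flag entries that rely on the reservation alone to determine the instance type.
            if reservation_id and not resource.get("InstanceType"):
                at_risk.append((queue.get("Name"), resource.get("Name"), reservation_id))
    return at_risk


# Hypothetical parsed configuration, used for illustration only.
example_config = {
    "Scheduling": {
        "Scheduler": "slurm",
        "SlurmQueues": [
            {
                "Name": "queue1",
                "ComputeResources": [
                    {
                        "Name": "cr-no-type",
                        "MinCount": 0,
                        "MaxCount": 4,
                        "CapacityReservationTarget": {
                            "CapacityReservationId": "cr-01234567890abcdef"
                        },
                    },
                    {
                        "Name": "cr-with-type",
                        "InstanceType": "c5.xlarge",
                        "MinCount": 0,
                        "MaxCount": 4,
                        "CapacityReservationTarget": {
                            "CapacityReservationId": "cr-01234567890abcdef"
                        },
                    },
                ],
            }
        ],
    }
}

print(find_at_risk_compute_resources(example_config))
# prints [('queue1', 'cr-no-type', 'cr-01234567890abcdef')]
```

Each reported reservation ID can then be checked in the EC2 console or with DescribeCapacityReservations to see whether it is still active.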

Affected ParallelCluster versions, OSes and schedulers

All ParallelCluster versions from 3.8.0 to 3.13.1 with the Slurm scheduler on all OSes.

Mitigation

You can find a detailed explanation and the mitigation of the problem here.
