Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add slurmgcp-managed infix to resource policy name #2892

Merged
merged 1 commit into from
Aug 9, 2024

Conversation

mr0re1
Copy link
Collaborator

@mr0re1 mr0re1 commented Aug 8, 2024

Motivation: improve collision avoidance, current matching pattern "${cluster_name}-*" is too relaxed.

Consideration for max length of resource policy name, that

Must be a match of regex '(?:[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?)'

  • 'slurm_cluster_name' must be a match of regex '^[a-z](?:[a-z0-9]{0,9})$' = 10
  • nodeset_name = substr(replace(var.name, "/[^a-z0-9]/", ""), 0, 14) = 14
  • assume len(job_id) = 10
  • assume len(nodes_chunk_id) = 3
  • len("slurmgcp-managed") = 16

len("${cluster_name}-slurmgcp-managed-${nodeset_name}-${job_id}-${nodes_chunk_id}") = 10+1+16+1+14+1+10+3 = 56

Testing:

$ gcloud compute resource-policies list | grep tt0
tt0-slurmgcp-managed-debugnodeset-1-0

$ ./ghpc destroy tt0 $AA
...
module.slurm_controller.null_resource.cleanup_compute[0] (local-exec): Deleted [https://www.googleapis.com/compute/beta/projects/io-playground/zones/us-central1-a/instances/tt0-debugnodeset-0].
module.slurm_controller.null_resource.cleanup_compute[0] (local-exec): Deleted [https://www.googleapis.com/compute/beta/projects/io-playground/zones/us-central1-a/instances/tt0-debugnodeset-1].
module.slurm_controller.null_resource.cleanup_compute[0]: Still destroying... [id=8428718126253180537, 2m20s elapsed]
module.slurm_controller.null_resource.cleanup_compute[0] (local-exec): Deleting resource policies
module.slurm_controller.null_resource.cleanup_compute[0] (local-exec): Deleted [https://www.googleapis.com/compute/beta/projects/io-playground/regions/us-central1/resourcePolicies/tt0-slurmgcp-managed-debugnodeset-1-0].
module.slurm_controller.null_resource.cleanup_compute[0]: Destruction complete after 2m25s
...
$ gcloud compute resource-policies list | grep tt0
$

Followups:

  • Include nodeset_name into match pattern of cleanup_compute.sh (once cleanup-per-nodeset is implemented).

Alternative considerations:

  • Use existing resource.random_uuid.cluster_id - too long (36 chars); can create unexpected lifecycle dependencies;
  • Use dedicated resource.random_uuid.short_cluster_id - can create unexpected lifecycle dependencies;

@mr0re1 mr0re1 added the release-bugfix Added to release notes under the "Bug fixes" heading. label Aug 8, 2024
@tpdownes tpdownes self-requested a review August 9, 2024 20:56
Copy link
Member

@tpdownes tpdownes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Notes:

@tpdownes tpdownes assigned mr0re1 and unassigned tpdownes and nick-stroud Aug 9, 2024
@mr0re1 mr0re1 merged commit c1ab8a4 into GoogleCloudPlatform:develop Aug 9, 2024
27 of 57 checks passed
@mr0re1 mr0re1 deleted the large_words branch August 9, 2024 21:58
@rohitramu rohitramu mentioned this pull request Sep 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-bugfix Added to release notes under the "Bug fixes" heading.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants