Add a slurm workflow manager #789

Merged
merged 8 commits on Jan 10, 2025

linsword13 (Collaborator) commented Dec 5, 2024

Example usage:

ramble:
  variants:
    workflow_manager: slurm
  variables:
    processes_per_node: 1
  applications:
    hostname:
      workloads:
        local:
          experiments:
            test:
              variables:
                n_nodes: 1

# The batch_submit executor is defined by the workflow manager object
ramble on

# Executors for querying and cancelling jobs are also available
ramble on --executor "{batch_query}"
ramble on --executor "{batch_cancel}"

# The status report is also available through analyze
ramble workspace analyze -p

linsword13 force-pushed the workflow branch 11 times, most recently from 493648f to 0a705d9, December 13, 2024 04:30
linsword13 (Collaborator, Author)

Marking this as a draft, given that this PR will need to be updated if #804 is in.

douglasjacobsen (Collaborator)

Something I just found out: if a job has completed and you run ramble on --executor='{query_job}', it reports that the job failed (even if you haven't analyzed it).

We might want to see if there's a way to use something like scontrol to get the status of completed jobs.
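As a rough sketch of the scontrol idea (not the code in this PR), the Python below pulls the `JobState=` field out of `scontrol show job <id>` output. The helper names `parse_scontrol_state` and `scontrol_job_state` are hypothetical, and the command runner is injectable so the parsing can be checked without a Slurm cluster.

```python
import re
import subprocess
from typing import Callable, Optional

# scontrol prints Key=Value pairs, e.g. "JobState=COMPLETED Reason=None ...".
_JOB_STATE_RE = re.compile(r"\bJobState=(\S+)")


def parse_scontrol_state(output: str) -> Optional[str]:
    """Return the JobState value from `scontrol show job` output, or None."""
    match = _JOB_STATE_RE.search(output)
    return match.group(1) if match else None


def _run(cmd: list[str]) -> str:
    # check=False: scontrol exits non-zero once the job record has been purged,
    # in which case stdout is empty and the parser returns None.
    return subprocess.run(cmd, capture_output=True, text=True, check=False).stdout


def scontrol_job_state(job_id: str,
                       run: Callable[[list[str]], str] = _run) -> Optional[str]:
    """Hypothetical helper: ask scontrol for the state of a single job."""
    return parse_scontrol_state(run(["scontrol", "show", "job", job_id]))
```

One caveat with this approach: slurmctld only keeps finished jobs in memory for `MinJobAge` seconds (300 by default), so `scontrol` can stop reporting a job shortly after it ends; the accounting database queried by `sacct` covers longer-lived history.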

linsword13 (Collaborator, Author)

Hm, that's unexpected. I do have an sacct command that queries the status of completed jobs when squeue comes up empty. I will look into it.
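The fallback described here (ask squeue first, then sacct when squeue has nothing) could look roughly like the sketch below. It is an illustration under assumptions, not the PR's implementation: the function name `query_job_state` is hypothetical, and only the first line of each command's output is used.

```python
import subprocess
from typing import Callable, Optional

Runner = Callable[[list[str]], str]


def _run(cmd: list[str]) -> str:
    """Run a command and return its stdout; non-zero exits are tolerated."""
    return subprocess.run(cmd, capture_output=True, text=True, check=False).stdout


def query_job_state(job_id: str, run: Runner = _run) -> Optional[str]:
    """Hypothetical helper: Slurm state of job_id via squeue, falling back to sacct."""
    # squeue only lists jobs still known to the controller (pending, running, ...).
    out = run(["squeue", "--noheader", "--format=%T", f"--jobs={job_id}"]).strip()
    if out:
        return out.splitlines()[0].strip()
    # Empty squeue output: ask the accounting database about finished jobs.
    # --allocations skips job steps; --parsable2 avoids column padding.
    out = run(["sacct", "--noheader", "--allocations", "--parsable2",
               "--format=State", f"--jobs={job_id}"]).strip()
    if out:
        # States such as "CANCELLED by 1000" carry a suffix; keep the first word.
        return out.splitlines()[0].split()[0]
    return None
```

Injecting `run` keeps the fallback order testable with canned output instead of a live cluster.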

douglasjacobsen (Collaborator)

Let me try it again; it might be that analyzing the job failed for some reason.

douglasjacobsen (Collaborator)

Yeah, I think I just had a weird issue with my experiments.

Can we also add default -o and -e pragmas to the generated submission script? That would place the Slurm logs in the experiment directory.

linsword13 marked this pull request as ready for review January 9, 2025 07:12
linsword13 (Collaborator, Author)

Thanks for trying it out! I also added the -o and -e pragmas.
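For reference, default `-o`/`-e` directives of the kind discussed here might be rendered as below. This is a hypothetical sketch: `render_sbatch_header`, the `slurm-%j.out`/`slurm-%j.err` file names, and the mapping of `n_nodes`/`processes_per_node` onto `-N`/`--ntasks-per-node` are assumptions, not the script this PR generates.

```python
import os


def render_sbatch_header(job_name: str, experiment_dir: str, n_nodes: int,
                         processes_per_node: int) -> str:
    """Hypothetical: build #SBATCH directives that send Slurm logs to experiment_dir."""
    # %j is expanded by Slurm to the job ID, so resubmissions don't overwrite logs.
    out_path = os.path.join(experiment_dir, "slurm-%j.out")
    err_path = os.path.join(experiment_dir, "slurm-%j.err")
    lines = [
        "#!/bin/bash",
        f"#SBATCH -J {job_name}",
        # Assumed mapping of Ramble variables to Slurm resource options.
        f"#SBATCH -N {n_nodes}",
        f"#SBATCH --ntasks-per-node={processes_per_node}",
        # Default stdout/stderr locations inside the experiment directory.
        f"#SBATCH -o {out_path}",
        f"#SBATCH -e {err_path}",
    ]
    return "\n".join(lines) + "\n"
```

Writing both streams next to the experiment keeps them alongside the other run artifacts. This sketch does not handle paths containing spaces.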

linsword13 (Collaborator, Author)

> Marking this as a draft, given that this PR will need to be updated if #804 is in.

This is now updated. It currently uses changes in #814.

douglasjacobsen (Collaborator) left a comment

This is looking great. One small request. :)

lib/ramble/ramble/application.py (outdated, resolved)
linsword13 dismissed douglasjacobsen’s stale review January 10, 2025 15:13

The merge-base changed after approval.

douglasjacobsen self-assigned this Jan 10, 2025
douglasjacobsen added the enhancement (New feature or request) label Jan 10, 2025
douglasjacobsen merged commit bccd9cb into GoogleCloudPlatform:develop Jan 10, 2025
12 checks passed
2 participants