Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deploy topology.conf unconditionally #53

Merged
merged 2 commits into from
Feb 15, 2024
Merged

Conversation

optiz0r
Copy link
Contributor

@optiz0r optiz0r commented Jan 25, 2024

srun will fail if slurm.conf defines use of a topology plugin but topology.conf does not exist on login nodes. Previously, topology.conf was only deployed to slurmd and slurmctld nodes.

$ srun -c 1 -n 1 --mem 1g --pty bash
srun: error: s_p_parse_file: cannot stat file /etc/slurm/topology.conf: No such file or directory, retrying in 1sec up to 60sec
srun: fatal: something wrong with opening/reading /etc/slurm/topology.conf: No such file or directory

This change removes the conditional and deploys topology.conf to all nodes.

Fixes #52

@optiz0r
Copy link
Contributor Author

optiz0r commented Jan 25, 2024

As mentioned in #52, this is the change I made locally to get srun working. I'm happy to revise the approach and add a new path through the previous conditional statement, if preferred.

@treydock
Copy link
Owner

Left comment on #52 before I saw PR. This approach is fine, as it's essentially the same end result.

@treydock treydock added the bugfix Something isn't working label Jan 25, 2024
@optiz0r optiz0r force-pushed the 52_topology branch 2 times, most recently from 2f3c27f to d4268d7 Compare January 25, 2024 15:12
@treydock
Copy link
Owner

Some unit tests are failing due to lint issues with trailing commas in the unit tests.

@optiz0r
Copy link
Contributor Author

optiz0r commented Jan 26, 2024

I'll take a closer look, I was fighting vscode trying to make excessive changes with formatting. Bear with me.

srun will fail if slurm.conf defines use of a topology plugin
but topology.conf does not exist on login nodes. Previously,
topology.conf was only deployed to slurmd and slurmctld nodes.

```
$ srun -c 1 -n 1 --mem 1g --pty bash
srun: error: s_p_parse_file: cannot stat file /etc/slurm/topology.conf: No such file or directory, retrying in 1sec up to 60sec
srun: fatal: something wrong with opening/reading /etc/slurm/topology.conf: No such file or directory
```

This change removes the conditional and deploys topology.conf to all
nodes.

Fixes treydock#52
@treydock treydock merged commit b58615c into treydock:master Feb 15, 2024
14 checks passed
@optiz0r optiz0r deleted the 52_topology branch February 16, 2024 12:04
@treydock
Copy link
Owner

This is released with v3.1.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

topology.conf is required on login nodes for srun to work
2 participants