Add IBM CSM support #804
Comments
I did notice a while ago that plm/lsf self-disqualifies. Does this ticket include implementing plm/lsf so that it may be used as the launcher instead of plm/ssh (just as plm/alps is used as the launcher on ALPS systems)? At the risk of conflating issues, I was unsuccessful launching the DVM on large allocations (more than roughly 128 nodes) with ras/lsf + plm/ssh. There was no output from prte in that instance, so I could not investigate at the time.
CSM does not have a generic API for launching daemons, but it does have one specific to JSM: csm_jsrun_cmd. If CSM provides such an interface then we could write a
IIUC, plm/slurm uses srun and plm/alps uses aprun. Could plm/[csm|lsf] use jsrun?
Possibly. I've not tried launching
On some IBM systems (most notably CORAL systems like Summit, Sierra, and Lassen), LSF plays the role of the scheduler and Cluster System Management (CSM) plays the role of the resource manager.
Often in this configuration, LSF does not have a daemon present on the compute nodes, but CSM does. LSF will identify the nodes for the allocation, but CSM may categorize those nodes differently based on their roles. Most notably, CSM distinguishes between 'login' and 'compute' nodes.
The CSM API has a csm_allocation_query that can be used to query for this information. This can be used as the basis for a ras/csm component.
We can detect if we are in a CSM environment by the presence of the CSM_ALLOCATION_ID envar, as plm/lsf does in order to disqualify itself.
The headers for CSM can be found in the repository below
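As a rough illustration of the environment check described above, here is a minimal sketch in C. The helper names parse_csm_allocation_id and in_csm_allocation are hypothetical and are not taken from PRRTE or the CSM headers. The sketch also assumes that the value of CSM_ALLOCATION_ID is a non-negative decimal integer, which makes it stricter than a plain presence test; a component that only needs to disqualify itself could simply check that getenv returns non-NULL.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper (not part of PRRTE or CSM): validate the string
 * value of CSM_ALLOCATION_ID. Assumption: the ID is a non-negative
 * decimal integer. Returns true and stores the ID on success. */
static bool parse_csm_allocation_id(const char *value, long long *id_out)
{
    char *end = NULL;
    long long id;

    /* An unset or empty variable means we are not in a CSM allocation. */
    if (NULL == value || '\0' == value[0]) {
        return false;
    }

    errno = 0;
    id = strtoll(value, &end, 10);
    /* Reject overflow, non-numeric input, trailing junk, and negatives. */
    if (0 != errno || end == value || '\0' != *end || id < 0) {
        return false;
    }

    if (NULL != id_out) {
        *id_out = id;
    }
    return true;
}

/* Hypothetical helper: true when the current process appears to be
 * running inside a CSM allocation, judged only by the envar. */
static bool in_csm_allocation(long long *id_out)
{
    return parse_csm_allocation_id(getenv("CSM_ALLOCATION_ID"), id_out);
}
```

A ras/csm component could call something like in_csm_allocation from its query function to decide whether it is eligible, and then pass the parsed ID on to csm_allocation_query; the exact input structure for that call should be taken from the CSM headers rather than from this sketch.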