Merge pull request #2 from mathworks/discovery
Add cluster discovery files
jmcave authored Apr 14, 2023
2 parents 3bf85ee + 10e6fec commit 5ce5f91
Showing 10 changed files with 298 additions and 86 deletions.
14 changes: 12 additions & 2 deletions README.md
@@ -1,6 +1,6 @@
# Parallel Computing Toolbox plugin for MATLAB Parallel Server with Slurm

[![View on File Exchange](https://www.mathworks.com/matlabcentral/images/matlab-file-exchange.svg)](https://www.mathworks.com/matlabcentral/fileexchange/52807)
[![View Parallel Computing Toolbox Plugin for Slurm on File Exchange](https://www.mathworks.com/matlabcentral/images/matlab-file-exchange.svg)](https://mathworks.com/matlabcentral/fileexchange/127364-parallel-computing-toolbox-plugin-for-slurm)

Parallel Computing Toolbox™ provides the `Generic` cluster type for submitting MATLAB® jobs to a cluster running a third-party scheduler.
The `Generic` cluster type uses a set of plugin scripts to define how your machine communicates with your scheduler.
@@ -34,6 +34,16 @@ These scripts use the `sacct` command to track the state of jobs on the cluster.
For the `sacct` command to run, job accounting must be enabled on the Slurm cluster.
To use `squeue` instead, uncomment the relevant lines in [`getJobStateFcn`](getJobStateFcn.m).

### Cluster Discovery

Since version R2023a, MATLAB can discover clusters running third-party schedulers such as Slurm.
As a cluster admin, you can create a configuration file that describes how to configure the Parallel Computing Toolbox on the user's machine to submit MATLAB jobs to the cluster.
The cluster configuration file is a plain text file with the extension `.conf` containing key-value pairs that describe the cluster configuration information.
The MATLAB client will use the cluster configuration file to create a cluster profile for the user who discovers the cluster.
Therefore, users will not need to follow the instructions in the sections below.
You can find an example of a cluster configuration file in [discover/example.conf](discover/example.conf).
For full details on how to make a cluster running a third-party scheduler discoverable, see the documentation for [Configure for Third-Party Scheduler Cluster Discovery](https://mathworks.com/help/matlab-parallel-server/configure-for-cluster-discovery.html).
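
As a rough sketch of the admin workflow described above, the following shell snippet writes a small cluster configuration file into a folder that users could point discovery at. The keys are copied from [discover/example.conf](discover/example.conf); the folder location, the file name `slurm-example.conf`, and all values are placeholders chosen for illustration. This subset of keys is an assumption, not a statement of which keys MATLAB requires, so check the linked documentation before relying on it.

```shell
#!/bin/sh
# Hypothetical folder where cluster configuration files are published.
conf_dir="${TMPDIR:-/tmp}/matlab-cluster-discovery"
conf_file="$conf_dir/slurm-example.conf"
mkdir -p "$conf_dir"

# The quoted heredoc delimiter stops the shell from expanding
# "$MATLAB_VERSION_STRING", so the placeholder reaches the file literally.
cat > "$conf_file" <<'EOF'
# Keys mirror discover/example.conf; all values are placeholders.
Name = My Slurm cluster
NumWorkers = 32
ClusterMatlabRoot = /opt/matlab/R"$MATLAB_VERSION_STRING"
JobStorageLocation = /home/matlabjobs
PluginScriptsLocation (Unix) = /organization/matlab/pluginscripts
OperatingSystem = unix
HasSharedFilesystem = true
EOF

echo "Wrote $conf_file"
```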

### Create a Cluster Profile in MATLAB

Create a cluster profile by using either the Cluster Profile Manager or the MATLAB Command Window.
@@ -225,4 +235,4 @@ The license is available in the [license.txt](license.txt) file in this reposito

If you require assistance or have a request for additional features or capabilities, please contact [MathWorks Technical Support](https://www.mathworks.com/support/contact_us.html).

Copyright 2022 The MathWorks, Inc.
Copyright 2022-2023 The MathWorks, Inc.
65 changes: 38 additions & 27 deletions communicatingSubmitFcn.m
@@ -6,9 +6,9 @@ function communicatingSubmitFcn(cluster, job, environmentProperties)
%
% See also parallel.cluster.generic.communicatingDecodeFcn.

% Copyright 2010-2022 The MathWorks, Inc.
% Copyright 2010-2023 The MathWorks, Inc.

% Store the current filename for the errors, warnings and dctSchedulerMessages
% Store the current filename for the errors, warnings and dctSchedulerMessages.
currFilename = mfilename;
if ~isa(cluster, 'parallel.Cluster')
error('parallelexamples:GenericSLURM:NotClusterObject', ...
@@ -17,9 +17,22 @@ function communicatingSubmitFcn(cluster, job, environmentProperties)

decodeFunction = 'parallel.cluster.generic.communicatingDecodeFcn';

if ~strcmpi(cluster.OperatingSystem, 'unix')
clusterOS = cluster.OperatingSystem;
if ~strcmpi(clusterOS, 'unix')
error('parallelexamples:GenericSLURM:UnsupportedOS', ...
'The function %s only supports clusters with unix OS.', currFilename)
'The function %s only supports clusters with the unix operating system.', currFilename)
end

% Get the correct quote and file separator for the Cluster OS.
% This check is unnecessary in this file because we explicitly
% checked that the ClusterOsType is unix. This code is an example
% of how to deal with clusters that can be unix or pc.
if strcmpi(clusterOS, 'unix')
quote = '''';
fileSeparator = '/';
else
quote = '"';
fileSeparator = '\';
end

if isprop(cluster.AdditionalProperties, 'ClusterHost')
@@ -47,18 +60,6 @@ function communicatingSubmitFcn(cluster, job, environmentProperties)
end
end

% Get the correct quote and file separator for the Cluster OS.
% This check is unnecessary in this file because we explicitly
% checked that the ClusterOsType is unix. This code is an example
% of how to deal with clusters that can be unix or pc.
if strcmpi(cluster.OperatingSystem, 'unix')
quote = '''';
fileSeparator = '/';
else
quote = '"';
fileSeparator = '\';
end

% The job specific environment variables
% Remove leading and trailing whitespace from the MATLAB arguments
matlabArguments = strtrim(environmentProperties.MatlabArguments);
@@ -104,6 +105,7 @@ function communicatingSubmitFcn(cluster, job, environmentProperties)
else
jobDirectoryOnCluster = remoteConnection.getRemoteJobLocation(job.ID, cluster.OperatingSystem);
end

% Specify the job wrapper script to use.
% Prior to R2019a, only the SMPD process manager is supported.
if verLessThan('matlab', '9.6') || ...
@@ -116,7 +118,7 @@ function communicatingSubmitFcn(cluster, job, environmentProperties)
dirpart = fileparts(mfilename('fullpath'));
localScript = fullfile(dirpart, jobWrapperName);
% Copy the local wrapper script to the job directory
copyfile(localScript, localJobDirectory);
copyfile(localScript, localJobDirectory, 'f');

% The script to execute on the cluster to run the job
wrapperPath = sprintf('%s%s%s', jobDirectoryOnCluster, fileSeparator, jobWrapperName);
@@ -138,17 +140,26 @@ function communicatingSubmitFcn(cluster, job, environmentProperties)
commonSubmitArgs = getCommonSubmitArgs(cluster);
additionalSubmitArgs = strtrim(sprintf('%s %s', additionalSubmitArgs, commonSubmitArgs));

% Create a script to submit a Slurm job - this will be created in the job directory
dctSchedulerMessage(5, '%s: Generating script for job.', currFilename);
localSubmitScriptPath = tempname(localJobDirectory);
createSubmitScript(localSubmitScriptPath, jobName, quotedLogFile, quotedWrapperPath, ...
variables, additionalSubmitArgs);
% Extension to use for scripts
scriptExt = '.sh';

% Path to the submit script as seen by the cluster
[~, submitScriptName] = fileparts(localSubmitScriptPath);
submitScriptPathOnCluster = sprintf('%s%s%s', jobDirectoryOnCluster, fileSeparator, submitScriptName);
% Path to the submit script, to submit the Slurm job using sbatch
localSubmitScriptPath = [tempname(localJobDirectory) scriptExt];
[~, submitScriptName, submitScriptExt] = fileparts(localSubmitScriptPath);
submitScriptPathOnCluster = sprintf('%s%s%s%s', jobDirectoryOnCluster, fileSeparator, submitScriptName, submitScriptExt);
quotedSubmitScriptPathOnCluster = sprintf('%s%s%s', quote, submitScriptPathOnCluster, quote);

% Path to the environment wrapper, which will set the environment variables
% for the job then execute the job wrapper
localEnvScriptPath = [tempname(localJobDirectory) scriptExt];
[~, envScriptName, envScriptExt] = fileparts(localEnvScriptPath);
envScriptPathOnCluster = sprintf('%s%s%s%s', jobDirectoryOnCluster, fileSeparator, envScriptName, envScriptExt);
quotedEnvScriptPathOnCluster = sprintf('%s%s%s', quote, envScriptPathOnCluster, quote);

createEnvironmentWrapper(localEnvScriptPath, quotedWrapperPath, variables);
createSubmitScript(localSubmitScriptPath, jobName, quotedLogFile, ...
quotedEnvScriptPathOnCluster, additionalSubmitArgs);

% Create the command to run on the cluster
commandToRun = sprintf('sh %s', quotedSubmitScriptPathOnCluster);

@@ -158,13 +169,13 @@ function communicatingSubmitFcn(cluster, job, environmentProperties)
remoteConnection.startMirrorForJob(job);
end

if isprop(cluster.AdditionalProperties, 'ClusterHost')
if strcmpi(cluster.OperatingSystem, 'unix')
% Add execute permissions to shell scripts
runSchedulerCommand(cluster, sprintf( ...
'chmod u+x %s%s*.sh', jobDirectoryOnCluster, fileSeparator));
% Convert line endings to Unix
runSchedulerCommand(cluster, sprintf( ...
'dos2unix %s%s*.sh', jobDirectoryOnCluster, fileSeparator));
'dos2unix --allow-chown %s%s*.sh', jobDirectoryOnCluster, fileSeparator));
end

% Now ask the cluster to run the submission command
30 changes: 30 additions & 0 deletions createEnvironmentWrapper.m
@@ -0,0 +1,30 @@
function createEnvironmentWrapper(outputFilename, quotedWrapperPath, environmentVariables)
% Create a script that sets the correct environment variables and then
% calls the job wrapper.

% Copyright 2023 The MathWorks, Inc.

dctSchedulerMessage(5, '%s: Creating environment wrapper at %s', mfilename, outputFilename);

% Open file in binary mode to make it cross-platform.
fid = fopen(outputFilename, 'w');
if fid < 0
error('parallelexamples:GenericSLURM:FileError', ...
'Failed to open file %s for writing', outputFilename);
end
fileCloser = onCleanup(@() fclose(fid));

% Specify shell to use
fprintf(fid, '#!/bin/sh\n');

formatSpec = 'export %s=''%s''\n';

% Write the commands to set and export environment variables
for ii = 1:size(environmentVariables, 1)
fprintf(fid, formatSpec, environmentVariables{ii,1}, environmentVariables{ii,2});
end

% Write the command to run the job wrapper
fprintf(fid, '%s\n', quotedWrapperPath);

end
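
To illustrate the kind of file `createEnvironmentWrapper` writes, the shell sketch below builds an equivalent wrapper by hand and runs it. The variable names `EXAMPLE_JOB_VAR` and `EXAMPLE_TASK_VAR`, their values, and the stand-in `jobWrapper.sh` are hypothetical; the real names and values come from the `variables` cell array built in the submit functions, which this diff does not show in full.

```shell
#!/bin/sh
# Throwaway directory standing in for the job directory.
job_dir=$(mktemp -d)

# Stand-in for the real job wrapper: it only prints the variables it inherits.
cat > "$job_dir/jobWrapper.sh" <<'EOF'
#!/bin/sh
echo "job=$EXAMPLE_JOB_VAR task=$EXAMPLE_TASK_VAR"
EOF
chmod u+x "$job_dir/jobWrapper.sh"

# Equivalent of the generated file: a shebang, one export line per variable
# (format: export NAME='value'), then the quoted path of the job wrapper.
{
    printf '#!/bin/sh\n'
    printf "export %s='%s'\n" EXAMPLE_JOB_VAR 'Job 1'
    printf "export %s='%s'\n" EXAMPLE_TASK_VAR 'Task 3'
    printf "'%s'\n" "$job_dir/jobWrapper.sh"
} > "$job_dir/envWrapper.sh"
chmod u+x "$job_dir/envWrapper.sh"

# Show the generated wrapper, then run it and capture what the job wrapper sees.
cat "$job_dir/envWrapper.sh"
output=$("$job_dir/envWrapper.sh")
echo "$output"

rm -rf "$job_dir"
```

Because each value is wrapped in single quotes, spaces and `$` characters survive unchanged. A value that itself contains a single quote would need escaping, which the `formatSpec` in this function does not appear to do.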
24 changes: 7 additions & 17 deletions createSubmitScript.m
@@ -1,11 +1,10 @@
function createSubmitScript(outputFilename, jobName, quotedLogFile, quotedWrapperPath, ...
environmentVariables, additionalSubmitArgs, jobArrayString)
% Create a script that sets the correct environment variables and then
% executes the Slurm sbatch command.
function createSubmitScript(outputFilename, jobName, quotedLogFile, ...
quotedWrapperPath, additionalSubmitArgs, jobArrayString)
% Create a script that runs the Slurm sbatch command.

% Copyright 2010-2022 The MathWorks, Inc.
% Copyright 2010-2023 The MathWorks, Inc.

if nargin < 7
if nargin < 6
jobArrayString = [];
end

@@ -19,20 +18,11 @@ function createSubmitScript(outputFilename, jobName, quotedLogFile, quotedWrappe
end
fileCloser = onCleanup(@() fclose(fid));

% Specify Shell to use
% Specify shell to use
fprintf(fid, '#!/bin/sh\n');

% Write the commands to set and export environment variables
for ii = 1:size(environmentVariables, 1)
fprintf(fid, 'export %s=''%s''\n', environmentVariables{ii,1}, environmentVariables{ii,2});
end

% Generate the command to run and write it.
% We will forward all environment variables with this job in the call
% to sbatch
variablesToForward = environmentVariables(:,1);
commandToRun = getSubmitString(jobName, quotedLogFile, quotedWrapperPath, ...
variablesToForward, additionalSubmitArgs, jobArrayString);
additionalSubmitArgs, jobArrayString);
fprintf(fid, '%s\n', commandToRun);

end
85 changes: 85 additions & 0 deletions discover/example.conf
@@ -0,0 +1,85 @@
# Since version R2023a, MATLAB can discover clusters running third-party
# schedulers such as Slurm. The Discover Clusters functionality
# automatically configures the Parallel Computing Toolbox to submit MATLAB
# jobs to the cluster. To use this functionality, you must create a cluster
# configuration file and store it at a location accessible to MATLAB users.
#
# This file is an example of a cluster configuration which MATLAB can
# discover. You can copy and modify this file to make your cluster discoverable.
#
# For more information, including the required format for this file, see
# the online documentation for making a cluster running a third-party
# scheduler discoverable:
# https://mathworks.com/help/matlab-parallel-server/configure-for-cluster-discovery.html

# Copyright 2023 The MathWorks, Inc.

# The name MATLAB will display for the cluster when discovered.
Name = My Slurm cluster

# Maximum number of MATLAB workers a single user can use in a single job.
# This number must not exceed the number of available MATLAB Parallel
# Server licenses.
NumWorkers = 32

# Path to the MATLAB install on the cluster for the workers to use. Note
# the variable "$MATLAB_VERSION_STRING" returns the release number of the
# MATLAB client that is running discovery, e.g. 2023a. If multiple versions
# of MATLAB are installed on the cluster, this allows discovery to select
# the correct installation path. Add a leading "R" or "r" if needed to
# complete the MATLAB version.
ClusterMatlabRoot = /opt/matlab/R"$MATLAB_VERSION_STRING"

# Location where the MATLAB client stores job and task information.
JobStorageLocation = /home/matlabjobs
# If the client and cluster share a filesystem but the client is running
# the Windows operating system and the cluster running a Linux operating
# system, you must specify the JobStorageLocation using a structure by
# commenting out the previous line and uncommenting the following lines.
# The 'windows' and 'unix' fields must correspond to the same folder as
# viewed from each of those operating systems.
#JobStorageLocation.windows = \\organization\home\matlabjobs
#JobStorageLocation.unix = /organization/home/matlabjobs

# Folder that contains the scheduler plugin scripts that describe how
# MATLAB interacts with the scheduler. A property can take different values
# depending on the operating system of the client MATLAB by specifying the
# name of the OS in parentheses.
PluginScriptsLocation (Windows) = \\organization\matlab\pluginscripts
PluginScriptsLocation (Unix) = /organization/matlab/pluginscripts

# The operating system on the cluster. Valid values are 'unix' and 'windows'.
OperatingSystem = unix

# Specify whether client and cluster nodes share JobStorageLocation. To
# configure MATLAB to copy job input and output files to and from the
# cluster using SFTP, set this property to false and specify a value for
# AdditionalProperties.RemoteJobStorageLocation below.
HasSharedFilesystem = true

# Specify whether the cluster uses online licensing.
RequiresOnlineLicensing = false

# LicenseNumber for the workers to use. Specify only if
# RequiresOnlineLicensing is set to true.
#LicenseNumber = 123456

[AdditionalProperties]

# To configure the user's machine to connect to the submission host via
# SSH, uncomment the following line and enter the hostname of the cluster
# machine that has the scheduler utilities to submit jobs.
#ClusterHost = slurm-headnode

# If the user's machine and the cluster nodes do not have a shared file
# system, MATLAB can copy job input and output files to and from the
# cluster using SFTP. To activate this feature, set HasSharedFilesystem
# above to false. Then uncomment the following lines and enter the location
# on the cluster to store job files.
#RemoteJobStorageLocation (Windows) = /home/"$USERNAME"/.matlab/generic_cluster_jobs
#RemoteJobStorageLocation (Unix) = /home/"$USER"/.matlab/generic_cluster_jobs

# Username to log in to ClusterHost with. On Linux and Mac, use the USER
# environment variable. On Windows, use the USERNAME variable.
Username (Unix) = "$USER"
Username (Windows) = "$USERNAME"
37 changes: 37 additions & 0 deletions discover/runDiscovery.sh
@@ -0,0 +1,37 @@
#!/bin/sh

# Copyright 2023 The MathWorks, Inc.

usage="$(basename "$0") matlabroot [folder] -- run third-party scheduler discovery in MATLAB R2023a onwards
matlabroot - path to the folder where MATLAB is installed
folder - folder to search for cluster configuration files
(defaults to pwd)"

# Print usage
if [ -z "$1" ] || [ "$1" = "-h" ] || [ "$1" = "--help" ] ; then
echo "$usage"
exit 0
fi

# MATLAB executable to launch
matlabExe="$1/bin/matlab"
if [ ! -f "${matlabExe}" ] ; then
echo "Could not find MATLAB executable at ${matlabExe}"
exit 1
fi

# Folder to run discovery on. If specified, wrap in single-quotes to make a MATLAB charvec.
discoveryFolder="$2"
if [ ! -z "$discoveryFolder" ] ; then
discoveryFolder="'${discoveryFolder}'"
fi

# Command to run in MATLAB
matlabCmd="parallel.cluster.generic.discoverGenericClusters(${discoveryFolder})"

# Arguments to pass to MATLAB
matlabArgs="-nojvm -parallelserver -batch"

# Build and run system command
CMD="\"${matlabExe}\" ${matlabArgs} \"${matlabCmd}\""
eval $CMD
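
The nested quoting in the final lines is easy to misread, so here is a small sketch that builds the same command string and prints it instead of evaluating it. The MATLAB root `/usr/local/MATLAB/R2023a` and the folder `/shared/cluster-configs` are made-up example paths, not values taken from this repository.

```shell
#!/bin/sh
# Hypothetical inputs; the real script takes these as $1 and $2.
matlabRoot="/usr/local/MATLAB/R2023a"
discoveryFolder="/shared/cluster-configs"

matlabExe="${matlabRoot}/bin/matlab"

# Wrap the folder in single quotes so MATLAB sees a character vector.
if [ -n "$discoveryFolder" ] ; then
    discoveryFolder="'${discoveryFolder}'"
fi

matlabCmd="parallel.cluster.generic.discoverGenericClusters(${discoveryFolder})"
matlabArgs="-nojvm -parallelserver -batch"

# Same construction as runDiscovery.sh, but printed rather than passed to eval.
CMD="\"${matlabExe}\" ${matlabArgs} \"${matlabCmd}\""
echo "$CMD"
```

With a real installation, the corresponding invocation would be something like `./discover/runDiscovery.sh /usr/local/MATLAB/R2023a /shared/cluster-configs`, again with placeholder paths.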