RELEASE NOTES FOR SLURM VERSION 23.02
IMPORTANT NOTES:
If using the slurmdbd (Slurm DataBase Daemon) you must update this first.
NOTE: If using a backup DBD you must start the primary first to do any
database conversion, the backup will not start until this has happened.
The 23.02 slurmdbd will work with Slurm daemons of version 21.08 and above.
You will not need to update all clusters at the same time, but it is very
important to update slurmdbd first and have it running before updating
any other clusters making use of it.
Slurm can be upgraded from version 21.08 or 22.05 to version 23.02 without loss
of jobs or other state information. Upgrading directly from an earlier version
of Slurm will result in loss of state information.
All SPANK plugins must be recompiled when upgrading from any Slurm version
prior to 23.02.
NOTE: PMIx v1.x is no longer supported.
HIGHLIGHTS
==========
-- slurmctld - Add new RPC rate limiting feature. This is enabled through
SlurmctldParameters=rl_enable, otherwise disabled by default.
-- Make scontrol reconfigure and sending a SIGHUP to the slurmctld behave the
same. If you were using SIGHUP as a 'lighter' scontrol reconfigure to rotate
logs please update your scripts to use SIGUSR2 instead.
-- Change cloud nodes to show by default. PrivateData=cloud is no longer
needed.
-- sreport - Count planned (FKA reserved) time for jobs running in IGNORE_JOBS
reservations. Previously was lumped into IDLE time.
-- job_container/tmpfs - Support running with an arbitrary list of private
mount points (/tmp and /dev/shm are the default, but not required).
-- job_container/tmpfs - Set more environment variables in InitScript.
-- Make all cgroup directories created by Slurm owned by root. This was the
behavior in cgroup/v2 but not in cgroup/v1, where by default ownership of the
step directories was set to the user and group of the job.
-- accounting_storage/mysql - change purge/archive to calculate record ages
based on end time, rather than start or submission times.
-- job_submit/lua - add support for log_user() from slurm_job_modify().
-- Run the following scripts in slurmscriptd instead of slurmctld:
ResumeProgram, ResumeFailProgram, SuspendProgram, ResvProlog, ResvEpilog,
and RebootProgram (only with SlurmctldParameters=reboot_from_controller).
-- Only permit changing log levels with 'srun --slurmd-debug' by root
or SlurmUser.
-- slurmctld will fatal() when reconfiguring the job_submit plugin fails.
-- Add PowerDownOnIdle partition option to power down nodes after nodes become
idle.
-- Add "[jobid.stepid]" prefix from slurmstepd and "slurmscriptd" prefix from
slurmscriptd to Syslog logging. Previously this only happened when logging
to a file.
-- Add purge and archive functionality for job environment and job batch script
records.
-- Extend support for Include files to all "configless" client commands.
-- Make node weight usable for powered down and rebooting nodes.
-- Removed 'launch' plugin.
-- Add "Extra" field to job to store extra information other than a comment.
-- Add usage gathering for AMD (requires ROCm 5.5+) and NVIDIA GPUs.
-- Add job's allocated nodes, features, oversubscribe, partition, and
reservation to SLURM_RESUME_FILE output for power saving.
-- Automatically create directories for stdout/stderr output files. Paths may
use %j and related substitution characters as well.
-- Add --tres-per-task to salloc/sbatch/srun.
-- Allow nodefeatures plugin features to work with cloud nodes.
e.g. - Powered down nodes have no active changeable features.
- Nodes can't be changed to other active features until powered down.
- Active changeable features are reset/cleared on power down.
-- Make slurmstepd cgroups constrained by total configured memory from
slurm.conf (NodeName=<> RealMemory=#) instead of total physical memory.
-- node_features/helpers - add support for the OR and parentheses operators
in a --constraint expression.
-- slurmctld will fatal() when [Prolog|Epilog]Slurmctld are defined but are not
executable.
-- Validate node registered active features are a superset of node's
currently active changeable features.
-- On clusters without any PrologFlags options, batch jobs with failed prologs
no longer generate an output file.
-- Add SLURM_JOB_START_TIME and SLURM_JOB_END_TIME environment variables.
-- Add SuspendExcStates option to slurm.conf to avoid suspending/powering down
specific node states.
-- Add support for DCMI power readings in IPMI plugin.
-- slurmrestd served /slurm/v0.0.39 and /slurmdb/v0.0.39 endpoints had major
changes from prior versions. Almost all schemas have been renamed and
modified. Sites using OpenAPI Generator clients are strongly encouraged to
upgrade to at least version 6.x due to limitations with prior
versions.
-- Allow for --nodelist to contain more nodes than required by --nodes.
-- Rename "nodes" to "nodes_resume" in SLURM_RESUME_FILE job output.
-- Rename "all_nodes" to "all_nodes_resume" in SLURM_RESUME_FILE output.
-- Add jobcomp/kafka plugin.
-- Add new PreemptParameters=reclaim_licenses option which will allow higher
priority jobs to preempt jobs to free up used licenses. (This is only
enabled with PreemptMode values of CANCEL and REQUEUE, as Slurm cannot
guarantee suspended jobs will release licenses correctly.)
-- hpe/slingshot - add support for the instant-on feature.
-- Add ability to update SuspendExc* parameters with scontrol.
-- Add ability to restore SuspendExc* parameters on restart with slurmctld -R
option.
-- Add ability to clear a GRES specification by setting it to "0" via
'scontrol update job'.
-- Add SLURM_JOB_OVERSUBSCRIBE environment variable for Epilog, Prolog,
EpilogSlurmctld, PrologSlurmctld, and mail output.
-- System node down reasons are appended to existing reasons, separated by ':'.
-- New command scrun has been added. scrun acts as an Open Container Initiative
(OCI) runtime proxy to run containers seamlessly via Slurm.
-- Fixed GpuFreqDef option. When set in slurm.conf, it will be used if
--gpu-freq was not explicitly set by the job step.
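
The SLURM_JOB_START_TIME and SLURM_JOB_END_TIME variables listed above can be
read from inside a job to estimate the remaining walltime. A minimal Python
sketch, assuming both values are UNIX timestamps in seconds; the helper name
seconds_remaining() is illustrative and not part of Slurm:

```python
import os
import time


def seconds_remaining(env=None, now=None):
    """Return seconds until SLURM_JOB_END_TIME, or None if it is not set.

    Assumption: the variable holds a UNIX timestamp in seconds.
    """
    env = os.environ if env is None else env
    end = env.get("SLURM_JOB_END_TIME")
    if end is None:
        return None  # variable absent, e.g. not running inside a job
    now = time.time() if now is None else now
    # Clamp at zero so a job past its end time never reports a negative value.
    return max(0, int(end) - int(now))


if __name__ == "__main__":
    left = seconds_remaining()
    if left is None:
        print("SLURM_JOB_END_TIME is not set")
    else:
        print(f"{left} seconds of walltime left")
```

A job script could call this before starting a long phase and checkpoint
instead when too little time is left.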
CONFIGURATION FILE CHANGES (see appropriate man page for details)
=====================================================================
-- job_container.conf - Added "Dirs" option to list desired private mount
points.
-- node_features plugins - invalid users specified for AllowUserBoot will now
result in fatal() rather than just an error.
-- Deprecate AllowedKmemSpace, ConstrainKmemSpace, MaxKmemPercent, and
MinKmemSpace.
-- Allow jobs to queue even if the user is not in AllowGroups when
EnforcePartLimits=no is set. This ensures consistency for all the Partition
access controls, and matches the documented behavior for EnforcePartLimits.
-- Add InfluxDBTimeout parameter to acct_gather.conf.
-- job_container/tmpfs - add support for expanding %h and %n in BasePath.
-- slurm.conf - Removed SlurmctldPlugstack option.
-- Add new SlurmctldParameters=validate_nodeaddr_threads=<number> option to
allow concurrent hostname resolution at slurmctld startup.
-- Add new AccountingStoreFlags=job_extra option to store a job's extra field
in the database.
-- Add new "defer_batch" option to SchedulerParameters to only defer
scheduling for batch jobs.
-- Add new DebugFlags option 'JobComp' to replace 'Elasticsearch'.
-- Add configurable job requeue limit parameter - MaxBatchRequeue - in
slurm.conf to permit changes from the old hard-coded value of 5.
-- helpers.conf - Allow specification of node specific features.
-- helpers.conf - Allow many features to one helper script.
-- job_container/tmpfs - Add "Shared" option to support shared namespaces.
This allows autofs to work with the job_container/tmpfs plugin when enabled.
-- acct_gather.conf - Added EnergyIPMIPowerSensors=Node=DCMI and
Node=DCMI_ENHANCED.
-- Add new "getnameinfo_cache_timeout=<number>" option to
CommunicationParameters to adjust or disable caching the results of
getnameinfo().
-- Add new PrologFlags=ForceRequeueOnFail option to automatically requeue
batch jobs on Prolog failures regardless of the job --requeue setting.
-- Add HealthCheckNodeState=NONDRAINED_IDLE option.
-- Add 'explicit' to Flags in gres.conf. This makes it so the gres is not
automatically added to a job's allocation when --exclusive is used. Note
that this is a per-node flag.
-- Moved the "preempt_" options from SchedulerParameters to PreemptParameters,
and dropped the prefix from the option names. (The old options will still
be parsed for backwards compatibility, but are now undocumented.)
-- Add LaunchParameters=ulimit_pam_adopt, which enables setting RLIMIT_RSS in
adopted processes.
-- Update SwitchParameters=job_vni to enable/disable creating job VNIs for all
jobs, or when a user requests them.
-- Update SwitchParameters=single_node_vni to enable/disable creating single
node VNIs for all jobs, or when a user requests them.
-- Add ability to preserve SuspendExc* parameters on reconfig with
ReconfigFlags=KeepPowerSaveSettings.
-- slurmdbd.conf - Add new AllResourcesAbsolute to force all new resources
to be created with the Absolute flag.
-- topology/tree - Add new TopologyParam=SwitchAsNodeRank option to reorder
nodes based on switch layout. This can be useful if the naming convention
for the nodes does not naturally map to the network topology.
-- Removed the default setting for GpuFreqDef. If unset, no attempt to change
the GPU frequency will be made if --gpu-freq is not set for the step.
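
The move of the "preempt_" options from SchedulerParameters to
PreemptParameters can be applied to an existing configuration mechanically.
A minimal Python sketch, assuming both parameters are comma-separated option
lists; split_preempt_options() is a hypothetical helper that only rewrites
strings and does not validate option names against Slurm:

```python
def split_preempt_options(scheduler_params):
    """Split a SchedulerParameters value into (scheduler, preempt) values.

    Options starting with "preempt_" are moved to the second value with the
    prefix dropped; everything else stays in the first value.
    """
    prefix = "preempt_"
    scheduler, preempt = [], []
    for opt in scheduler_params.split(","):
        opt = opt.strip()
        if not opt:
            continue  # skip empty entries such as a trailing comma
        if opt.startswith(prefix):
            preempt.append(opt[len(prefix):])
        else:
            scheduler.append(opt)
    return ",".join(scheduler), ",".join(preempt)


sched, preempt = split_preempt_options(
    "bf_continue,preempt_youngest_first,preempt_reorder_count=3"
)
print("SchedulerParameters=" + sched)
print("PreemptParameters=" + preempt)
```

Any options already present in PreemptParameters would need to be merged with
the returned value by hand.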
COMMAND CHANGES (see man pages for details)
===========================================
-- sacctmgr - no longer force updates to the AdminComment, Comment, or
SystemComment to lower-case.
-- sinfo - Add -F/--future option to sinfo to display future nodes.
-- sacct - Rename 'Reserved' field to 'Planned' to match sreport and the
nomenclature of the 'Planned' node.
-- scontrol - advanced reservation flag MAINT will no longer replace nodes,
similar to STATIC_ALLOC.
-- sbatch - add parsing for #PBS -d and #PBS -w.
-- scontrol show assoc_mgr will show username(uid) instead of uid in
QoS section.
-- Add strigger --draining and -R/--resume options.
-- Change --oversubscribe and --exclusive to be mutually exclusive for job
submission. Job submission commands will now fatal if both are set.
Previously, these options would override each other, with the last one in
the job submission command taking effect.
-- scontrol - Requested TRES and allocated TRES will now always be printed when
showing jobs, instead of one TRES output that was either the requested or
allocated.
-- srun --ntasks-per-core now applies to job and step allocations. Now, use of
--ntasks-per-core=1 implies --cpu-bind=cores and --ntasks-per-core>1 implies
--cpu-bind=threads.
-- salloc/sbatch/srun - Check and abort if ntasks-per-core > threads-per-core.
-- scontrol - Add ResumeAfter=<secs> option to "scontrol update nodename=".
-- Add a new "nodes=" argument to scontrol setdebug to allow the debug level
on the slurmd processes to be temporarily altered.
-- Add a new "nodes=" argument to "scontrol setdebugflags" as well.
-- Make it so scrontab prints client-side the job_submit() err_msg (which can
be set, e.g., by using the log_user() function for the lua plugin).
-- scontrol - Reservations will not be allowed to have STATIC_ALLOC or MAINT
flags and REPLACE[_DOWN] flags simultaneously.
-- scontrol - Reservations will only accept one reoccurring flag when
being created or updated.
-- scontrol - A reservation cannot be updated to be reoccurring if it is
already a floating reservation.
-- squeue - removed unused '%s' and 'SelectJobInfo' formats.
-- squeue - align print format for exit and derived codes with that of other
components (<exit_status>:<signal_number>).
-- sacct - Add --array option to expand job arrays and display array tasks on
separate lines.
-- Partial support for '--json' and '--yaml' formatted output has been
implemented for sacctmgr, sdiag, sinfo, squeue, and scontrol. The resultant
data output will be filtered by normal command arguments. Formatting
arguments will continue to be ignored.
-- salloc/sbatch/srun - extended the --nodes syntax to allow for a list of
valid node counts to be allocated to the job. This also supports a "step
count" value (e.g., --nodes=20-100:20 is equivalent to
--nodes=20,40,60,80,100) which can simplify the syntax when the job needs
to scale by a certain "chunk" size.
-- salloc/sbatch/srun - add user requestable VNIs with '--network=job_vni'
option.
-- salloc/sbatch/srun - add user requestable single node VNIs with the
'--network=single_node_vni' option.
-- sinfo - The '--json' and '--yaml' output format has been changed to include
the same information that could be provided by regular text output. The prior
formatted output can now be queried via 'scontrol show nodes --json' or
'scontrol show nodes --yaml'.
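
The "step count" form of the extended --nodes syntax described above can be
illustrated by expanding it into the equivalent list of node counts. This is
a minimal Python sketch of the documented equivalence only, not Slurm's own
parser; expand_node_counts() is a hypothetical name, and plain "min-max"
ranges without a step are deliberately not handled:

```python
def expand_node_counts(spec):
    """Expand a --nodes value into the list of node counts it names.

    Handles comma-separated counts and "min-max:step" entries only.
    A plain "min-max" range raises ValueError, since it is not a list.
    """
    counts = []
    for part in spec.split(","):
        if ":" in part:
            bounds, step = part.split(":")
            low, high = (int(x) for x in bounds.split("-"))
            # The upper bound is inclusive, hence high + 1.
            counts.extend(range(low, high + 1, int(step)))
        else:
            counts.append(int(part))
    return counts


print(expand_node_counts("20-100:20"))
```

With the example from the text, "20-100:20" and "20,40,60,80,100" expand to
the same list.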
API CHANGES
===========
-- job_container plugins - container_p_stepd_create() function signature
replaced uint32_t uid with stepd_step_rec_t* step.
-- gres plugins - gres_g_get_devices() function signature replaced pid_t pid
with stepd_step_rec_t* step.
-- cgroup plugins - task_cgroup_devices_constrain() function signature removed
pid_t pid.
-- task plugins - replace task_p_pre_set_affinity(), task_p_set_affinity(), and
task_p_post_set_affinity() with task_p_pre_launch_priv() like it was back in
Slurm 20.11.
-- Allow for concurrent processing of job_submit_g_submit() and
job_submit_g_modify() calls. If your plugin is not capable of concurrent
operation you must add additional locking within your plugin.
-- Removed return value from slurm_list_append().
-- The List and ListIterator types have been removed in favor of list_t and
list_itr_t respectively.
-- burst buffer plugins - add bb_g_build_het_job_script().
bb_g_get_status() - added authenticated UID and GID.
bb_g_run_script() - added job_info argument.
-- burst_buffer.lua - Pass UID and GID to most hooks. Pass job_info (detailed
job information) to many hooks. See etc/burst_buffer.lua.example for a
complete list of changes. WARNING: Backwards compatibility is broken for
slurm_bb_get_status: UID and GID are passed before the variadic arguments.
If UID and GID are not explicitly listed as arguments to
slurm_bb_get_status(), then they will be included in the variadic arguments.
Backwards compatibility is maintained for all other hooks because the new
arguments are passed after the existing arguments.
-- node_features plugins - node_features_p_reboot_weight() function removed.
node_features_p_job_valid() - added parameter feature_list.
node_features_p_job_xlate() - added parameters feature_list and
job_node_bitmap.
-- New data_parser interface with v0.0.39 plugin.