The sqlog package contains a set of scripts for creating, populating,
and querying a SLURM job log database.
COMPONENTS
sqlog The "SLURM Query Log" utility. Provides a single interface to
query jobs from the SLURM job log database and/or current
queue of running jobs.
slurm-joblog Logs completed jobs using SLURM's jobcomp/script interface
to the SLURM job log database and an optional text file.
sqlog-db-util Administrative utility used to create SLURM job log database
and its corresponding users. Also provides an interface to
"backfill" the database using existing SLURM joblog files
created by the jobcomp/filetxt plugin.
sqlog.conf World-readable config file. Contains local configuration for
SQL host, read-only user, and read-only password.
slurm-joblog.conf
Private configuration for slurm-joblog script (also used by
by sqlog-db-util). Contains SQL read-write user and password,
root user passwd (for sqlog-db-util) and a list of hosts
that should have RW access to DB.
CONFIGURATION
For fully-automated operation, both /etc/slurm/sqlog.conf and
/etc/slurm/slurm-joblog.conf must exist. These files are read
using perl's do() function, so they can and must be valid perl.
This allows a bit of scripting to get the values if necessary.
(See the sqlog doc directory for examples).
The available variables in each config file include:
sqlog.conf:
  SQLHOST      SQL server hostname (default = sqlhost)
  SQLUSER      Read-only user (default = slurm_read)
  SQLPASS      Read-only password (default = none)
  SQLDB        DB name (default = slurm)
  TRACKNODES   Set to 0 to disable per-job node tracking (default = 1)
  %FORMATS     Hash of format aliases (e.g. "f1" => "jid,name,user,state")
slurm-joblog.conf:
  SQLUSER      Read-write user (default = slurm)
  SQLPASS      Read-write password (not set)
  SQLROOTPASS  DB root password (not set)
  @SQLRWHOSTS  Read-write hosts (array of hosts to give rw access)
  JOBLOGFILE   Text joblog location (set if you want to log to a file too)
  AUTOCREATE   Attempt to create the DB if it doesn't yet exist the
               first time slurm-joblog is run (default = no).
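Since both files are executed as perl, a minimal pair of configs might
look like the following sketch. The hostnames, passwords, and paths are
illustrative placeholders (and the AUTOCREATE value is an assumed truthy
setting), not shipped defaults:

  # /etc/slurm/sqlog.conf (world-readable)
  $SQLHOST = "sqlhost";
  $SQLUSER = "slurm_read";
  $SQLPASS = "";
  $SQLDB   = "slurm";
  %FORMATS = ( "f1" => "jid,name,user,state" );

  # /etc/slurm/slurm-joblog.conf (private)
  $SQLUSER     = "slurm";
  $SQLPASS     = "rw-secret";          # placeholder
  $SQLROOTPASS = "root-secret";        # placeholder
  @SQLRWHOSTS  = ("mgmt1", "mgmt2");   # hosts granted rw access to the DB
  $JOBLOGFILE  = "/var/log/slurm/joblog";
  $AUTOCREATE  = 1;                    # create DB on first run (assumed value)

Because the files are run with do(), a bit of perl can compute values,
e.g. reading a password from a separate root-only file (hypothetical path):

  chomp($SQLROOTPASS = `cat /etc/slurm/.sqlog.rootpass`);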
CREATING JOB LOG DATABASE
Once the config files exist, the following command will create the
SLURM job log database:
sqlog-db-util --create
If you have existing text joblog files you'd like to seed the new
DB with, use
sqlog-db-util --backfill [FILE]...
e.g.
sqlog-db-util --backfill /var/log/slurm/joblog*
If AUTOCREATE is set in slurm-joblog.conf, then sqlog-db-util --create
will be automatically run the first time the database is accessed.
CONVERTING JOB LOG DATABASE
The database schema changed from v0.12 to v0.13 of the sqlog package.
The highest schema version currently running on a system can be
determined from the --info output.
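For example:
  sqlog-db-util --info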
To create tables for the new schema, run:
sqlog-db-util --create
Once created, the slurm-joblog.pl script will detect the new schema
and automatically switch to inserting records into the new tables. The
sqlog command will query both schemas for records.
To copy existing data from the old schema to the new schema,
use the --convert option.
Speeding up the conversion:
The new schema tracks the nodes that each job uses so that sqlog queries
involving node names return much faster. The data and indices associated
with this node tracking can significantly slow down the conversion operation
when converting a large number of records. There are two options to speed
this up:
1) Disable node-tracking for all converted jobs via the --notrack option.
2) Delay indexing of converted data via the --delay-index option.
With the --notrack option, no node-tracking data will be stored for jobs
inserted via conversion. As such, if node-tracking is enabled on the
system, such jobs will not be returned by queries involving node names.
Newly inserted jobs will still have node-tracking data.
With the --delay-index option, node-tracking indices are removed before
the data is converted, and they are restored when the conversion completes.
Queries involving node names while the indices are absent will take a very
long time to return on a large database.
For a database on Atlas, which had 580,000 jobs spanning two years, the
conversion took:
13 minutes for: sqlog-db-util --convert --notrack
33 minutes for: sqlog-db-util --convert --delay-index
85 minutes for: sqlog-db-util --convert
The recommended method is to use --delay-index.
It's also possible to disable node-tracking in the new schema completely.
To do this, add the following line to the sqlog.conf file.
$TRACKNODES=0;
Number of allocated cores:
The new schema adds a new field to record the number of cores allocated
to a job. This data was not captured in the version 1 schema. However,
on many systems, this core count can be computed. On systems that have the
same number of cores per node and allocate whole nodes to a job, one may
use the --cores-per-node option to specify the number of cores per node.
This --cores-per-node value is multiplied by the node count recorded
in the version 1 schema to determine the number of cores allocated to
the job. For example, to convert from schema version 1 to version 2 on
a machine that has 8 cores per node and allocates whole nodes to jobs,
run the following command:
sqlog-db-util --convert --cores-per-node=8
For all other systems, do not specify --cores-per-node. In this case,
the number of cores allocated will be set to 0. The conversion command
on these systems is simply:
sqlog-db-util --convert
If a mistake is made during conversion, you can drop the version 2 tables
and start from scratch (be very careful to specify '2' and not '1' here):
sqlog-db-util --drop=2
You may issue the --convert command on a live system; however, be
careful to specify the command correctly in this case. The slurm-joblog.pl
script will insert records into the new schema as soon as it is created.
If a mistake is made during conversion, and the version 2 tables must
be dropped and recreated, any records inserted by slurm-joblog.pl will be lost.
After conversion, sqlog may report duplicate records as it finds
matches from both the version 1 and version 2 tables. Once converted,
it's recommended that the version 1 tables be dropped by running the
following command (be very careful to specify '1' and not '2' here):
sqlog-db-util --drop=1
Finally, here is a full example set of commands to create the new schema
and convert records to it:
sqlog-db-util -v --create
sqlog-db-util -v --backup=all schema1_jobs.log
sqlog-db-util -v --convert --delay-index --cores-per-node=8
sqlog-db-util -v --drop=1
BACKING UP AND PRUNING THE DATABASE
It is possible to dump records from the job log database into a text
file, which can then be read back in via --backfill. This is useful
for capturing a text file backup of the logs. One must specify the
time period as "all", "DATE", or "DATE..DATE" to dump all jobs, jobs
before a given date, or jobs that started between two dates,
respectively. DATE should be specified in the 'YYYY-MM-DD HH:MM:SS'
format, e.g.,
  sqlog-db-util -v --backup='2009-01-01 00:00:00'..'2009-02-01 00:00:00' \
      logs.txt
One use of this backup option is to share job log records with
others, potentially outside of the organization. Typically, one would
like to protect user and job names when sharing such information.
For this, an --obfuscate option is available, which dumps records with
user names modified to the form "user_X", userids to match "X",
and job names of the form "job_Y", where X and Y are numbers.
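For example, to dump an obfuscated copy of all records to a file (the
combination of --obfuscate with --backup shown here is an assumption;
consult the sqlog-db-util usage output for the exact invocation):

  sqlog-db-util -v --backup=all --obfuscate obfuscated_jobs.log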
Finally, over a long period of time, the database may gather so many
records that it slows down significantly. A --prune option is available
to remove old records. One specifies a date, and all jobs that started
before that date are removed from the database and written to a file
specified by the user, e.g.,
sqlog-db-util -v --prune='2007-01-01 00:00:00' pre2007.log
ENABLE JOB LOGGING
To enable the SLURM job log database, the following configuration
options must be set in slurm.conf:
JobCompType = jobcomp/script
JobCompLoc = /usr/libexec/sqlog/slurm-joblog
Adjust the path if the sqlog RPM was installed with a different PREFIX.
This has only been tested on SLURM 1.2.10 or greater.
Restart slurmctld and slurm-joblog will begin logging jobs as they
complete.
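For example, on systems of this vintage the restart might look like the
following (a hypothetical init-script path; use whatever mechanism
manages slurmctld locally):

  /etc/init.d/slurm restart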
$Id$