@@ -8,6 +8,7 @@ These objects exist across all frameworks.
88- [Collection](#collection)
99- [SaveConfig](#saveconfig)
1010- [ReductionConfig](#reductionconfig)
11+ - [Environment Variables](#environment-variables)
1112
1213## Glossary
1314
@@ -244,3 +245,107 @@ For example,
244245` ReductionConfig(reductions=['std', 'variance'], abs_reductions=['mean'], norms=['l1']) `
245246
246247will return the standard deviation and variance, the mean of the absolute value, and the l1 norm.
248+
249+
250+ ---
251+
252+ ## Environment Variables
253+
254+ #### ` USE_SMDEBUG ` :
255+
256+ Setting this variable to 0 turns off the hook that is created by default. This can be used
257+ if the user doesn't want to use SageMaker Debugger.
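
As a minimal sketch, the variable can be set from Python before training starts. This assumes the variable is read when the framework starts up, which the text above does not state; the `smdebug_enabled` name is hypothetical and only mirrors the described behavior (the hook is on unless the value is 0).

```python
import os

# Opt out of the default hook. Assumption: this must run before the
# training framework (and therefore smdebug) is initialized.
os.environ["USE_SMDEBUG"] = "0"

# Hypothetical check mirroring the documented behavior:
# the hook is created by default and turned off only when the value is "0".
smdebug_enabled = os.environ.get("USE_SMDEBUG", "1") != "0"
```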
258+
259+ #### ` SMDEBUG_CONFIG_FILE_PATH ` :
260+
261+ Contains the path to the JSON file that describes the smdebug hook.
262+
263+ At a minimum, the JSON config should contain the path where smdebug should output tensors.
264+ Example:
265+
266+ ` { "LocalPath": "/my/smdebug_hook/path" } `
267+
268+ In a SageMaker environment, this path is set to point to a pre-defined location containing a valid JSON file.
269+ In a non-SageMaker environment, SageMaker Debugger is not used if this environment variable is not set and
270+ a hook is not created manually.
271+
272+ Sample JSON from which a hook can be created:
273+ ``` json
274+ {
275+   "LocalPath": "/my/smdebug_hook/path",
276+   "HookParameters": {
277+     "save_all": false,
278+     "include_regex": "regex1,regex2",
279+     "save_interval": "100",
280+     "save_steps": "1,2,3,4",
281+     "start_step": "1",
282+     "end_step": "1000000",
283+     "reductions": "min,max,mean"
284+   },
285+   "CollectionConfigurations": [
286+     {
287+       "CollectionName": "collection_obj_name1",
288+       "CollectionParameters": {
289+         "include_regex": "regexe5*",
290+         "save_interval": 100,
291+         "save_steps": "1,2,3",
292+         "start_step": 1,
293+         "reductions": "min"
294+       }
295+     }
296+   ]
297+ }
298+
299+ ```
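
For a non-SageMaker environment, the steps above might be sketched as follows: write the minimal config to a JSON file and point `SMDEBUG_CONFIG_FILE_PATH` at it. The file name `smdebug_config.json` and the temporary directory are arbitrary choices for illustration, and the requirement that this happen before the hook is created is an assumption.

```python
import json
import os
import tempfile

# Minimal config: only the output path for tensors (key name taken from the example above).
config = {"LocalPath": "/my/smdebug_hook/path"}

# Write the config to a JSON file; the directory and file name are illustrative.
config_dir = tempfile.mkdtemp()
config_path = os.path.join(config_dir, "smdebug_config.json")
with open(config_path, "w") as f:
    json.dump(config, f, indent=2)

# Point smdebug at the file. Assumption: this is done before the hook is created.
os.environ["SMDEBUG_CONFIG_FILE_PATH"] = config_path

# Sanity check: the file referenced by the variable parses as valid JSON.
with open(os.environ["SMDEBUG_CONFIG_FILE_PATH"]) as f:
    loaded = json.load(f)
```

Validating the file with `json.load` before training is a cheap way to catch mistakes such as a trailing comma, which a strict JSON parser rejects.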
300+
301+ #### ` TENSORBOARD_CONFIG_FILE_PATH ` :
302+
303+ Contains the path to the JSON file that specifies where TensorBoard artifacts need to
304+ be placed.
305+
306+ Sample JSON file:
307+
308+ ` { "LocalPath": "/my/tensorboard/path" } `
309+
310+ In a SageMaker environment, this JSON file must be present for any TensorBoard artifact to be logged.
311+ By default, this path is set to point to a pre-defined location in SageMaker.
312+
313+ `tensorboard_dir` can also be passed when creating the hook (see [Creating a hook](#hook-from-python)), either through the API or
314+ in the JSON specified in `SMDEBUG_CONFIG_FILE_PATH`. For this, `export_tensorboard` should be set to True.
315+ This option to set `tensorboard_dir` is available in both SageMaker and non-SageMaker environments.
316+
317+
318+ #### ` CHECKPOINT_CONFIG_FILE_PATH ` :
319+
320+ Contains the path to the JSON file that specifies where training checkpoints need to
321+ be placed. This is used in the context of spot training.
322+
323+ Sample JSON file:
324+
325+ ` { "LocalPath": "/my/checkpoint/path" } `
326+
327+ In a SageMaker environment, this JSON file must be present for checkpoints to be saved.
328+ By default, this path is set to point to a pre-defined location in SageMaker.
329+
330+
331+ #### ` SAGEMAKER_METRICS_DIRECTORY ` :
332+
333+ Contains the path to the directory where metrics will be recorded for consumption by SageMaker Metrics.
334+ This is relevant only in a SageMaker environment, where this variable points to a pre-defined location.
335+
336+
337+ #### ` TRAINING_END_DELAY_REFRESH ` :
338+
339+ During analysis, a [trial](analysis.md) is created to query for tensors from a specified directory. This
340+ directory contains collections, events, and index files. This environment variable
341+ specifies how many seconds to wait before refreshing the index files to check if training has ended
342+ and the tensor is available. By default, this value is set to 1.
343+
344+
345+ #### ` INCOMPLETE_STEP_WAIT_WINDOW ` :
346+
347+ During analysis, a [trial](analysis.md) is created to query for tensors from a specified directory. This
348+ directory contains collections, events, and index files. A trial checks to see if a step
349+ specified in the smdebug hook has been completed. This environment variable
350+ specifies the maximum number of incomplete steps that the trial will wait for before marking
351+ half of them as complete. Default: 1000
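
The two analysis-time variables above could be set before a trial is created, as in this sketch. The values `5` and `500` are arbitrary, and `read_int_env` is a hypothetical helper showing how such a setting can fall back to the documented defaults (1 and 1000); it is not smdebug's own parsing code.

```python
import os

# Illustrative overrides; environment variable values are always strings.
os.environ["TRAINING_END_DELAY_REFRESH"] = "5"
os.environ["INCOMPLETE_STEP_WAIT_WINDOW"] = "500"


def read_int_env(name, default):
    """Hypothetical helper: read an integer setting, using the default if unset or invalid."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        return default


# Defaults below are the ones stated in this section.
refresh_delay_seconds = read_int_env("TRAINING_END_DELAY_REFRESH", 1)
incomplete_step_window = read_int_env("INCOMPLETE_STEP_WAIT_WINDOW", 1000)
```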