Description
When training ends, the trial downloads all the remaining index files, but the rule invocation ends prematurely. See the logs below.
The training ran for 90 steps, but the rule concluded at step 60.
[2020-01-08 22:57:14.339 /codebuild/output/src046/src/github.com/awslabs/sagemaker-debugger-rules/tests/analysis/invoker.py_s3://smdebugcodebuildtest/upload/20200108_223713/a78b5eb/s3_trials/trial_loss_not_decreasing_tf_true_parallel_mode_1578524198.2377198 INFO trial.py:197] Training has ended, will refresh one final time in 1 sec.
[2020-01-08 22:57:15.361 /codebuild/output/src046/src/github.com/awslabs/sagemaker-debugger-rules/tests/analysis/invoker.py_s3://smdebugcodebuildtest/upload/20200108_223713/a78b5eb/s3_trials/trial_loss_not_decreasing_tf_true_parallel_mode_1578524198.2377198 DEBUG index_reader.py:310] Loaded Index Files: upload/20200108_223713/a78b5eb/s3_trials/trial_loss_not_decreasing_tf_true_parallel_mode_1578524198.2377198/index/000000000/000000000070_worker_0.json,upload/20200108_223713/a78b5eb/s3_trials/trial_loss_not_decreasing_tf_true_parallel_mode_1578524198.2377198/index/000000000/000000000080_worker_0.json,upload/20200108_223713/a78b5eb/s3_trials/trial_loss_not_decreasing_tf_true_parallel_mode_1578524198.2377198/index/000000000/000000000090_worker_0.json
[2020-01-08 22:57:15.361 /codebuild/output/src046/src/github.com/awslabs/sagemaker-debugger-rules/tests/analysis/invoker.py_s3://smdebugcodebuildtest/upload/20200108_223713/a78b5eb/s3_trials/trial_loss_not_decreasing_tf_true_parallel_mode_1578524198.2377198 INFO trial.py:209] Loaded all steps
[2020-01-08 22:57:15.361 /codebuild/output/src046/src/github.com/awslabs/sagemaker-debugger-rules/tests/analysis/invoker.py_s3://smdebugcodebuildtest/upload/20200108_223713/a78b5eb/s3_trials/trial_loss_not_decreasing_tf_true_parallel_mode_1578524198.2377198 DEBUG trial.py:211] Training Has Ended : last_complete_step was: 60
[2020-01-08 22:57:15.361 /codebuild/output/src046/src/github.com/awslabs/sagemaker-debugger-rules/tests/analysis/invoker.py_s3://smdebugcodebuildtest/upload/20200108_223713/a78b5eb/s3_trials/trial_loss_not_decreasing_tf_true_parallel_mode_1578524198.2377198 DEBUG trial.py:213] Training Has Ended : last_index_token was: upload/20200108_223713/a78b5eb/s3_trials/trial_loss_not_decreasing_tf_true_parallel_mode_1578524198.2377198/index/000000000/000000000060_worker_0.json
[2020-01-08 22:57:15.361 /codebuild/output/src046/src/github.com/awslabs/sagemaker-debugger-rules/tests/analysis/invoker.py_s3://smdebugcodebuildtest/upload/20200108_223713/a78b5eb/s3_trials/trial_loss_not_decreasing_tf_true_parallel_mode_1578524198.2377198 INFO invoker.py:36] Looking for step 61 of mode GLOBAL and reached end of training. Max step available is 60
[2020-01-08 22:57:15.362 /codebuild/output/src046/src/github.com/awslabs/sagemaker-debugger-rules/tests/analysis/invoker.py_s3://smdebugcodebuildtest/upload/20200108_223713/a78b5eb/s3_trials/trial_loss_not_decreasing_tf_true_parallel_mode_1578524198.2377198 INFO invoker.py:40] Ending execution of rule LossNotDecreasing with step=60
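To make the discrepancy concrete: the final refresh reports loading index files for steps 70, 80 and 90, yet last_complete_step is still reported as 60. The following is a minimal sketch, not smdebug's actual implementation, of the step one would expect to be available after that refresh. The helper names (step_from_index_file, max_step) are hypothetical, the paths are shortened from the ones in the logs, and it assumes a single worker, as in this run.

```python
import re

# Hypothetical helpers for illustration only; these are not smdebug APIs.
# Index file names in the logs look like
# ".../index/000000000/000000000090_worker_0.json".
INDEX_FILE_RE = re.compile(r"(\d+)_worker_(\d+)\.json$")


def step_from_index_file(path):
    """Return the step number encoded in an index file name."""
    match = INDEX_FILE_RE.search(path)
    if match is None:
        raise ValueError(f"not an index file name: {path}")
    return int(match.group(1))


def max_step(index_files):
    """Return the highest step among the given index files."""
    return max(step_from_index_file(p) for p in index_files)


# Paths shortened from the "Loaded Index Files" log line above.
loaded_on_final_refresh = [
    "index/000000000/000000000070_worker_0.json",
    "index/000000000/000000000080_worker_0.json",
    "index/000000000/000000000090_worker_0.json",
]

# Shortened from the "last_index_token" log line above.
last_index_token = "index/000000000/000000000060_worker_0.json"

expected_last_step = max_step(loaded_on_final_refresh)
reported_last_step = step_from_index_file(last_index_token)
print(expected_last_step, reported_last_step)
```

Under these assumptions the expected last step is 90 while the reported one is 60, which matches the symptom: the rule stops looking at step 61 even though later steps were loaded.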