The num_of_steps setting for Inception_v2 #5
Comments
Yes, I think my loss stabilized after roughly 12 hours of training on one GPU.
…On Wed, May 30, 2018 at 3:53 AM ShiAGou ***@***.***> wrote:
First of all, thank you very much. I noticed that 'num_steps' is not
specified in the 'faster_rcnn_inception_resnet_v2_atrous_kitti.config'
file. Does this mean it would train indefinitely? If so, could you share
your experience on how many steps are enough to reach a stable loss?
|
I have trained it for about 21 hours on one TITAN X GPU at 1.2 steps/second, but my loss still fluctuates between 0 and 1. Did you change any parameters in 'faster_rcnn_inception_resnet_v2_atrous_kitti.config', such as the learning rate? It seems that from step 0 to 900k the learning rate is a constant .0003. I also found that training slows down significantly when eval.sh runs at the same time, so I am not running eval for now. Will this affect the result? Thanks. My current training loss and the default config in 'faster_rcnn_inception_resnet_v2_atrous_kitti.config' are included in the quoted copy of this message below.
|
From this, it looks to me like you are evaluating the loss on a per-image
basis, which is not a very accurate proxy for your training loss over the
whole dataset or for your validation loss.
I'd recommend looking at some validation metrics on TensorBoard to figure
out when to stop.
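To illustrate why single-step losses are a noisy signal with `batch_size: 1`, here is a minimal Python sketch that parses training log lines in the format quoted in this thread and smooths the loss with a trailing moving average. The helper names `parse_losses` and `moving_average` are hypothetical, not part of TensorFlow, and a smoothed training loss is only a rough aid; it does not replace the validation metrics recommended above.

```python
import re

# Matches lines such as:
# INFO:tensorflow:global step 95931: loss = 0.4842 (0.827 sec/step)
LOG_LINE = re.compile(r"global step (\d+): loss = ([0-9.]+)")


def parse_losses(lines):
    """Return a list of (step, loss) pairs found in the given log lines."""
    pairs = []
    for line in lines:
        match = LOG_LINE.search(line)
        if match:
            pairs.append((int(match.group(1)), float(match.group(2))))
    return pairs


def moving_average(values, window):
    """Trailing mean over at most `window` of the most recent values."""
    averaged = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just fell out of the window.
            running -= values[i - window]
        averaged.append(running / min(i + 1, window))
    return averaged
```

With a window of a few hundred or a few thousand steps, the smoothed curve shows whether the loss is still trending down, which is hard to judge from individual per-image values.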
…On Thu, May 31, 2018 at 1:21 AM ShiAGou ***@***.***> wrote:
I have trained it for about 21 hours on one TITAN X GPU at 1.2
steps/second, but my loss still fluctuates between 0 and 1. Did you change
any parameters in 'faster_rcnn_inception_resnet_v2_atrous_kitti.config',
such as the learning rate? Thanks.
This is my current training loss state:
INFO:tensorflow:global step 95931: loss = 0.4842 (0.827 sec/step)
INFO:tensorflow:global step 95932: loss = 0.2304 (0.831 sec/step)
INFO:tensorflow:global step 95933: loss = 0.6756 (0.824 sec/step)
INFO:tensorflow:global step 95934: loss = 0.5103 (0.829 sec/step)
INFO:tensorflow:global step 95935: loss = 0.3497 (0.820 sec/step)
INFO:tensorflow:global step 95936: loss = 0.3261 (0.829 sec/step)
INFO:tensorflow:global step 95937: loss = 0.3748 (0.823 sec/step)
INFO:tensorflow:global step 95938: loss = 0.1620 (0.826 sec/step)
INFO:tensorflow:global step 95939: loss = 0.3487 (0.828 sec/step)
INFO:tensorflow:global step 95940: loss = 0.3864 (0.823 sec/step)
INFO:tensorflow:global step 95941: loss = 0.1237 (0.827 sec/step)
INFO:tensorflow:global step 95942: loss = 0.4237 (0.827 sec/step)
INFO:tensorflow:global step 95943: loss = 0.2671 (0.841 sec/step)
INFO:tensorflow:global step 95944: loss = 0.5672 (0.873 sec/step)
INFO:tensorflow:global step 95945: loss = 0.2411 (0.889 sec/step)
INFO:tensorflow:global step 95946: loss = 0.3034 (0.876 sec/step)
INFO:tensorflow:global step 95947: loss = 0.0378 (0.883 sec/step)
INFO:tensorflow:global step 95948: loss = 0.2312 (0.876 sec/step)
INFO:tensorflow:global step 95949: loss = 0.1306 (0.855 sec/step)
INFO:tensorflow:global step 95950: loss = 0.3180 (0.818 sec/step)
The default config in 'faster_rcnn_inception_resnet_v2_atrous_kitti.config' is:
train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0003
          schedule {
            step: 0
            learning_rate: .0003
          }
          schedule {
            step: 900000
            learning_rate: .00003
          }
          schedule {
            step: 1200000
            learning_rate: .000003
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "faster_rcnn_inception_resnet_v2_atrous_coco_11_06_2017/model.ckpt"
  from_detection_checkpoint: true
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}
It seems from 0 to 900k steps, the learning rate is a constant .0003?
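As a rough check of that reading, the following Python sketch evaluates the schedule from the config above, assuming `manual_step_learning_rate` is piecewise constant (each `schedule` entry takes effect once the global step reaches its `step` value). The functions `learning_rate_at` and `hours_to_reach` are hypothetical helpers written for illustration, not Object Detection API calls.

```python
# (step, learning_rate) boundaries copied from the quoted config.
SCHEDULE = [(0, 0.0003), (900000, 0.00003), (1200000, 0.000003)]
INITIAL_LEARNING_RATE = 0.0003


def learning_rate_at(step, schedule=SCHEDULE, initial=INITIAL_LEARNING_RATE):
    """Learning rate at `step`, assuming a piecewise-constant schedule."""
    rate = initial
    for boundary, value in schedule:
        if step >= boundary:
            rate = value
    return rate


def hours_to_reach(target_step, current_step, steps_per_second):
    """Wall-clock hours needed to go from current_step to target_step."""
    return (target_step - current_step) / steps_per_second / 3600.0


if __name__ == "__main__":
    print(learning_rate_at(95950))               # still in the first interval
    print(hours_to_reach(900000, 95950, 1.2))    # time until the first decay
```

Under this assumption the rate stays at .0003 until step 900000, and at the reported 1.2 steps/second the first decay is still roughly 186 hours away from step 95950.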
|
First of all, thank you very much. I noticed that 'num_steps' is not specified in the 'faster_rcnn_inception_resnet_v2_atrous_kitti.config' file. Does this mean it would train indefinitely? If so, could you share your experience on how many steps are enough to reach a stable loss?