Compare CIDEr optimization and training time with the BUTD paper #100
2. It's not about turning it on or off. To support scheduled sampling, the current implementation does not use the default approach of feeding the whole sequence to the LSTM in one call, which may be faster.
Thanks!
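A toy sketch of the point above (plain Python, not the repo's code; `decode_with_scheduled_sampling` and `toy_step` are hypothetical names). With scheduled sampling, the input at step t may be the model's own prediction from step t-1, so the decoder has to run one step at a time. With `ss_prob = 0`, every input is a ground-truth token known in advance, which is what makes a single whole-sequence LSTM call possible.

```python
import random
from typing import Callable, List, Optional, Tuple

# A decoder step: takes one input token and a state, returns (prediction, new_state).
StepFn = Callable[[int, Optional[int]], Tuple[int, Optional[int]]]


def decode_with_scheduled_sampling(
    gt_tokens: List[int], step_fn: StepFn, ss_prob: float, rng: random.Random
) -> Tuple[List[int], List[int]]:
    """Run the decoder one step at a time and return (inputs_used, predictions).

    At each step t > 0, with probability ss_prob the input is the model's own
    prediction from step t-1 instead of the ground-truth token.
    """
    state: Optional[int] = None
    inputs_used: List[int] = []
    preds: List[int] = []
    for t, gt in enumerate(gt_tokens):
        if t > 0 and rng.random() < ss_prob:
            # This input depends on step t-1's output, so step t cannot start
            # before step t-1 finishes: no single whole-sequence call.
            inp = preds[-1]
        else:
            inp = gt
        pred, state = step_fn(inp, state)
        inputs_used.append(inp)
        preds.append(pred)
    return inputs_used, preds


def toy_step(token: int, state: Optional[int]) -> Tuple[int, Optional[int]]:
    # Hypothetical stand-in for an LSTM cell: "predicts" token + 1 and counts steps.
    count = 0 if state is None else state + 1
    return token + 1, count
```

With `ss_prob=0.0` the inputs are exactly the ground-truth tokens; with `ss_prob=1.0` every input after the first is the previous prediction.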
Hi, thanks for your contribution! I have several questions:
I noticed that the paper Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering mentions that they complete CIDEr optimization in a single epoch, as follows:
I wonder why you haven't implemented this method/trick, since it would save a huge amount of time. Do you plan to? I haven't figured out what "take the captions in the decoded beam as a sample set" means or how to implement it. Could you shed some light on it?
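One possible reading of "take the captions in the decoded beam as a sample set" (an interpretation, not confirmed by the paper): instead of sampling captions from the model's distribution, run beam search, score every caption in the beam with CIDEr, and apply the policy-gradient update to those beam captions. Below is a minimal sketch with plain floats. `beam_sample_set_loss` is a hypothetical name, and using the mean beam reward as the baseline is my assumption; standard SCST uses the reward of the greedy-decoded caption instead.

```python
from typing import List


def beam_sample_set_loss(beam_logprobs: List[float], beam_rewards: List[float]) -> float:
    """REINFORCE-style loss for one image, treating its decoded beam as the sample set.

    beam_logprobs: total log-probability of each caption in the beam.
    beam_rewards:  CIDEr score of each caption (computed elsewhere).
    Baseline (assumption, not taken from the BUTD paper): mean reward over the beam.
    """
    if not beam_rewards or len(beam_logprobs) != len(beam_rewards):
        raise ValueError("need exactly one reward per beam caption")
    baseline = sum(beam_rewards) / len(beam_rewards)
    # Minimizing -(r - b) * log p raises the log-probability of captions that
    # beat the beam average and lowers it for captions below the average.
    total = sum(-(r - baseline) * lp for r, lp in zip(beam_rewards, beam_logprobs))
    return total / len(beam_rewards)
```

In real training code the log-probabilities would be tensors so autograd can differentiate this loss; the floats here only show the arithmetic.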
As for training time, the BUTD paper says:
Using the BUTD model in this repo, training on 4 Tesla M40 GPUs takes around 9 hours and 40 minutes (mine: 30 epochs of cross-entropy training on 4 M40s vs. the BUTD paper: 60 epochs of cross-entropy training plus less than 1 hour for 1 epoch of CIDEr optimization on 2 Titan X GPUs).
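To put the run above on a per-epoch basis: 9 h 40 min is 580 minutes, about 19.3 minutes per epoch over 30 epochs. The snippet below does that arithmetic and, assuming (as a simplification) that every epoch costs the same, extrapolates to the paper's 60 cross-entropy epochs on the same 4 M40s. It says nothing about the paper's own hardware; `minutes_per_epoch` is a hypothetical helper.

```python
def minutes_per_epoch(hours: int, minutes: int, epochs: int) -> float:
    """Average wall-clock minutes per epoch for a run of the given total length."""
    return (hours * 60 + minutes) / epochs


# Figures from this run: 9 h 40 min for 30 cross-entropy epochs on 4 M40s.
per_epoch = minutes_per_epoch(9, 40, 30)  # 580 / 30, about 19.33 min

# Assumption: constant cost per epoch on the same 4 M40 setup.
paper_epochs = 60
projected_hours = per_epoch * paper_epochs / 60  # minutes -> hours, about 19.33 h
```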
It seems the original Caffe implementation is much faster. Do you have any idea why?
FYI (for you and anyone else interested in the complete commands for getting results comparable to the BUTD model), my training details are as follows:
```bash
# Training command:
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --id topdown --caption_model topdown \
  --input_json data/cocotalk.json --input_label_h5 data/cocotalk_label.h5 \
  --batch_size 100 --learning_rate 0.001 --learning_rate_decay_start 0 \
  --checkpoint_path log_topdown --save_checkpoint_every 1100 --val_images_use 5000 \
  --max_epochs 30 --rnn_size 1000 --input_encoding_size 1000 --att_feat_size 2048 \
  --att_hid_size 512 --language_eval 1 --scheduled_sampling_start 0 --use_bn 1 \
  --learning_rate_decay_every 4
```
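The decay flags in this command (`--learning_rate 0.001 --learning_rate_decay_start 0 --learning_rate_decay_every 4`) describe a step schedule. A minimal sketch of that rule, assuming the repo multiplies the learning rate by `--learning_rate_decay_rate` once every `--learning_rate_decay_every` epochs after `--learning_rate_decay_start`, and assuming a decay rate of 0.8 (I believe that is the default in `opts.py`, but check your checkout). `current_lr` is a hypothetical helper name.

```python
def current_lr(epoch: int, learning_rate: float, decay_start: int,
               decay_every: int, decay_rate: float) -> float:
    """Step decay: multiply by decay_rate once per decay_every epochs after decay_start.

    Assumed behaviour of the --learning_rate_decay_* flags; a negative
    decay_start is treated as "no decay".
    """
    if decay_start >= 0 and epoch > decay_start:
        frac = (epoch - decay_start) // decay_every  # completed decay periods
        return learning_rate * decay_rate ** frac
    return learning_rate


# Settings from the command above, with the assumed decay rate of 0.8.
schedule = {e: current_lr(e, 0.001, 0, 4, 0.8) for e in (0, 3, 4, 8, 29)}
```

Under these assumptions the rate stays at 0.001 for epochs 0 to 3, drops to 0.0008 at epoch 4 and 0.00064 at epoch 8, and ends near 0.00021 by epoch 29.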
I added `--use_bn 1` (see here). I set the learning rate and optimization slightly differently (`--learning_rate 0.001 --learning_rate_decay_every 4`) from #31 and ruotianluo/ImageCaptioning.pytorch#10: they got good results but I didn't, so I tried a larger learning rate. The performance of several models under my setting is:
You mentioned in the README that:
But I only got a slight improvement (<1%) in a few metrics in my experiments. Is that reasonable, and do you know why?
The BUTD paper uses this learning rate schedule:
Could you tell me how to set this up with your code?
In addition to setting `--learning_rate 0.01 --optim sgdmom` and changing lines 168 to 169 of `self-critical.pytorch/misc/utils.py` (at commit 8118670), what else should I modify?
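For reference, `--optim sgdmom` presumably builds a PyTorch SGD optimizer with momentum (an assumption; check `build_optimizer` in `misc/utils.py` for the exact momentum value and options). The sketch below shows, with plain floats, the update that PyTorch's SGD applies when momentum is on and dampening and Nesterov are off. `sgd_momentum_step` is a hypothetical name; any schedule from the paper would only change the `lr` passed in before each step.

```python
from typing import List, Tuple


def sgd_momentum_step(params: List[float], grads: List[float], bufs: List[float],
                      lr: float, momentum: float) -> Tuple[List[float], List[float]]:
    """One SGD-with-momentum update in PyTorch's convention (no dampening, no Nesterov):

        buf   <- momentum * buf + grad
        param <- param - lr * buf

    Buffers start at zero, which matches PyTorch's first step (buf = grad).
    """
    new_bufs = [momentum * b + g for b, g in zip(bufs, grads)]
    new_params = [p - lr * b for p, b in zip(params, new_bufs)]
    return new_params, new_bufs


# Two steps with the flags above (lr 0.01) and an assumed momentum of 0.9.
p, buf = [1.0], [0.0]
p, buf = sgd_momentum_step(p, [0.5], buf, lr=0.01, momentum=0.9)
p, buf = sgd_momentum_step(p, [0.5], buf, lr=0.01, momentum=0.9)
```

After the two steps the buffer is 0.95 and the parameter is about 0.9855: the momentum buffer makes the second step larger than the first even though the gradient is unchanged.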