Benchmarks #10
Is there any code or options showing how to train any of these models (topdown, etc.) with the self-critical algorithm? @ruotianluo
It's in another repository of mine.
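(For reference, in that repository — self-critical.pytorch, linked further down in this thread — self-critical training is typically resumed from a cross-entropy checkpoint via train.py, roughly along the lines of python train.py --caption_model topdown --start_from log_topdown --checkpoint_path log_topdown_rl --self_critical_after 25. The flag names and values shown here are illustrative and may not match the current version of train.py; check that repo's README for the exact command.)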
Did you fine-tune the CNN when training the model with cross-entropy loss?
No.
Wow, that's unbelievable. I can't achieve that high a score without fine-tuning when training my own captioning model under cross-entropy loss. Most papers I have read fine-tune the CNN when training with cross-entropy loss. Are there any tips for training with cross entropy?
Fine-tuning is actually worse. It's about how the features are extracted; check the Self-critical Sequence Training paper.
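(As an aside for readers, here is a minimal sketch of the kind of fixed-CNN feature extraction being discussed: a pooled "fc" feature and a 14x14 spatial "att" feature from a frozen ResNet-101. The image path, input size, and layer choices are illustrative assumptions, not the repo's preprocessing script verbatim.)

import torch
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image

# Frozen ResNet-101 backbone: keep everything up to (but not including) avgpool/fc.
resnet = models.resnet101(pretrained=True)
resnet.eval()
backbone = torch.nn.Sequential(*list(resnet.children())[:-2])

preprocess = transforms.Compose([
    transforms.Resize((448, 448)),          # 448/32 = 14, giving a 14x14 spatial map
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open('example.jpg').convert('RGB')).unsqueeze(0)
with torch.no_grad():
    att_feat = backbone(img)                # (1, 2048, 14, 14) spatial "att" feature
    fc_feat = att_feat.mean(dim=(2, 3))     # (1, 2048) pooled "fc" feature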
I think they mean they did not do fine-tuning when training the model under the RL loss, but they did not mention whether they fine-tuned the CNN when training under cross-entropy loss.
I fine-tuned the CNN under cross-entropy loss as in neuraltalk2 (Lua version) and got a CIDEr of 0.91 on the validation set without beam search. Then I trained the self-critical model without fine-tuning, starting from the best pretrained model, and finally got a CIDEr result close to that of the self-critical paper.
They didn't fine-tune in either phase. And fine-tuning may not work as well with attention-based models.
I have not trained the attention-based model, but I will try. Thank you for your code. I will start learning PyTorch with it.
Dear @ruotianluo,
Can you try downloading the pretrained model and evaluating it on your test images? That would help me narrow down the problem.
Yes, I can download the pre-trained models and use them. The results from the pre-trained models were appropriate and nice; however, the results from my own trained models were the same for all of the images. It seems something was wrong with the parameters I used for training, and the trained model produced the same caption for every given image.
You should be able to reproduce my results by following my instructions; it's really weird.
Thank you very much for your help. The problem has been solved. In fact, I had trained your code on another synthetic data set, and that is where the error occurred. When I used your code on the MS-COCO data set, the training process had no problem.
@ahkarami Is the previous problem related to my code?
Dear @ruotianluo,
Hi @ruotianluo,
Actually, no. I didn't spend much time on that model.
Thanks for your reply.
It's good; I just couldn't get it to work well.
Could you clarify which features are used for the results above? resnet152? And does
@dmitriy-serdyuk It's using res101, and FC stands for the FC model in the Self-critical Sequence Training paper, which can be regarded as a variant of show-and-tell.
Thank you for your fantastic code. I am a beginner, and it helped me a lot.
The input gate is different.
OK, I got it. But why did you make this change? Is there any paper or research about this?
Self-critical Sequence Training for Image Captioning
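(For context, a minimal sketch of that modification as described in the SCST paper: the attention-derived context vector is fed only into the cell-input computation of the LSTM step, not into the other gates. The class, argument names, and sizes below are illustrative, not the repository's actual att2in implementation.)

import torch
import torch.nn as nn

class Att2inCellSketch(nn.Module):
    # One LSTM step where the attention context 'ctx' only enters the
    # cell-input term g, while the i/f/o gates see only the word and hidden state.
    def __init__(self, input_size, att_size, rnn_size):
        super(Att2inCellSketch, self).__init__()
        self.i2h = nn.Linear(input_size, 4 * rnn_size)
        self.h2h = nn.Linear(rnn_size, 4 * rnn_size)
        self.a2c = nn.Linear(att_size, rnn_size)   # attention -> cell input only

    def forward(self, xt, ctx, state):
        h_prev, c_prev = state
        i, f, o, g = (self.i2h(xt) + self.h2h(h_prev)).chunk(4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        g = torch.tanh(g + self.a2c(ctx))          # the only place attention enters
        c = f * c_prev + i * g
        h = o * torch.tanh(c)
        return h, (h, c)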
Thank you very much!
@YuanEZhou Which features did you use, the default resnet101 features or the bottom-up features?
The bottom-up features.
@YuanEZhou May I ask how you used these features? Did you modify the code to incorporate bounding-box information, or just use the default options?
@jamiechoi1995 I used the default options.
Adaptive Attention model: {'CIDEr': 1.0295328576254532, 'Bleu_4': 0.32367107232015596, 'Bleu_3': 0.4308636494026319, 'Bleu_2': 0.5710839754137301, 'Bleu_1': 0.7375622419883233, 'ROUGE_L': 0.5415854013591195, 'METEOR': 0.2603669044858015, 'SPICE': 0.193603187345227}
@YuanEZhou Can you please share the results.json file you got from the coco-caption code, which includes all the image IDs with their predictions for the validation images? I urgently need it. Your help is highly appreciated.
Hi @fawazsammani, I am sorry, but I have lost the file.
@2033329616 Maybe the mistake is in your images. Yesterday, I ran the att2in2 model on the COCO Karpathy-split validation images; you can run them through coco-caption and see the results, which are identical to the ones posted. (I've already pre-processed the file to include the image IDs for evaluation purposes, so you can just run the coco-caption code on it directly.)
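(For anyone else wanting to score such a results file themselves, a minimal sketch using the standard coco-caption / pycocoevalcap API; the annotation and results paths below are placeholders.)

from pycocotools.coco import COCO
from pycocoevalcap.eval import COCOEvalCap

ann_file = 'annotations/captions_val2014.json'   # ground-truth captions (placeholder path)
res_file = 'att2in2_results.json'                # [{"image_id": ..., "caption": "..."}, ...]

coco = COCO(ann_file)
coco_res = coco.loadRes(res_file)
coco_eval = COCOEvalCap(coco, coco_res)
coco_eval.params['image_id'] = coco_res.getImgIds()   # only score images present in the results
coco_eval.evaluate()

for metric, score in coco_eval.eval.items():
    print(metric, score)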
@2033329616 You need to download the pretrained ResNet model from the link in this project.
@fawazsammani @YuanEZhou, thanks for your reply. I downloaded "att2in2_results.zip" and ran the coco metrics code, and it gives a good result. I have also used the pretrained att2in2 model from this project and tested it on the Karpathy-split COCO test set, but I can't get the correct result; I notice the output sentences are the same no matter how I change the image or the fc and att features. I have no idea how to solve this problem.
Is there a pretrained model in which self-attention was used?
I met the same problem. Have you solved it yet?
Hi @2033329616 and @kakazl. I'm not sure exactly what the problem is in your case. Maybe you used different settings? This is the command I run:
Sorry, when I run python eval.py --model 'self_cirtical/att2in2/model-best.pth' --infos_path 'self_cirtical/att2in2/infos_a2i2-best.pkl' --image_folder 'data/coco/images/val2014/' --num_images 10, I get an error.
@sssilence Are you using Python 2 or 3? I just ran it again and it works. According to your error, your fc_feats is an integer. Are you sure you extracted the features correctly and didn't modify something in the code?
Yeah, I used Python 2. I didn't modify anything in the code, and I used resnet101 to extract the features. Then I modified some code in eval_utils.py: tmp = [torch.from_numpy(_).cuda() if _ is not None else _ for _ in tmp], and I can run python eval.py, but I can't run python train.py successfully.
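(To spell out what that one-line change does, since others hit the same spot: it converts the numpy arrays in tmp to CUDA tensors while leaving None entries alone, because torch.from_numpy(None) raises a TypeError. A small self-contained illustration, assuming a CUDA device and made-up feature shapes:)

import numpy as np
import torch

fc_feats = np.zeros((1, 2048), dtype=np.float32)
att_feats = np.zeros((1, 196, 2048), dtype=np.float32)
att_masks = None                     # can legitimately be None for fixed-size att features

tmp = [fc_feats, att_feats, att_masks]
# Convert arrays to CUDA tensors but skip None placeholders.
tmp = [torch.from_numpy(x).cuda() if x is not None else x for x in tmp]
fc_feats, att_feats, att_masks = tmp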
Dear @ruotianluo,
Hi everyone. Thanks and kudos for this great repository. I am just a newbie, and this repo has helped me a lot. I want to mimic the results of ShowAndTell and ShowAttendAndTell. I have provided the path to the model as I changed the name of
@Willowlululu I guess you are using Python 3? This repo only supports Python 2. Try the self-critical.pytorch repo.
Hi @ruotianluo, thank you for the great repo! I was wondering, is there a pretrained transformer model in the drive link?
There is; check out the self-critical.pytorch repo's model zoo.
@ruotianluo Thank you for the quick response! To check my understanding, fc_nsc, fc_rl, and att2in2 are from the self-critical paper, and updown is from the Anderson et al. paper. Apologies if I am missing anything here.
Hi, I also want to use Adaptive Attention. What was your training command at that time? Waiting for your answer.
Cross-entropy loss (CIDEr score on the validation set without beam search; 25 epochs):
fc 0.92
att2in 0.95
att2in2 0.99
topdown 1.01

(Self-critical training is in https://github.com/ruotianluo/self-critical.pytorch)

Self-critical training (self-critical after 25 epochs; suggestion: don't start self-critical too late):
att2in 1.12
topdown 1.12

Test split (beam size 5):
cross entropy:
topdown: 1.07
self-critical:
topdown: Bleu_1: 0.779 Bleu_2: 0.615 Bleu_3: 0.467 Bleu_4: 0.347 METEOR: 0.269 ROUGE_L: 0.561 CIDEr: 1.143
att2in2: Bleu_1: 0.777 Bleu_2: 0.613 Bleu_3: 0.465 Bleu_4: 0.347 METEOR: 0.267 ROUGE_L: 0.560 CIDEr: 1.156
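(For reference, a test-split evaluation with beam search like the ones above is typically run with something along the lines of python eval.py --dump_images 0 --num_images 5000 --model model-best.pth --infos_path infos-best.pkl --language_eval 1 --beam_size 5; the model and infos paths here are placeholders, and the exact flag set depends on the version of eval.py checked out.)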