Hello jpeyre,
I have read your paper and code; introducing spatial features into visual relation detection is a nice idea.
After reading, I am confused by this sentence in Part 3 of the paper: "To detect and localize such triplets in test images, we assume that the candidate object detections for s and o are given by a detector trained with full supervision. Here we use the object detector Faster-RCNN [14] trained on the Visual Relationship Detection training set [31]." However, in the section Representing Appearance of Objects, you use Fast-RCNN with VGG16 pre-trained on ImageNet to extract the appearance features. Do you mean that you use the same CNN architecture (Fast-RCNN) trained on different datasets in these two different steps?
I found only "vgg16_fast_rcnn.caffemodel" in the code, but no model trained on the Visual Relationship Dataset. I wonder if I have misunderstood the paper. Could you share some details about the model trained on VRD that is used for extracting the candidate pairs of objects? Thank you!
Hi z-kun,
We use the same model both for extracting the candidate objects and computing their appearance features. This model is indeed "vgg16_fast_rcnn.caffemodel", a VGG16 network pre-trained on ImageNet and finetuned on the VRD training set.
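In other words, a single forward pass through the Fast R-CNN network can serve both purposes. A minimal sketch (assuming the standard Fast R-CNN Caffe test prototxt, here called "test.prototxt", and externally supplied region proposals; blob names follow the usual VGG16 Fast R-CNN definition):

```python
import numpy as np
import caffe

# Load the VGG16 Fast R-CNN model fine-tuned on the VRD training set.
net = caffe.Net('test.prototxt', 'vgg16_fast_rcnn.caffemodel', caffe.TEST)

def detect_and_describe(im_blob, rois):
    """im_blob: (1, 3, H, W) preprocessed image;
       rois: (R, 5) array of [batch_idx, x1, y1, x2, y2] proposals."""
    net.blobs['data'].reshape(*im_blob.shape)
    net.blobs['data'].data[...] = im_blob
    net.blobs['rois'].reshape(*rois.shape)
    net.blobs['rois'].data[...] = rois
    net.forward()
    # Class probabilities give the candidate object detections (s and o);
    # the fc7 activations of the same boxes are the appearance features.
    cls_prob = net.blobs['cls_prob'].data.copy()
    appearance = net.blobs['fc7'].data.copy()
    return cls_prob, appearance
```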