
Cannot reproduce ilsvrc2012 validation results #1644

Closed
shaibagon opened this issue Dec 28, 2014 · 11 comments

Comments

@shaibagon
Member

Hi,
I built caffe-dev and tried the "out of the box" bvlc_googlenet model. The top-5 score I get on the ilsvrc2012 validation set is:
Test net output #8: loss3/top-5 = 0.831125
While the expected score (according to the wiki) is 0.89.
I suspect I did not prepare my validation set properly.
I used the script examples/imagenet/create_imagenet.sh to create the images lmdb, setting RESIZE_HEIGHT and RESIZE_WIDTH to 227.

I experience the same issue with the vgg 19-layer model.

What am I missing here?

@ducha-aiki
Contributor

Hi @shaibagon

RESIZE_HEIGHT and RESIZE_WIDTH to 227.

The default option is 256x256, so that might be the cause.
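
As far as I can tell, convert_imageset does a plain resize to the exact target size (no aspect-preserving crop), so the stored images get squashed to 256x256 and the data layer then takes its crops from that. A rough Python/PIL sketch of what I believe the equivalent resize looks like (the file names are just placeholders):

```python
# Rough equivalent of the resize applied when RESIZE_HEIGHT/RESIZE_WIDTH are set
# in create_imagenet.sh (assumption: plain resize, aspect ratio not preserved).
from PIL import Image

RESIZE_W, RESIZE_H = 256, 256  # create_imagenet.sh defaults

def resize_like_convert_imageset(src_path, dst_path):
    img = Image.open(src_path).convert("RGB")
    # Squash to exactly 256x256 regardless of the original aspect ratio.
    img.resize((RESIZE_W, RESIZE_H)).save(dst_path)

# resize_like_convert_imageset("ILSVRC2012_val_00000001.JPEG", "resized.jpg")
```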

@shaibagon
Member Author

@ducha-aiki - thank you for your reply.
I re-ran it with RESIZE_HEIGHT and RESIZE_WIDTH set to 256. It made no change to the test results; I still get top-5 ~83%.
What am I missing here?

The output I get from caffe test -model ./models/bvlc_googlenet/train_val.prototxt -weights ./models/bvlc_googlenet/bvlc_googlenet.caffemodel is:

I1229 08:59:31.476357 17712 caffe.cpp:174] Loss: 3.37957
I1229 08:59:31.476372 17712 caffe.cpp:186] loss1/loss1 = 2.48485 (* 0.3 = 0.745455 loss)
I1229 08:59:31.476384 17712 caffe.cpp:186] loss1/top-1 = 0.522318
I1229 08:59:31.476394 17712 caffe.cpp:186] loss1/top-5 = 0.750114
I1229 08:59:31.476408 17712 caffe.cpp:186] loss2/loss1 = 2.25783 (* 0.3 = 0.67735 loss)
I1229 08:59:31.476418 17712 caffe.cpp:186] loss2/top-1 = 0.589818
I1229 08:59:31.476428 17712 caffe.cpp:186] loss2/top-5 = 0.798568
I1229 08:59:31.476440 17712 caffe.cpp:186] loss3/loss3 = 1.95676 (* 1 = 1.95676 loss)
I1229 08:59:31.476451 17712 caffe.cpp:186] loss3/top-1 = 0.645795
I1229 08:59:31.476461 17712 caffe.cpp:186] loss3/top-5 = 0.83275

@ducha-aiki
Contributor

@shaibagon
That is weird. Here is my log:
https://gist.github.com/ducha-aiki/21f41b1887749b122ca7
Do you have the original imagenet 2012 val set without any preprocessing?

@shaibagon
Member Author

I downloaded the images using the image URLs, with no preprocessing. The images are stored on my device in their original size; only the examples/imagenet/create_imagenet.sh script resizes them to 256x256. Should I somehow crop the images to a 1:1 aspect ratio as preprocessing? What happens to the image aspect ratio during the resize?

@ducha-aiki
Contributor

No, that looks ok. Have you experienced the performance drop for bvlc_reference_caffenet as well?
As for the resizing via convert_imageset, I personally don't use it; instead I store the original images using #1239 and do on-the-fly resizing during train/val. But my results in the wiki do not differ from the BVLC reported performance, so that is not the cause.
Are you sure that you test all images, i.e. batch_size * num_iterations == 50 000? Sorry if that sounds silly, it is just a sanity check to be sure. Could you post a log file like I did?
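
Just arithmetic, but to make the check concrete, something like this (plain Python, no Caffe needed) gives the test_iter you would need for a given batch size:

```python
import math

def required_test_iter(num_images, batch_size):
    # Smallest test_iter such that batch_size * test_iter >= num_images.
    # Note: Caffe's data layer wraps around the lmdb, so the last batch may
    # re-read a few images from the beginning (slight double counting).
    return math.ceil(num_images / batch_size)

print(required_test_iter(50000, 50))  # 1000
print(required_test_iter(50000, 32))  # 1563
print(required_test_iter(44000, 32))  # 1375
```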

@shaibagon
Member Author

Due to device capacity issues I use batch size 32 at test time as well as training time.
Moreover, since I downloaded the validation set from image URLs, I do not have all 50K images, only about 44K of them. I do not suspect this is the cause of a ~5% drop in performance, though...

I just tested the reference caffenet model as well; I get top-1 accuracy of 52%, as opposed to the 57% reported here.

Something is fishy here...

@ducha-aiki
Contributor

Due to device capacity issues I use batch size 32 in test time as well as training time.

This cannot be a problem, unless you haven't proportionally increased the number of test iterations.

I do not suspect this is a cause of ~5% drop in performance, though...

It could be, indeed. You can post your val.txt somewhere (assuming your filenames are still like ILSVRC2012_val_000000*.JPEG) and I can check the performance on this list.

@shaibagon
Member Author

Regarding the number of test iterations - I changed it such that #iterations * batch_size = number of validation examples. Moreover, even when testing on a subset of the validation set, the performance is still roughly 83% (top-5 googlenet) or 52% (top-1 reference caffenet).
I posted my val.txt file in:
https://gist.github.com/shaibagon/4850c225bd7e19f87142

Thank you VERY MUCH for your help.

@ducha-aiki
Contributor

(bs=6)*(#iter=7331) = 43986 examples (you are unlucky to have such a badly factorized number :)
My bvlc_alexnet results are

accuracy = 0.569068
accuracy5 = 0.796894

Some problem with your images, I suppose. Try sorting them by size; maybe some of them were downloaded with errors and are ~1Kb or so. The test log is here:
https://gist.github.com/ducha-aiki/27bec59ec6e51c84e53e
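
A minimal sketch of the size check (the directory name is just a placeholder for wherever your val JPEGs live):

```python
# List the smallest validation files first: truncated or failed downloads
# usually show up here with sizes of a few KB or less.
import glob, os

val_dir = "/path/to/ilsvrc12_val"  # placeholder, adjust to your layout
files = sorted(glob.glob(os.path.join(val_dir, "ILSVRC2012_val_*.JPEG")),
               key=os.path.getsize)

for path in files[:20]:
    print(os.path.getsize(path), path)
```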

@shaibagon
Member Author

@ducha-aiki - I started sifting through my images. There might be some issues with downloading the individual URLs. I'll update if I come to any conclusion here.

@shaibagon
Member Author

@ducha-aiki - I found it! It turns out that downloading a non-existent flickr URL results in this picture (and not an error, as I would expect):
[image: ilsvrc2012_val_00000007 - the flickr placeholder picture]
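
In case it helps anyone else, a quick sketch along these lines finds them by grouping files by hash (directory name is a placeholder; assumes the placeholder picture is byte-identical for every dead URL):

```python
# Group validation images by MD5; a hash shared by many files is almost
# certainly the flickr "photo unavailable" placeholder (or some other error page).
import glob, hashlib, os
from collections import defaultdict

val_dir = "/path/to/ilsvrc12_val"  # placeholder, adjust to your layout
by_hash = defaultdict(list)

for path in glob.glob(os.path.join(val_dir, "ILSVRC2012_val_*.JPEG")):
    with open(path, "rb") as f:
        by_hash[hashlib.md5(f.read()).hexdigest()].append(path)

for digest, paths in sorted(by_hash.items(), key=lambda kv: -len(kv[1])):
    if len(paths) > 1:
        print(len(paths), digest, paths[:3])
```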

Clearing these from my set yields (for bvlc_googlenet):

I1229 14:47:19.535240 31260 caffe.cpp:174] Loss: 2.33781
I1229 14:47:19.535255 31260 caffe.cpp:186] loss1/loss1 = 1.91557 (* 0.3 = 0.57467 loss)
I1229 14:47:19.535266 31260 caffe.cpp:186] loss1/top-1 = 0.5525
I1229 14:47:19.535277 31260 caffe.cpp:186] loss1/top-5 = 0.803125
I1229 14:47:19.535290 31260 caffe.cpp:186] loss2/loss1 = 1.53597 (* 0.3 = 0.460792 loss)
I1229 14:47:19.535300 31260 caffe.cpp:186] loss2/top-1 = 0.625
I1229 14:47:19.535310 31260 caffe.cpp:186] loss2/top-5 = 0.850312
I1229 14:47:19.535321 31260 caffe.cpp:186] loss3/loss3 = 1.30235 (* 1 = 1.30235 loss)
I1229 14:47:19.535331 31260 caffe.cpp:186] loss3/top-1 = 0.68
I1229 14:47:19.535341 31260 caffe.cpp:186] loss3/top-5 = 0.88625

Which is close enough for me.

Thank you very much for your help and patience!
