50% Prediction Errors at "100%" accuracy #44
Comments
I'm not sure I have enough information to help you. I'm not saying it couldn't be a bug in DIGITS, but it certainly sounds like you may have set up your datasets incorrectly. Can you give me some more detail on how you created your dataset? I assume that you've looked at the instructions here?
I looked over the instructions and have run both examples, LeNet on MNIST and AlexNet on ImageNet data, and they worked the way they are supposed to. I'm happy to believe it is me, but something does seem off with the accuracy reported. I have image sequences (medical, video-like, all very similar in appearance); the sequences are labeled as class one or two. I split the sequences into a training group and a validation group (half in each). Finally, the sequences are split into individual frames, and these are my images. Images in the training set are strictly from training sequences, and images in the validation set are strictly from validation sequences.

I guess my issue boils down to this: after training, the DIGITS graph shows 100% accuracy, but when I test individual images from the validation set I get misclassifications.

Update: I have now been running with a batch size of 1 as opposed to the default, and so far training looks a lot more normal (i.e., it is not converging yet, val loss is effectively constant, training loss is oscillating, and accuracy is a little over 50%). Maybe this is related to having a large N compared to the number of classes?
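For concreteness, here is a minimal sketch of the sequence-level split described above, assuming one directory per sequence grouped under a class directory (all paths and names here are hypothetical, not the reporter's actual layout):

```python
import random
from pathlib import Path

# Hypothetical layout: data/<class_label>/<sequence_id>/<frame>.png
data_root = Path("data")

train_frames, val_frames = [], []
for class_dir in sorted(p for p in data_root.iterdir() if p.is_dir()):
    sequences = sorted(p for p in class_dir.iterdir() if p.is_dir())
    random.shuffle(sequences)
    half = len(sequences) // 2
    # Split at the sequence level so that frames from one sequence never
    # end up in both training and validation.
    for seq in sequences[:half]:
        train_frames += [(frame, class_dir.name) for frame in seq.glob("*.png")]
    for seq in sequences[half:]:
        val_frames += [(frame, class_dir.name) for frame in seq.glob("*.png")]

print(len(train_frames), "training frames,", len(val_frames), "validation frames")
```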
That does seem to make sense. Let me know if that fixes the problem for you - I could add a check to save others from this problem in the future.
I agree. I'll try to look into it. Hopefully today, if I find the time.
Great, thank you. I won't have a result with my data for a couple of days, but I can also try to reproduce the problem with a subset of the ImageNet data if that would be helpful.
@Dezmon
Yes, the images get scaled/padded to 256x256, the same as with the ImageNet data. I am relying on DIGITS to do that for me, but it works fine with the ImageNet JPGs (and when I test single images from my data), so I'm assuming that part is OK.
Hmm, I may have messed up my data manipulation for testing somewhere (see BVLC/caffe#2255). I'll get back to you on this. |
@Dezmon, I just upgraded to a newer version of Caffe and changed the way that I do image preprocessing. Will you upgrade your DIGITS and NVIDIA/caffe installations, and then see if that fixes the issue for you?
I did a fresh install and build of Caffe (NVIDIA's) and DIGITS, and I rebuilt the dataset from the raw PNGs, but I'm still seeing the problem. That said, I think it has to do with batch size (and some lack of basic understanding on my part). With batch sizes other than the network default I get very different behavior; mostly the loss explodes, which is not a great outcome but at least is understandable.
Can you post your DB build page with the distribution of classes? This looks like an overfitting problem of some sort.
Sure, I'll post it, and yes, the data is a little unbalanced, but how would overfitting drive the validation accuracy to 100% and the validation loss to 0? Training I would understand, but validation doesn't make sense to me; then again, I am new to ML and open to suggestions.

I have now reproduced it with ImageNet data. I took two classes from the full data set (n04404412 and n04409515), separated into training (2,600) and test (26) directories. I trained using the defaults for AlexNet and got the same behavior: 100% accuracy shown, but testing individual images from the validation directory gives mixed results.
That is a tiny amount of data for a pretty large network; you are going to overfit quickly. Better would be to attempt fine-tuning from a fully trained network. More importantly, you have a TINY number of validation images; generally we shoot for >10%, more like 25%, of the number of training images.
Hi Mike, the small set was just to show another example of the problem, so it could be reproduced on a standard image set. For my data (the first training plot in this thread) I am using 22,838 training images and 25,549 validation images (much more than 25%) per class. Is this still a tiny amount of data? I thought ImageNet used only 1,000 per class.
That should be working better. My hunch is still that you are overfitting your data. ImageNet has ~1,000 images per class, but ~1.2M base training images. Still, I would expect better performance. We, for example, have taken Pascal VOC crops and trained on those from scratch successfully, but starting from a pretrained AlexNet/CaffeNet network does produce better overall results.

Let's look at batch sizes and learning rate carefully. Generally, if you change the batch size you also need to adjust your learning rate and decays. Alex Krizhevsky talks about this in Section 5 of his "One Weird Trick" paper. Still, you are getting high training accuracy and your two loss curves look correct. Your training and validation sets have no overlap in samples, correct?
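As a reference for the adjustment being discussed, here is a minimal sketch of the sqrt(k) rule from that paper (the base values below are placeholders, not the network's actual defaults):

```python
import math

def scale_learning_rate(base_lr, base_batch_size, new_batch_size):
    """Scale the learning rate by sqrt(k) when the batch size is
    multiplied by k, per Section 5 of the "One Weird Trick" paper."""
    k = new_batch_size / float(base_batch_size)
    return base_lr * math.sqrt(k)

# Placeholder numbers: doubling the batch size scales the LR by sqrt(2).
print(scale_learning_rate(base_lr=0.01, base_batch_size=128, new_batch_size=256))
```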
That is correct. I'm not expecting this to work all that well, and I'll add a lot more data when I get my understanding of the tools worked out a little better. I'm just trying to figure out the high reported accuracy and the subsequent poor single-image prediction performance. I will re-read his paper.
You are correct: adjusting the learning rate up by his recommended sqrt(k) gives very different network performance (exploding training loss, woohoo :/). Should I close this, since the problem appears to only come up with pathologically un-/under-trained networks that for some reason report low validation loss and 100% accuracy? Thank you both very much for your help.
No problem. I'll look into handling the learning rate vs. batch size adjustment automatically in the future. |
Fixes inference bug that was at least partially the cause of #44. Check the classification result in test_one_image
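For anyone who wants to double-check a single-image prediction outside of DIGITS, here is a minimal pycaffe sketch (the deploy prototxt, snapshot, and image paths are hypothetical; substitute the files from your own job):

```python
import numpy as np
import caffe

# Hypothetical paths; use the deploy prototxt and snapshot from your DIGITS job.
# (Mean subtraction is omitted for brevity; pass mean=... to match training preprocessing.)
net = caffe.Classifier(
    "deploy.prototxt",
    "snapshot_iter_1000.caffemodel",
    image_dims=(256, 256),
    raw_scale=255,           # caffe.io loads images in [0, 1]; the model expects [0, 255]
    channel_swap=(2, 1, 0),  # RGB -> BGR, matching typical Caffe training data
)

image = caffe.io.load_image("val/class_1/frame_0001.png")
probs = net.predict([image], oversample=False)[0]
print("predicted class:", int(np.argmax(probs)), "probabilities:", probs)
```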
When training an AlexNet with two classes of 5,000 images each, in separate training and validation directories (20,000 images total), the training and val loss values drop to near zero and the accuracy goes to an unbelievable 100%. Single-image tests from the validation set for the second class almost always give an incorrect prediction, while single-image tests from the validation set for the first class always give a correct prediction. Am I misunderstanding what is being reported as "Accuracy"? The images are PNG files that are approximately 800x600.
I have gotten the same result with a much smaller subset of the data, 1,000 images per class.