
DIGITS image classification view does not correctly handle output of FCNs #1492

Open

AliaMYH opened this issue Mar 4, 2017 · 16 comments

@AliaMYH commented Mar 4, 2017

I've trained a network and its validation accuracy is very good, almost 100 percent. However, when I run 'Classify Many' using the dataset's val.txt, the accuracy returned is terrible: only one of the classes is predicted every time. I'm not exactly sure what the problem is.
My validation set is 20% of my training set; I used the split option in DIGITS.

[Two screenshots of the Classify Many results, dated Mar 4, 2017, were attached here.]

@AliaMYH (Author) commented Mar 4, 2017

I tried using the mean pixel option as recommended in #625, but the same thing happens.

Classify One gives correct results.

@AliaMYH (Author) commented Mar 4, 2017

@samansarraf According to your comment on #625, am I meant to preprocess all my data outside of DIGITS before training?

@samansarraf

@AliaMYH your understanding is correct. To solve the issue at the time, I preprocessed my data myself so that it was completely independent of the Classify Many preprocessing module; there must be something in that code that doesn't work in some cases. After I preprocessed the data outside DIGITS (including image resizing), Classify Many produced exactly the same results I was getting during training and validation. Don't give up! You are pretty close.
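
As a rough illustration of the offline preprocessing described above, a minimal sketch assuming OpenCV; the directory names and target size are placeholders and must match the dataset settings used in DIGITS:

```python
import glob
import os

import cv2  # assumed available; any image library would do

# Hypothetical paths and target size -- adjust to match the DIGITS dataset.
SRC_DIR = "raw_images"
DST_DIR = "preprocessed_images"
TARGET_SIZE = (256, 256)  # (width, height) the network was trained on

os.makedirs(DST_DIR, exist_ok=True)

for path in glob.glob(os.path.join(SRC_DIR, "*.jpg")):
    img = cv2.imread(path)
    # Resize every image ourselves so Classify Many's own preprocessing
    # has nothing left to do beyond passing the image through.
    img = cv2.resize(img, TARGET_SIZE, interpolation=cv2.INTER_LINEAR)
    cv2.imwrite(os.path.join(DST_DIR, os.path.basename(path)), img)
```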

@AliaMYH (Author) commented Mar 4, 2017

A curious thing, though, is that I used the exact same dataset and cropping with two different networks: one gave the above results, while the other worked perfectly. The one with the above results is a fine-tuned SqueezeNet; the one that gave proper results was a fine-tuned AlexNet, which I trained for comparison. I'm not exactly sure why this is.

Also, does this mean that only the displayed results are incorrect, or the entire training of the network, considering that the 'Classify One' results are correct? Is deploy.prototxt what is used during inference?

@samansarraf @gheinrich

@samansarraf

@AliaMYH No, the entire training (including validation) process is correct; only Classify Many has a preprocessing problem, in my opinion. I had the same kind of problem with GoogLeNet, so what I did was crop the data before training and use the cropped images for both training and the Classify Many module; then it worked.

@AliaMYH (Author) commented Mar 5, 2017

I agree as well. @gheinrich Can you confirm this?

@gheinrich (Contributor)

@AliaMYH are you saying that the standard AlexNet provided in DIGITS gives proper results whereas the implementation of SqueezeNet you're using doesn't? It might be a case of DIGITS being confused by the output of the network. Are you using a single SoftMax output in SqueezeNet?

As a side note, if small differences in pre-processing lead to wildly different accuracy, that's a sign of overfitting. You'd have to see how well the model generalizes to unseen samples.

@gheinrich (Contributor)

@AliaMYH you might want to verify the shape of the network output. It is probably a case of the network outputting one more dimension than DIGITS expects, which would confuse the Classify Many path.
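
One way to check this, as a sketch assuming pycaffe and file names like those in a DIGITS job directory:

```python
import caffe  # pycaffe, as shipped with the Caffe build DIGITS uses

# Hypothetical file names -- substitute the deploy.prototxt and snapshot
# from your own DIGITS job directory.
net = caffe.Net("deploy.prototxt", "snapshot_iter_1000.caffemodel", caffe.TEST)

# Output blob shapes are determined by the prototxt alone; no forward pass needed.
for name in net.outputs:
    print(name, net.blobs[name].data.shape)
# A classifier DIGITS understands prints e.g. softmax (1, 5);
# a fully convolutional net may instead print softmax (1, 5, 1, 1).
```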

@gheinrich (Contributor)

Any feedback please, @AliaMYH?

@SlipknotTN

My colleagues and I have noticed the same problem, @gheinrich: the SqueezeNet model (we tested v1.1) outputs one additional dimension, which causes the problem in the confusion matrix.
You can fix the problem at the inference.py level (b2062c6) or in classification/views.py (dbcb9ed); the first solution should be better, though it may be possible to find other solutions once we understand why SqueezeNet behaves this way.
We could review the problem and make a PR if you like.

@gheinrich (Contributor)

I had never looked at SqueezeNet before. The reason you get more dimensions is that this is a fully convolutional network, unlike the typical classification CNN (e.g. AlexNet), which has a fully convolutional feature extractor followed by a classifier made of fully-connected layers. SqueezeNet therefore produces a spatial output, while DIGITS expects a flattened probability distribution over classes. I think it's better to fix DIGITS in classification/views.py. We could, for example, collapse the dimensions that have a cardinality of one with np.squeeze. If you are willing to take this up @SlipknotTN it would be very much appreciated, thanks in advance!
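
For illustration, a minimal sketch of the suggested collapse with illustrative shapes; note that a bare np.squeeze would also drop a batch dimension of one, which is presumably one of the edge cases an actual fix has to guard against:

```python
import numpy as np

# A fully convolutional classifier such as SqueezeNet emits a spatial map,
# e.g. shape (1, 1000, 1, 1), where DIGITS expects a flat class distribution.
fcn_output = np.zeros((1, 1000, 1, 1))  # stand-in for the network output

print(np.squeeze(fcn_output).shape)  # (1000,) -- all singleton axes collapsed
```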

@gheinrich changed the title from "Classify Many gives much worse results than validation accuracy" to "DIGITS image classification view does not correctly handle output of FCNs" on Mar 14, 2017
@nollimahere commented Mar 29, 2017

Hi, I created an account just to make this comment, so I apologize for appearing out of the blue.

My peers and I work with SqueezeNet and AlexNet on Ubuntu 14.04 (package 'digits 4.0.0-1'), and I've found through my testing that the proposed change to inference.py in b2062c6 allows us to run the 'Classify Many' web routine without issues on either type of network.

If, however, we make the change proposed for classification/views.py in pull #1536, then we are not able to run 'Classify Many' on AlexNet or any of the other typical classification CNNs. Similarly, the changes committed in dbcb9ed also break the 'Classify Many' routine on AlexNet.

@SlipknotTN

Did you use the AlexNet definition present by default in DIGITS, or another definition? I'd like to try the exact version you use.
The proposed fixes operate at different levels: the change to inference.py fixes the output earlier, while the others act only when you run Classify Many. I made the PR following @gheinrich's suggestion, so I put the fix in classification/views.py, like dbcb9ed but with more error checks. I have tested the PR with SqueezeNet and GoogLeNet, but a model like AlexNet, with a final fully-connected layer, might break the fix.
Just to be precise: GoogLeNet works well even before the patch, SqueezeNet needs the patch, and I didn't test AlexNet in any form.
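
For illustration, a hedged sketch of the kind of guarded collapse being discussed: flatten only genuine singleton spatial axes so that a conventional fully-connected (N, C) output passes through untouched. The function name is hypothetical, not the actual PR code:

```python
import numpy as np

def normalize_scores(scores):
    """Collapse a 4-D FCN output of shape (N, C, 1, 1) to (N, C);
    leave a conventional fully-connected (N, C) output untouched."""
    if scores.ndim == 4 and scores.shape[2:] == (1, 1):
        return scores.reshape(scores.shape[0], scores.shape[1])
    return scores

print(normalize_scores(np.zeros((4, 10, 1, 1))).shape)  # SqueezeNet-style -> (4, 10)
print(normalize_scores(np.zeros((4, 10))).shape)        # AlexNet-style   -> (4, 10)
```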

@nollimahere

Most recently we have been using the vanilla AlexNet from DIGITS, and that is what I tested in our environment when I made that comment.

@SlipknotTN

Hmm... I have tested the PR with AlexNet from DIGITS and it works. Did you try the entire branch https://github.com/cynnyx/DIGITS/tree/fcn-fix-pr, or only port the fix to DIGITS 4?
Which error do you encounter? Are all the classifications assigned to the first column?

Meanwhile, we have added the same fix for the Top-N category function to the PR.

@nollimahere commented Apr 5, 2017

Okay: the issue I was having was not a relic of DIGITS 4. I included the 2nd commit made by @belalessandro in your pull #1536, and that 2nd commit correctly handles the 'Classify Many' web routine without any other changes needed in inference.py.

Previously we were still getting only a single class prediction in the confusion matrix. So, for what it's worth, I believe this commit works great, and I look forward to seeing it downstream in the Ubuntu repos.
