
the multi classification problem #2

Open
zorrocai opened this issue Apr 28, 2018 · 11 comments

Comments

@zorrocai

I have written DeepLDA in PyTorch. When it comes to the test module, the classification method confuses me a lot. I want to use a one-vs-rest approach for multi-class classification, but it seems that your classification is a new and interesting method. Could you explain more about it, and what is the theory behind it?

@zorrocai
Author

Are your per-class mean hidden representations computed from just the samples in a mini-batch? And is your accuracy computed the same way?

@dmatte
Contributor

dmatte commented May 1, 2018

I have written DeepLDA in PyTorch.

That's great! I saw that you also opened an issue about getting the gradients of the eigenvalue decomposition in PyTorch. Hopefully that will be available soon.

When it comes to the test module, the classification method confuses me a lot. I want to use a one-vs-rest approach for multi-class classification, but it seems that your classification is a new and interesting method. Could you explain more about it, and what is the theory behind it?

One-vs-rest is not required, as we build on top of multiclass LDA.

Computing the individual class probabilities itself is also nothing new. In fact, it is identical to what you do in the classical (non-deep) version of LDA. You can also find this in the sklearn implementation of LDA, which in turn builds on sklearn's LinearClassifierMixin.
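To make that concrete, here is a minimal sketch of such a classifier on the hidden representations (not the code from this repository; feats, labels and the function names are placeholders, with feats an N x d matrix of hidden features and labels an integer class vector):

import torch

def fit_lda_classifier(feats, labels, n_classes, eps=1e-4):
    # per-class means and a shared within-class covariance of the hidden features
    d = feats.shape[1]
    means = torch.stack([feats[labels == c].mean(0) for c in range(n_classes)])
    centered = feats - means[labels]
    cov = centered.t() @ centered / (feats.shape[0] - n_classes) + eps * torch.eye(d)
    # classical LDA discriminant (uniform class priors dropped):
    # w_c = cov^-1 mu_c,  b_c = -0.5 * mu_c^T cov^-1 mu_c
    W = means @ torch.inverse(cov)          # (n_classes, d)
    b = -0.5 * (W * means).sum(1)           # (n_classes,)
    return W, b

def lda_predict(feats, W, b):
    return (feats @ W.t() + b).argmax(1)    # highest discriminant score wins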

Are your per-class mean hidden representations computed from just the samples in a mini-batch? And is your accuracy computed the same way?

For computing the updates we use batch statistics (you could also use running averages, similar to what is done in batch normalization, for example). For evaluation and testing, the statistics are re-computed on the entire training set to get more reliable estimates for both means and covariances.
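As a sketch of that evaluation path (model, train_loader and n_classes are placeholders; fit_lda_classifier is the helper from the sketch above):

@torch.no_grad()
def full_set_statistics(model, train_loader, n_classes):
    # one pass over the whole training set to collect hidden representations,
    # then re-fit the LDA statistics on all of them
    feats, labels = [], []
    for x, y in train_loader:
        feats.append(model(x))
        labels.append(y)
    return fit_lda_classifier(torch.cat(feats), torch.cat(labels), n_classes)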

@zorrocai
Author

Thanks for your guidance. I have been preparing for the IELTS test these days. Actually, instead of using the eigenvalue decomposition in LDA first, I tried to learn the projection matrix A directly with backpropagation, and it seems that the results were just as good.
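One way to set this up, as a rough sketch (the dimensions, names, and the trace-ratio objective below are just one possible choice): keep A as a trainable parameter and optimize an LDA-style trace ratio by gradient descent instead of solving the generalized eigenproblem.

import torch

d, k = 64, 9                                  # hidden dimension and projection dimension (placeholders)
A = torch.nn.Parameter(0.01 * torch.randn(d, k))
opt = torch.optim.Adam([A], lr=1e-3)

def neg_trace_ratio(Sb, Sw, A, eps=1e-4):
    # maximize trace((A^T Sw A)^-1 A^T Sb A) by minimizing its negative
    Sw_p = A.t() @ Sw @ A + eps * torch.eye(A.shape[1])
    Sb_p = A.t() @ Sb @ A
    return -torch.trace(torch.inverse(Sw_p) @ Sb_p)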

@zorrocai
Author

zorrocai commented May 23, 2018

I have trained AlexNet with the LDA-eigenvalues loss, but there wasn't any real improvement in train accuracy.
Sample outputs of my code are listed below:

epoch 13 avg_train_loss: -0.999989
('LDA-Eigenvalues (Train):', '[0. 0. 0. 0. 0. 0. 0. 0. 0.]')
Ratio min(eigval)/max(eigval): 0.005, Mean(eigvals): 0.000
train accuracy: 12.085143
epoch 14 avg_train_loss: -0.999989
('LDA-Eigenvalues (Train):', '[0. 0. 0. 0. 0. 0. 0. 0. 0.]')
Ratio min(eigval)/max(eigval): 0.006, Mean(eigvals): 0.000
train accuracy: 12.023333
epoch 15 avg_train_loss: -0.999990
('LDA-Eigenvalues (Train):', '[0. 0. 0. 0. 0. 0. 0. 0. 0.]')
Ratio min(eigval)/max(eigval): 0.007, Mean(eigvals): 0.000
train accuracy: 11.959375

I wonder whether the poor performance may be caused by the bad LDA eigenvalues, or whether there are some other tricks?

@zorrocai
Author

Actually, I ran into the following situation: S_B/S_W becomes a negative identity matrix during training, so the eigenvalues are all 1. This condition may prevent further training.
[screenshot]
I really don't know the reason...

@dmatte
Contributor

dmatte commented May 28, 2018

Did you try this with our Theano version or with your PyTorch implementation?

Did you train AlexNet on ImageNet? This won't work, as you have 1000 classes. This implies that the covariances in the objective function have a size of 1000 x 1000. You would need a huge mini-batch size (1001 * 1000) to get stable estimates for the covariance matrices, and the model would not fit into your GPU memory.
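A toy check of that rank argument (the numbers below are made up): the within-class scatter estimated from N samples and C classes has rank at most N - C, so a small batch cannot yield a full-rank 1000 x 1000 covariance.

import torch

N, C, d = 32, 10, 64                          # toy batch size, class count, feature dimension
X = torch.randn(N, d)
y = torch.arange(N) % C                       # make sure every class is present
means = torch.stack([X[y == c].mean(0) for c in range(C)])
Sw = (X - means[y]).t() @ (X - means[y])
print(torch.linalg.matrix_rank(Sw))           # at most N - C = 22, far below d = 64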

@zorrocai
Author

With my PyTorch implementation, and I only trained on the MNIST dataset.

@zorrocai
Author

zorrocai commented May 28, 2018

I found that the wrong position of the line S_W += lambda*I caused the problem. I had added this line before S_B = S_T - S_W. After I moved S_W += lambda*I to after S_B = S_T - S_W, the problem was gone (a sketch of the corrected ordering is at the end of this comment). But the eigenvalues are still poor after one epoch, like this:
('LDA-Eigenvalues (Train):', '[-0.01 -0.01 -0.01 -0.01 -0.01 -0.01 -0.01 -0.01 1.68]')
Ratio min(eigval)/max(eigval): -0.007, Mean(eigvals): 0.176

Unlike your Theano code, which gets much bigger eigenvalues:
LDA-Eigenvalues (Train): [ 5.83 7.17 7.45 8.01 8.67 11.22 11.81 14.82 18.66]
Ratio min(eigval)/max(eigval): 0.312, Mean(eigvals): 10.403
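For reference, the corrected ordering as a sketch (S_t and S_w stand for the total and within-class scatter matrices, lam for the regularization constant):

S_b = S_t - S_w                               # form S_B from the un-regularized S_W ...
S_w = S_w + lam * torch.eye(S_w.shape[0])     # ... and only then add lam*I to S_W
# adding lam*I first silently turns S_B into S_T - (S_W + lam*I) = S_B - lam*I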

@xuanhanyu

Thanks for your guidance. I have been preparing for the IELTS test these days. Actually, instead of using the eigenvalue decomposition in LDA first, I tried to learn the projection matrix A directly with backpropagation, and it seems that the results were just as good.

Eigenvalue decomposition is non-differentiable in PyTorch. What should I do?

@webzerg

webzerg commented May 10, 2020

@zorrocai do you mind sharing your PyTorch implementation code? Thanks

@zjyLibra

zjyLibra commented Jul 1, 2021

Why can't Theano use the GPU on the server? I have tried many ways to configure the environment. I would like your PyTorch implementation code too. Thanks, can you share it? @zorrocai
