Add pruning possibilities at inner_product_layer #4294
base: master
Conversation
@@ -915,6 +916,11 @@ message PowerParameter {
  optional float shift = 3 [default = 0.0];
}

message PruningParameter {
  // Pruning coefficient for deep compression
It would be good to document what this parameter does here. The current comment adds no information. It looks like it's the fraction of weights to keep, sorted by absolute value?
Thanks for the PR -- it looks like this is most of what is needed to implement weight compression. It looks like the current PR doesn't actually compress any data. The weights are dropped at load time, not save time. Also, it looks like the set of weights that are pruned is fixed -- the mask never changes? Does that mean that the intent is to only add this to a fully trained model at deploy time? I don't quite understand the point of this PR. It looks like it will always be slower (from the extra mask multiply) and have lower accuracy, but with no improvement in final model size (actual size on disk). It's missing the part where you make the model file smaller because some weights are now 0. Was that going to be in a separate PR?
I don't see any unit tests.
Thank you for your review, and sorry for the messy PR; it's my first PR ever (so I'm not really confident with the process, or with Git either)... I will take more care next time, and add unit tests as well. You are right about the parameter; I'm going to change the comment. In the publications they get better accuracy at around ~70% pruning (it acts as a form of regularization), and at the end of the method the sparsity can be used for compression at deployment (with sparse GEMM), which is coming. Should I have implemented that in the same PR?
And yes, this is supposed to be used on an already-trained model, and the masks don't change after the pruning during training, as there is no rule about how the pruning parameter is supposed to change.
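To make the scheme described in this exchange concrete, here is a minimal NumPy sketch of an inner-product forward pass with a fixed, magnitude-based pruning mask. The shapes, the 30% keep ratio, and the function name are illustrative assumptions, not the PR's actual code:

```python
import numpy as np

def masked_inner_product_forward(x, W, b, mask):
    """Forward pass of an inner-product layer with a fixed pruning mask.

    x: (batch, in_dim), W: (out_dim, in_dim), b: (out_dim,), mask: same shape as W.
    """
    return x @ (W * mask).T + b

# Build the mask once from a trained W, keeping ~30% of weights by magnitude;
# it is then held fixed while the network is fine-tuned.
W = np.random.randn(128, 256).astype(np.float32)
b = np.zeros(128, dtype=np.float32)
mask = (np.abs(W) >= np.percentile(np.abs(W), 70)).astype(W.dtype)

x = np.random.randn(8, 256).astype(np.float32)
y = masked_inner_product_forward(x, W, b, mask)  # shape (8, 128)
```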
No, it's good to split up PRs into small units of functionality; that makes them easier to review.
In practice, to run a sufficiently pruned model, it might be worth just giving the input W as three blobs for the CSR representation, and calling mkl_csrmm on CPU / cusparseScsrmm on GPU. That's what the deep compression papers do when reporting their speedups for inference time (Appendix A in https://arxiv.org/pdf/1510.00149.pdf). You can do the sparsification + conversion from dense to sparse in a few lines of PyCaffe by masking the original W and converting it to CSR (e.g. with scipy.sparse.csr_matrix).
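A rough PyCaffe/SciPy sketch of that suggestion; the prototxt/caffemodel paths and the layer name 'fc6' are placeholders, and the 30% keep ratio is only an example:

```python
import caffe
import numpy as np
from scipy.sparse import csr_matrix

# Placeholder model files and layer name.
net = caffe.Net('deploy.prototxt', 'trained.caffemodel', caffe.TEST)

W = net.params['fc6'][0].data                      # dense InnerProduct weights
mask = np.abs(W) >= np.percentile(np.abs(W), 70)   # keep the top ~30% by magnitude
W *= mask                                          # sparsify in place

# CSR arrays that a csrmm routine (mkl_csrmm / cusparseScsrmm) would consume
# as csrVal / csrColInd / csrRowPtr.
W_csr = csr_matrix(W)
print(W_csr.data.shape, W_csr.indices.shape, W_csr.indptr.shape)

net.save('pruned.caffemodel')                      # placeholder output path
```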
So you think it's better to do the retraining directly on the sparse representation? I was afraid it wouldn't give a lot of flexibility for playing with pruning. Also, I'd like to implement DropConnect later and I thought it would fit well with the mask that is already there. While the slowdown from the mask is almost unnoticeable (but it could be significant with a not-sparse-enough model and csrmm), I thought about doing the conversion only for deploy. I could do it one way or the other, depending on what the community prefers; let me know.
Ah, I was just talking about replicating the speedups of the model. For replicating that paper, the way @songhan did it IIRC was by adding a mask that is applied in Blob::Update, so that pruned weights stay at zero during retraining.
Oh, OK, got it. Is it important to replicate it that way? I don't feel confident about changing the whole Blob::update just for adding this feature to InnerProduct (I don't intend to add it to convolution layers, as the result is far less interesting).
@ajtulloch I'm trying to implement what you suggested. The signature of cusparseScsrmm is:

cusparseScsrmm(cusparseHandle_t handle,
               cusparseOperation_t transA,
               int m,
               int n,
               int k,
               int nnz,
               const float *alpha,
               const cusparseMatDescr_t descrA,
               const float *csrValA,
               const int *csrRowPtrA,
               const int *csrColIndA,
               const float *B,
               int ldb,
               const float *beta,
               float *C,
               int ldc)

I'm not sure what to pass for some of these arguments. In my PyCaffe code I have the following:

sparse = csr_matrix(net.params[conv][0].data)
add_blob(sparse.data)     # csrValA
add_blob(sparse.indices)  # csrColIndA
add_blob(sparse.indptr)   # csrRowPtrA

Could you give me an example of what to call in my SparseInnerProductlayer.cu? P.S. What function would I call in cudnn_conv_layer.cu? Thank you very much.
This may not be what you need here, but I've fixed and merged an old PR with SparseInnerProduct in it, along with a larger set of changes for sparse computations on both CPU and GPU; see https://github.com/beniz/caffe/blob/master_dd_integ_sparse/src/caffe/layers/sparse_inner_product_layer.cu. I'm interested in any potential improvements to this sparse layer, though we already use it fine in many tasks.
@beniz that's perfect, that's exactly what I'm talking about. Have you considered PR'ing that so it's more visible and maybe gets merged? It's a very useful layer, and that implementation looks really nice.
@ajtulloch I've tested the waters for a PR here #2364 (comment) but until now, no real feedback. For a full list of functionalities, see https://github.com/beniz/deepdetect/pull/142. The implementation originates from #2364 and @alemagnani; I've rebased and added support for MemorySparseDataLayer and batches. I'd be happy to PR it, because my fear is that it interferes with later Caffe changes and takes time to maintain on our own branch, though we're committed to it; this is going into production. I'm a bit too busy right now, so if this is helpful and someone wants to PR it quickly, please do.
Are you sure that we are talking about the same thing? If I understood correctly what you've done, the bottom is sparse, but deep compression is supposed to have sparse weights, isn't it? But even if I'm right, this can be a nice inspiration for doing the speedup version. EDIT: For the sparse representation it's another thing, let me know if you disagree.
Correct, good point.
@beniz I saw your implementation of sparse matrix computation, that's perfect. But I have some questions.
@yeahkun I haven't tested against BLAS libraries, I use OpenBLAS everywhere. Have you tested the speed of caffe_cpu_csr_gemm? For sparse weights, as stated by @Caenorst, you certainly need to modify the code, because at the moment it is the bottom layer blobs that are cast into a sparse representation, not the weights.
@beniz Actually I have trained a sparse model following the deep compression approach mentioned by @Caenorst. But at inference time, I use the sparse matrix-matrix function in MKL (mkl_scsrmm()) to replace the original dense function cblas_sgemm(), and I found it is slower to use mkl_scsrmm() (~0.19s, compared to cblas_sgemm(), ~0.14s). I also tested the speed of caffe_cpu_csr_gemm() (~0.2s). I used GoogLeNet for all the above experiments and the compression rate is about 30% for the sparse model.
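To get a feel for where that crossover happens, here is a small SciPy timing sketch. The matrix shapes and density levels are arbitrary choices, not the GoogLeNet measurements reported above:

```python
import time
import numpy as np
from scipy.sparse import csr_matrix

M, K, N = 4096, 4096, 64             # (M x K) weights times (K x N) activations
X = np.random.randn(K, N).astype(np.float32)

for density in (0.7, 0.3, 0.1):      # fraction of non-zero weights
    W = np.random.randn(M, K).astype(np.float32)
    W[np.random.rand(M, K) > density] = 0.0
    W_sparse = csr_matrix(W)

    t0 = time.time()
    _ = W @ X
    dense_t = time.time() - t0

    t0 = time.time()
    _ = W_sparse @ X
    sparse_t = time.time() - t0

    print(f"density {density:.1f}: dense {dense_t:.4f}s, sparse {sparse_t:.4f}s")
```

At moderate sparsity the dense GEMM usually still wins, which is consistent with the timings reported above.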
Hi, I get the error: Error parsing text-format caffe.NetParameter: 109:17: Message type "caffe.LayerParameter" has no field named "pruning_param".
Hi, is this usable?
Hi, it should still be usable, but please note that it has never been merged (I actually forgot that I had to do the unit tests - -'), so I guess you need to compile my own version of Caffe.
@beniz Hi, is there a tutorial on how to use the sparse matrix computation method?
@Caenorst Hi, is there a detailed document on how to use the model pruning?
@yeahkun Does "trained a sparse model" mean first pruning a pre-trained model and then fine-tuning to recover the accuracy?
@Caenorst Thanks for sharing! Here is some of my understanding of your work. Please correct me if I have misunderstood anything.
Hi @zyclonb, I used an online approach because there is actually no reason to do it offline (it adds another manipulation for the user, and you would also have to store the mask in memory). You don't need to save the mask; you can directly set the weights to 0. Then, when you want to re-prune, the weights at 0 will obviously be pruned first. @xizi Also, I haven't applied the sparse GEMM approach, but I believe it can be done separately.
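A minimal NumPy sketch of that online approach (no stored mask); the helper name and keep fractions are illustrative assumptions:

```python
import numpy as np

def prune_in_place(W, keep_fraction):
    """Zero the smallest-magnitude weights directly; no separate mask is stored."""
    k = int(np.ceil(keep_fraction * W.size))
    threshold = np.sort(np.abs(W), axis=None)[-k]
    W[np.abs(W) < threshold] = 0.0

W = np.random.randn(512, 1024).astype(np.float32)
for keep in (0.7, 0.5, 0.3):        # progressively more aggressive pruning
    prune_in_place(W, keep)
    # ... fine-tune the network between passes ...
    # Weights already at 0 have the smallest magnitude, so each new pass
    # keeps them pruned automatically.
```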
@@ -389,6 +389,7 @@ message LayerParameter {
  optional PoolingParameter pooling_param = 121;
  optional PowerParameter power_param = 122;
  optional PReLUParameter prelu_param = 131;
  optional PruningParameter pruning_param = 148;
Shouldn't it be 147, since in the comment above you say that the next available ID is 148?
Indeed, it should be 147.
Hi, is there maybe a parallel and similar pull request? I would be interested in testing the compression proposed by Han et al. in "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding", ICLR 2016. I suppose this pull request would handle the "pruning" part?
This adds the Deep Compression pruning feature to the InnerProduct layer according to http://arxiv.org/abs/1506.02626; it can also be used as a form of regularization, see https://arxiv.org/abs/1602.07360.
To use it, just add a pruning_param entry to the layer definition in the prototxt.