- Model compression as constrained optimization, with application to neural nets. Part I: general framework
- Model compression as constrained optimization, with application to neural nets. Part II: quantization
- A Survey of Model Compression and Acceleration for Deep Neural Networks
- Dynamic Capacity Networks
- ResNeXt: Aggregated Residual Transformations for Deep Neural Networks
- MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
- Xception: Deep Learning with Depthwise Separable Convolutions
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
- ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression
- SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
- Residual Attention Network for Image Classification
- SEP-Nets: Small and Effective Pattern Networks
- Deep Networks with Stochastic Depth
- Learning Infinite Layer Networks Without the Kernel Trick
- Coordinating Filters for Faster Deep Neural Networks
- ResBinNet: Residual Binary Neural Network
- SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks
- Efficient Sparse-Winograd Convolutional Neural Networks
- DSD: Dense-Sparse-Dense Training for Deep Neural Networks
- Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video
- Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation
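Several of the architectures above (MobileNets, Xception, ShuffleNet) get much of their efficiency from depthwise separable convolutions. As a rough illustration of the parameter savings, not taken from any single paper, a sketch of the two parameter counts:

```python
def conv_params(c_in, c_out, k):
    # Parameter count of a standard k x k convolution (bias ignored).
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # Depthwise k x k conv (one filter per input channel) followed by
    # a 1 x 1 pointwise conv, as popularized by MobileNets / Xception.
    return c_in * k * k + c_in * c_out

# A typical 3x3 layer mapping 32 -> 64 channels:
standard = conv_params(32, 64, 3)                   # 18432 parameters
separable = depthwise_separable_params(32, 64, 3)   # 2336 parameters
```

For this layer the separable variant uses roughly 8x fewer parameters, and the gap grows with the channel counts.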
- Dark knowledge
- FitNets: Hints for Thin Deep Nets
- Net2Net: Accelerating Learning via Knowledge Transfer
- Distilling the Knowledge in a Neural Network
- MobileID: Face Model Compression by Distilling Knowledge from Neurons
- DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer
- Deep Model Compression: Distilling Knowledge from Noisy Teachers
- Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer
- Sequence-Level Knowledge Distillation
- Like What You Like: Knowledge Distill via Neuron Selectivity Transfer
- Learning Efficient Object Detection Models with Knowledge Distillation
- Data-Free Knowledge Distillation For Deep Neural Networks
- Learning Loss for Knowledge Distillation with Conditional Adversarial Networks
- Knowledge Projection for Effective Design of Thinner and Faster Deep Neural Networks
- Moonshine: Distilling with Cheap Convolutions
- Model Distillation with Knowledge Transfer from Face Classification to Alignment and Verification
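The distillation papers above share a common core: the student is trained to match the teacher's temperature-softened output distribution. A minimal NumPy sketch of the soft-target loss from "Distilling the Knowledge in a Neural Network" (the temperature value and logits below are illustrative):

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; larger T produces softer targets.
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=4.0):
    # KL divergence between softened teacher and student outputs,
    # scaled by T^2 so gradient magnitudes stay comparable across T.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return (T ** 2) * float(np.sum(p * (np.log(p) - np.log(q))))
```

In practice this soft-target term is combined with the usual cross-entropy on the hard labels.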
- Local Binary Convolutional Neural Networks
- Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration
- Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
- XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
- DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
- Quantizing Weights and Activations in Recurrent Neural Networks
- The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning
- Quantized Convolutional Neural Networks for Mobile Devices
- Compressing Deep Convolutional Networks using Vector Quantization
- Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations
- Fixed-Point Performance Analysis of Recurrent Neural Networks
- Loss-aware Binarization of Deep Networks
- Towards the Limit of Network Quantization
- Deep Learning with Low Precision by Half-wave Gaussian Quantization
- ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks
- Trained Ternary Quantization
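A recurring recipe in the binarization papers above (e.g. XNOR-Net) is to approximate a real-valued weight tensor W by alpha * sign(W), with the scale alpha chosen as the mean absolute weight. A hedged NumPy sketch of that step:

```python
import numpy as np

def binarize(W):
    # XNOR-Net-style binarization: W is approximated by alpha * sign(W).
    # alpha = mean(|W|) is the closed-form minimizer of the L2 error
    # ||W - alpha * B||^2 over binary tensors B in {-1, +1}.
    alpha = float(np.abs(W).mean())
    return alpha * np.sign(W), alpha
```

At inference time the {-1, +1} tensor enables XNOR/popcount arithmetic; only the single scale alpha is kept in floating point.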
- Data-Driven Sparse Structure Selection for Deep Neural Networks
- Fine-Pruning: Joint Fine-Tuning and Compression of a Convolutional Network with Bayesian Optimization
- Learning to Prune: Exploring the Frontier of Fast and Accurate Parsing
- Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning
- Pruning Filters for Efficient ConvNets
- Pruning Convolutional Neural Networks for Resource Efficient Inference
- Soft Weight-Sharing for Neural Network Compression
- Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
- Learning both Weights and Connections for Efficient Neural Networks
- Dynamic Network Surgery for Efficient DNNs
- ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA
- Faster CNNs with Direct Sparse Convolutions and Guided Pruning
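Most of the pruning papers above build on the same baseline: magnitude pruning, i.e. zeroing the smallest-magnitude weights and then fine-tuning (as in "Learning both Weights and Connections" and "Deep Compression"). A minimal sketch; the tie-handling at the threshold is illustrative:

```python
import numpy as np

def magnitude_prune(W, sparsity=0.5):
    # Zero out roughly the given fraction of smallest-magnitude weights.
    flat = np.abs(W).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return W.copy()
    threshold = np.partition(flat, k - 1)[k - 1]
    # Keep weights strictly above the threshold (ties are pruned).
    return W * (np.abs(W) > threshold)
```

Structured variants (e.g. "Pruning Filters for Efficient ConvNets") apply the same idea at the filter level so that the resulting network stays dense.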
- Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation
- Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications
- Efficient and Accurate Approximations of Nonlinear Convolutional Networks
- Accelerating Very Deep Convolutional Networks for Classification and Detection
- Convolutional Neural Networks with Low-Rank Regularization
- Speeding up convolutional neural networks with low rank expansions
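The low-rank papers above all exploit the same fact: a weight matrix that is approximately low rank can be replaced by two thin factors, cutting both parameters and multiply-adds. A truncated-SVD sketch for a fully connected layer (convolutional weights need a reshape into a matrix first, as these papers describe):

```python
import numpy as np

def low_rank_factorize(W, rank):
    # Truncated SVD: W (m x n) ~= A @ B with A (m x rank), B (rank x n).
    # One dense layer becomes two thinner ones; the parameter count
    # drops from m*n to rank*(m + n).
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]
    B = Vt[:rank]
    return A, B
```

The factorization is usually followed by fine-tuning to recover the small accuracy loss from truncation.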