- Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer
- XTC: Extreme Compression for Pre-trained Transformers Made Simple and Efficient
- BiT: Robustly Binarized Multi-distilled Transformer
- LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
- Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases
- OPTQ: Accurate Quantization for Generative Pre-trained Transformers
- QLoRA: Efficient Finetuning of Quantized LLMs
- DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs
- Exploiting LLM Quantization
- BiDM: Pushing the Limit of Quantization for Diffusion Models
- QBB: Quantization with Binary Bases for LLMs
- StepbaQ: Stepping backward as Correction for Quantized Diffusion Models
- PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression
- Q-VLM: Post-training Quantization for Large Vision-Language Models
- Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models
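All of the papers above build on, and then improve over, the same baseline primitive: round-to-nearest quantization of a float tensor to a low-bit integer grid plus a scale. As a minimal sketch (not any listed paper's actual method; the function names here are illustrative), symmetric per-tensor absmax INT8 quantization looks like this; methods such as LLM.int8() and DuQuant refine it with outlier handling, and OPTQ with error-compensating weight updates:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor absmax quantization to INT8.

    Returns the quantized integer tensor and the scale needed to
    approximately recover the original float values.
    """
    scale = np.abs(w).max() / 127.0  # map the largest magnitude to 127
    if scale == 0.0:                 # all-zero tensor: any scale works
        scale = 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float tensor."""
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((4, 4)).astype(np.float32)
    q, s = quantize_int8(w)
    w_hat = dequantize(q, s)
    print("max abs error:", np.abs(w - w_hat).max())
```

The round-to-nearest step is where the accuracy loss comes from: a single large-magnitude outlier inflates the scale and wastes grid resolution on the rest of the tensor, which is precisely the failure mode the outlier-focused papers in this list target.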