Quantization

  • Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer
  • XTC: Extreme Compression for Pre-trained Transformers Made Simple and Efficient
  • BiT: Robustly Binarized Multi-distilled Transformer
  • LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
  • Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases
  • OPTQ: Accurate Quantization for Generative Pre-trained Transformers
  • QLoRA: Efficient Finetuning of Quantized LLMs
  • DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs
  • Exploiting LLM Quantization
  • BiDM: Pushing the Limit of Quantization for Diffusion Models
  • QBB: Quantization with Binary Bases for LLMs
  • StepbaQ: Stepping backward as Correction for Quantized Diffusion Models
  • PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression
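Most of the papers above build on the same primitive: mapping float weights onto a small integer grid and back. A minimal sketch of symmetric per-tensor INT8 quantization (the scheme generalized by the INT8/INT4 work listed here; the function names are illustrative, not from any of these papers):

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor quantization: map floats onto the integer grid [-127, 127]."""
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float tensor."""
    return q.astype(np.float32) * scale

w = np.array([0.1, -0.5, 0.25, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# per-element rounding error is at most scale / 2
```

Per-channel scales, outlier handling (LLM.int8(), DuQuant), and second-order weight updates (OPTQ) are refinements of this basic round-to-grid step.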

Post-Training Quantization (PTQ)

  • Q-VLM: Post-training Quantization for Large Vision-Language Models

Mixed-Precision

  • Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models