Overall

Compression

  • COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models
  • Token Merging: Your ViT But Faster

Distillation

  • Less is More: Task-aware Layer-wise Distillation for Language Model Compression

System

  • Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
  • DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale

Fine-tuning

  • Fine-Tuning Language Models with Just Forward Passes

Survey

LLM Family

GPT

Llama

Figure: the Llama family tree. Captured from A Survey of Large Language Models.