- COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models
- Token Merging: Your ViT But Faster
- Less is More: Task-aware Layer-wise Distillation for Language Model Compression
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
- DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale
- Fine-Tuning Language Models with Just Forward Passes