- COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models
- Token Merging: Your ViT But Faster
- Less is More: Task-aware Layer-wise Distillation for Language Model Compression
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
- DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale
- Fine-Tuning Language Models with Just Forward Passes