
Integrate Liger (Linkedin GPU Efficient Runtime) Kernel to HuggingFace #32861

Open
JasonZhu1313 opened this issue Aug 17, 2024 · 4 comments
Labels: Feature request (Request for a new feature), trainer

@JasonZhu1313 (Contributor) commented Aug 17, 2024

Feature request

Integrate the Liger (LinkedIn GPU Efficient Runtime) Kernel into the HuggingFace Trainer, so that users can decide whether to enable the kernels with a simple flag; a sketch of the proposed usage follows below.
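A minimal sketch of what the proposed flag could look like from the user's side, assuming the integration lands as a single `TrainingArguments` option. The flag name `use_liger_kernel` follows the name used later in this thread; the model and dataset are placeholders, not part of the proposal:

```python
# Hypothetical usage of the proposed flag; `use_liger_kernel` was not a
# released API at the time of this issue.
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

args = TrainingArguments(
    output_dir="./liger-out",
    bf16=True,
    use_liger_kernel=True,  # single switch: patch supported models with Liger's Triton kernels
)

# train_dataset is a placeholder for any tokenized causal-LM dataset
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
```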

Motivation

Liger (LinkedIn GPU Efficient Runtime) Kernel is a collection of Triton kernels designed specifically for LLM training. We have implemented Hugging Face-compatible RMSNorm, RoPE, SwiGLU, CrossEntropy, FusedLinearCrossEntropy, and more to come. It can effectively increase multi-GPU training throughput by 20% and reduce memory usage by 60%. The kernels work out of the box with flash attention, PyTorch FSDP, and Microsoft DeepSpeed. We welcome contributions from the community to gather the best kernels for LLM training.
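As a sketch of how these kernels plug into an existing Hugging Face model, the library exposes per-model patching entry points. The function and keyword names below follow the Liger-Kernel repo and should be treated as assumptions until the repo is public:

```python
# Sketch based on Liger-Kernel's monkey-patching API (names may change
# before release). The patch swaps LLaMA's RMSNorm, RoPE, SwiGLU, and
# cross-entropy modules for the Triton implementations.
from liger_kernel.transformers import apply_liger_kernel_to_llama
from transformers import AutoModelForCausalLM

apply_liger_kernel_to_llama(
    rms_norm=True,
    rope=True,
    swiglu=True,
    fused_linear_cross_entropy=True,  # fuses the lm_head matmul with the loss
)

# Instantiate the model after patching so it picks up the Liger modules.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
```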

Your contribution

We (LinkedIn) will take care of the work needed for a smooth integration and would need HF review and feedback on the changes.

Benchmark

Benchmark conditions: LLaMA 3-8B, Alpaca Dataset, Max seq len = 512, Data Type = bf16, Optimizer = AdamW, Gradient Checkpointing = True, Distributed Strategy = FSDP1 on 4 A100s.

Throughput increases by approximately 20% as more data is processed, while GPU memory usage is reduced by 40%. This means you can train the model on smaller GPUs, with larger batch sizes, or with longer sequence lengths at no additional cost.

[Benchmark figures: training throughput and GPU memory usage, Liger kernels vs. baseline]

For the detailed benchmark setup and further efficiency gains for multi-head training (Medusa), please refer to the original repo: https://github.com/linkedin/Liger-Kernel (repo will be public soon).

@amyeroberts (Collaborator)

cc @ArthurZucker @muellerzr

@ArthurZucker (Collaborator)

Sounds great! Awesome work from your team 🥳

@ByronHsu (Contributor)

See linkedin/Liger-Kernel#70. Would love to have a discussion on the best UX. cc @ArthurZucker @philschmid et al.

@llllvvuu

It looks like there is still an issue when using use_liger_kernel=True and torch_compile=True in Trainer with Llama: linkedin/Liger-Kernel#174
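For reference, a sketch of the configuration that reportedly still breaks; everything other than the two flags is a placeholder:

```python
# Reported failing combination (see linkedin/Liger-Kernel#174): Liger's
# patched ops and torch.compile enabled together on a LLaMA model.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./out",
    use_liger_kernel=True,  # Liger Triton kernels
    torch_compile=True,     # torch.compile reportedly conflicts with the patched ops
)
```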
