Feature request
Integrate the Liger (LinkedIn GPU Efficient Runtime) Kernel into the Hugging Face Trainer so that users can decide whether to enable the kernels with a simple flag.
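As an illustration, the user-facing API could be as small as one extra argument on `TrainingArguments`. This is only a sketch: the flag name `use_liger_kernel` and its placement are assumptions, and the final design is up to the Hugging Face maintainers.

```python
# Minimal sketch of the proposed user experience, assuming a hypothetical
# `use_liger_kernel` flag on TrainingArguments (name is illustrative only).
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
train_dataset = ...  # any tokenized causal-LM dataset, e.g. Alpaca

args = TrainingArguments(
    output_dir="./llama3-liger",
    bf16=True,
    gradient_checkpointing=True,
    per_device_train_batch_size=8,
    # Hypothetical flag: when set, the Trainer would patch the model's RMSNorm,
    # RoPE, SwiGLU, and cross-entropy modules with Liger's Triton kernels
    # before training starts.
    use_liger_kernel=True,
)

Trainer(model=model, args=args, train_dataset=train_dataset).train()
```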
Motivation
Liger (LinkedIn GPU Efficient Runtime) Kernel is a collection of Triton kernels designed specifically for LLM training. We have implemented Hugging Face-compatible RMSNorm, RoPE, SwiGLU, CrossEntropy, and FusedLinearCrossEntropy, with more to come. It can increase multi-GPU training throughput by roughly 20% and reduce memory usage by up to 60%. The kernels work out of the box with Flash Attention, PyTorch FSDP, and Microsoft DeepSpeed. We welcome contributions from the community to gather the best kernels for LLM training.
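For context, this is roughly how the kernels are applied today without Trainer support: patch the Hugging Face modeling code before the model is instantiated. The `apply_liger_kernel_to_llama` helper below follows the Liger-Kernel repo's per-architecture patching style; since the repo is not yet public, treat the exact name and signature as an assumption.

```python
# Sketch of manual patching, assuming the Liger-Kernel package exposes a
# per-architecture helper such as `apply_liger_kernel_to_llama` (assumption).
import torch
from liger_kernel.transformers import apply_liger_kernel_to_llama
from transformers import AutoModelForCausalLM

# Replace LLaMA's RMSNorm, RoPE, SwiGLU, and cross-entropy implementations
# with the fused Triton kernels; models loaded afterwards pick them up.
apply_liger_kernel_to_llama()

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    torch_dtype=torch.bfloat16,
)
```

The Trainer flag proposed above would simply hide this patching step behind a single argument.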
Your contribution
We (LinkedIn) will take care of the work needed for a smooth integration and would need Hugging Face review and feedback on the changes.
Benchmark
Benchmark conditions: LLaMA 3-8B, Alpaca Dataset, Max seq len = 512, Data Type = bf16, Optimizer = AdamW, Gradient Checkpointing = True, Distributed Strategy = FSDP1 on 4 A100s.
Under these conditions, throughput increases by approximately 20% (the gains grow as more data is processed), while GPU memory usage is reduced by roughly 40%. This means you can train the model on smaller GPUs, with larger batch sizes, or with longer sequence lengths at no additional cost.
For the detailed benchmark setup and further efficiency gains for multi-head training (Medusa), please refer to the original repo: https://github.com/linkedin/Liger-Kernel (the repo will be public soon).
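For completeness, throughput and memory numbers like the ones above can be collected with a simple wrapper around the training run. The helper below is illustrative only (not the benchmark harness used here) and assumes a `Trainer` built as in the earlier sketch.

```python
# Illustrative profiling helper: report wall-clock time and peak GPU memory
# for a single training run on the current device.
import time

import torch
from transformers import Trainer


def run_and_profile(trainer: Trainer) -> None:
    """Run training once and print wall time and peak GPU memory."""
    torch.cuda.reset_peak_memory_stats()
    start = time.perf_counter()
    trainer.train()
    elapsed = time.perf_counter() - start
    peak_gib = torch.cuda.max_memory_allocated() / 1024**3
    print(f"wall time: {elapsed:.1f}s, peak GPU memory: {peak_gib:.2f} GiB")
```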