Training LLaVA with the Liger kernel results in degraded performance. #361

y-rok · 2024-12-10T09:05:24Z

🐛 Describe the bug

I trained LLaVA (base LLM = LLaMA 3) with the Liger kernel (https://github.com/linkedin/Liger-Kernel). The loss curve was similar to the one from training LLaVA without the Liger kernel. However, the model trained with the Liger kernel scored lower on MLLM benchmarks such as ChartQA. Since LLaMA 3 is supported by Liger, I didn't expect any issues. Has anyone else tried training LLaVA with the Liger kernel?

Reproduce

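import torch
from liger_kernel.transformers import apply_liger_kernel_to_llama
# Assumed import path: LlavaLlamaForCausalLM as exported by the LLaVA repository
from llava.model import LlavaLlamaForCausalLM

# Patch the Hugging Face LLaMA modules before the model is instantiated
print("Apply liger_kernel_to_llama")
apply_liger_kernel_to_llama()

model = LlavaLlamaForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
)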
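One way to narrow this down might be to enable the Liger patches one at a time and rerun the benchmark for each. The sketch below is only a suggestion, not a confirmed diagnosis: it assumes `apply_liger_kernel_to_llama` accepts the keyword flags `rope`, `rms_norm`, `swiglu`, `cross_entropy`, and `fused_linear_cross_entropy` (please check against the installed Liger-Kernel version), and `ablation_configs` and `apply_config` are hypothetical helper names.

```python
# Flags assumed to be accepted by apply_liger_kernel_to_llama;
# verify against the installed Liger-Kernel version.
LIGER_FLAGS = ("rope", "rms_norm", "swiglu", "fused_linear_cross_entropy")


def ablation_configs(flags=LIGER_FLAGS):
    """Return a baseline config (all patches off) followed by one config
    per flag in which only that flag is enabled."""
    configs = [{name: False for name in flags}]
    for enabled in flags:
        configs.append({name: name == enabled for name in flags})
    return configs


def apply_config(config):
    """Patch the Hugging Face LLaMA modules with one subset of Liger kernels.

    Call this before the model is instantiated, in a fresh process per run,
    since the patches modify the transformers modules in place.
    """
    # Imported lazily so the config helper works without Liger installed.
    from liger_kernel.transformers import apply_liger_kernel_to_llama

    # cross_entropy stays off so it never conflicts with the fused variant.
    apply_liger_kernel_to_llama(cross_entropy=False, **config)


if __name__ == "__main__":
    # Print the runs to schedule; each one is a separate training job.
    for index, config in enumerate(ablation_configs()):
        print(index, config)
```

Each config would be passed to `apply_config` at the top of a separate training run, in place of the bare `apply_liger_kernel_to_llama()` call. If only one config reproduces the ChartQA drop, that kernel is the place to look first.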

Versions

transformers = 4.45.1
torch = 2.4.0
GPU = A100
