Directly use low bit gguf for continuous training? #10199
Any clues?
Answered by BarfingLemurs, Nov 7, 2024
Answer selected by FNsi
Here's a quantized gguf conversion script: https://github.com/PygmalionAI/aphrodite-engine/blob/main/examples/gguf_to_torch.py
Use transformers for the training.
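For context, here is a minimal sketch of that workflow, assuming transformers' built-in GGUF loading (the `gguf_file` argument of `from_pretrained`, which dequantizes the checkpoint back to full precision on load). The repo and file names are placeholders:

```python
# Minimal sketch, assuming transformers' GGUF loading support.
# transformers dequantizes the GGUF tensors to full precision on load,
# so training then runs on ordinary PyTorch weights.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openlm-research/open_llama_3b"  # placeholder: any repo with a GGUF file
gguf_file = "open_llama_3b.Q8_0.gguf"       # placeholder: the quantized checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=gguf_file)

model.train()  # weights are plain nn.Parameters after dequantization,
               # so any standard Trainer / PyTorch loop works from here
```

Note this does not train in low bit: the quantization error is baked into the dequantized starting weights, and the actual training happens in full precision.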
(Finetuning/Training gguf models) #2632
I tried Q8_0 training on OpenLLaMA 3B.
Perhaps you are looking for quantization-aware training, like this one? https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8
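For reference, a minimal QLoRA-style sketch using the bitsandbytes and peft packages; the package choice and hyperparameters here are illustrative assumptions, not what the linked checkpoint was trained with. The frozen base weights sit in 4-bit NF4 while small LoRA adapters are trained in higher precision:

```python
# Minimal QLoRA-style sketch (assumed setup: transformers + bitsandbytes
# + peft; hyperparameters are illustrative, not from the linked model).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Quantize the frozen base weights to 4-bit NF4 at load time.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B-Instruct",  # base model of the linked release
    quantization_config=bnb_config,
)
model = prepare_model_for_kbit_training(model)  # casts norms, enables input grads

# Train only small LoRA adapters on top of the frozen 4-bit base.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # typical attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights require grad
```

From here the model can be passed to a standard transformers Trainer; only the adapter weights are updated, which is what keeps memory use low enough to fine-tune on consumer GPUs.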