Identical GPU memory usage and Fine-Tuning time for 2-bit and 4-bit quantized Llama-2-7b-hf model #36

zhuzhuxia1221 · 2024-08-17T07:38:29Z

I downloaded the Llama-2-7b-hf-2bit-32rank and Llama-2-7b-hf-4bit-32rank models from Hugging Face and fine-tuned each of them with `train_clm.py`. However, both runs used the same amount of GPU memory and took the same amount of time to fine-tune. Can you tell me why this is happening?
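For context on why a difference was expected, here is a rough back-of-the-envelope sketch of the weight storage alone at each nominal bit width. It assumes about 6.74 billion parameters for Llama-2-7b and that every weight is packed at exactly its nominal bit width. It ignores quantization constants, the rank-32 adapters, activations, and optimizer state, so it is not a prediction of total GPU memory during `train_clm.py` runs. The helper `packed_weight_gib` is hypothetical and exists only for this illustration.

```python
# Hypothetical helper: estimate storage for weights packed at a nominal bit width.
# Assumption: every parameter uses exactly `bits` bits; quantization constants,
# LoRA adapters, activations, and optimizer state are deliberately ignored.

def packed_weight_gib(num_params: int, bits: int) -> float:
    """Return the storage in GiB for num_params values packed at `bits` bits each."""
    total_bytes = num_params * bits / 8  # 8 bits per byte
    return total_bytes / 2**30           # bytes -> GiB

NUM_PARAMS = 6_740_000_000  # approximate Llama-2-7b parameter count (assumption)

for bits in (16, 4, 2):
    print(f"{bits:>2}-bit: {packed_weight_gib(NUM_PARAMS, bits):.2f} GiB")
```

Under these assumptions the 2-bit weights would take half the space of the 4-bit weights. Identical measured usage therefore suggests that both checkpoints end up held in GPU memory in the same format, though this sketch alone cannot confirm that.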
