

A 2.4B trainable parameter count appears while fine-tuning a 70B model; where does this number come from? #12213

Open · echo-valor opened this issue Feb 17, 2025 · 1 comment

Comments

@echo-valor

[Screenshot: training log reporting ~2.4B trainable parameters while fine-tuning a 70B model]

1000 thanks.

akoumpa (Member) commented Feb 18, 2025

Hi @echo-valor

I'm not 100% sure what your setup is, but I suspect it is showing the number of parameters residing on each GPU (assuming model sharding is used).
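
For intuition, here is a minimal sketch (not NeMo code; the world size of 32 is an assumption, substitute your own GPU count) of why a naive parameter count on a sharded model reports only the local shard:

```python
import torch

def count_local_parameters(model: torch.nn.Module) -> int:
    # Under sharding (e.g. FSDP or tensor parallelism), model.parameters()
    # on a given rank yields only that rank's shard, so this count is a
    # fraction of the full model size.
    return sum(p.numel() for p in model.parameters())

# Back-of-the-envelope check: 70B parameters sharded 32 ways (an assumed
# world size) gives roughly 2.2B per GPU, in the same ballpark as the
# reported 2.4B.
total_params = 70_000_000_000
world_size = 32  # assumption; use your actual number of GPUs
print(f"~{total_params / world_size / 1e9:.1f}B parameters per GPU")
```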

If you think this is a bug, please share instructions to reproduce your setup.

Thank you.
