

A 2.4B trainable parameter count appears while fine-tuning a 70B model; where does this number come from? #12213

Open · echo-valor opened this issue Feb 17, 2025 · 1 comment

Comments

@echo-valor

[Screenshot: training log reporting ~2.4B trainable parameters while fine-tuning a 70B model]

1000 thanks.

akoumpa (Member) commented Feb 18, 2025

Hi @echo-valor

I'm not 100% sure what your setup is, but I suspect it is showing the number of parameters residing on each GPU (assuming model sharding is used).
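
For intuition, here is a minimal sketch (not NeMo code; the world size of 32 is an assumption, substitute your own GPU count) of why a naive parameter count on a sharded model reports only the local shard:

```python
import torch

def count_local_parameters(model: torch.nn.Module) -> int:
    # Under sharding (e.g. FSDP or tensor parallelism), model.parameters()
    # on a given rank yields only that rank's shard, so this count is a
    # fraction of the full model size.
    return sum(p.numel() for p in model.parameters())

# Back-of-the-envelope check: 70B parameters sharded 32 ways (an assumed
# world size) gives roughly 2.2B per GPU, in the same ballpark as the
# reported 2.4B.
total_params = 70_000_000_000
world_size = 32  # assumption; use your actual number of GPUs
print(f"~{total_params / world_size / 1e9:.1f}B parameters per GPU")
```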

If you think this is a bug, please share instructions to reproduce your setup.

Thank you.
