Problems in train_deepspeed.py with ZeRO stage 1|2|3 #34

Open
zjhJOJO opened this issue Sep 7, 2023 · 0 comments


zjhJOJO commented Sep 7, 2023

When I run 'train_deepspeed.py' with the provided command, the number of trainable parameters is always zero. I traced the cause to line 18 of 'insert_lora.py': the program always enters the exception block there. I would greatly appreciate any help with this.
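For anyone trying to reproduce this, here is a minimal sketch of a check that could be run before and after the LoRA insertion step to confirm the zero count. `count_parameters` is a hypothetical helper and not part of this repository. It assumes the model exposes the usual `parameters()` iterator of a `torch.nn.Module`, and that under ZeRO stage 3 DeepSpeed stores the full parameter size in a `ds_numel` attribute while the local tensor may be an empty placeholder. That second assumption is worth verifying against the installed DeepSpeed version.

```python
def count_parameters(model):
    """Return (trainable, total) parameter counts for a torch.nn.Module-like object.

    Hypothetical diagnostic helper; not taken from train_deepspeed.py or insert_lora.py.
    """
    trainable = 0
    total = 0
    for param in model.parameters():
        # Assumption: under ZeRO stage 3 the local tensor can have zero elements,
        # and DeepSpeed keeps the unpartitioned size in `ds_numel`.
        n = getattr(param, "ds_numel", None)
        if n is None:
            n = param.numel()
        total += n
        if param.requires_grad:
            trainable += n
    return trainable, total
```

Calling `trainable, total = count_parameters(model)` right after the model is built and again after the insertion step shows whether any parameter ever has `requires_grad` set. If the count is zero in both places, printing the exception caught in the block at line 18 (for example with `traceback.print_exc()`) should reveal why the insertion is being skipped.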
