
Avoid grad sync on each step even when doing accumulation #1064

Merged: 1 commit into kohya-ss:dev on Jan 23, 2024

Conversation

KohakuBlueleaf (Contributor)

Per the official documentation: https://huggingface.co/docs/accelerate/concept_guides/gradient_synchronization

We only need to sync gradients when we are about to update the weights. But the fix by @Isotr0py forces a gradient sync on every batch, even when we don't need to update the weights.

I'd like to confirm with @Isotr0py that my modification is safe. I tried it on my 2-GPU machine and it works.
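For reference, the pattern described in the linked accelerate guide looks roughly like the sketch below. This is not the actual diff in this PR; the model, data, and `loss_function` are placeholders for illustration. The point is that `accelerator.no_sync(model)` suppresses the DDP all-reduce on accumulation steps, so ranks only exchange gradients on the step that updates the weights:

```python
import torch
from accelerate import Accelerator

# Placeholder model/optimizer/data for illustration only.
gradient_accumulation_steps = 2
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
dataloader = [(torch.randn(4, 8), torch.randn(4, 1)) for _ in range(10)]
loss_function = torch.nn.MSELoss()

accelerator = Accelerator()
model, optimizer = accelerator.prepare(model, optimizer)

for index, (inputs, targets) in enumerate(dataloader):
    inputs = inputs.to(accelerator.device)
    targets = targets.to(accelerator.device)
    if (index + 1) % gradient_accumulation_steps != 0:
        # Accumulation step: skip the DDP all-reduce, gradients stay local.
        with accelerator.no_sync(model):
            accelerator.backward(loss_function(model(inputs), targets))
    else:
        # Update step: let DDP sync gradients across ranks, then step.
        accelerator.backward(loss_function(model(inputs), targets))
        optimizer.step()
        optimizer.zero_grad()
```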

Isotr0py (Contributor) commented Jan 23, 2024

Thanks for the correction! I forgot to test with gradient accumulation before. I think your modification is safe.

I tested with gradient_accumulation_steps=2 and the output seems to be OK:

Rank 1, weight: 0.00012163435894763097, grad: -6.936413110558703e-10, sync:False, step=2
Rank 0, weight: 0.00012163435894763097, grad: 1.6429112292826176e-08, sync:False, step=2

Rank 1, weight: 0.00012163435894763097, grad: -1.8629542353210127e-13, sync:True, step=3
Rank 0, weight: 0.00012163435894763097, grad: -1.8629542353210127e-13, sync:True, step=3

Rank 1, weight: 0.0001216398595715873, grad: -9.085246333029318e-09, sync:False, step=4
Rank 0, weight: 0.0001216398595715873, grad: -2.3792381398379803e-08, sync:False, step=4

Rank 0, weight: 0.0001216398595715873, grad: -3.675208466551172e-13, sync:True, step=5
Rank 1, weight: 0.0001216398595715873, grad: -3.675208466551172e-13, sync:True, step=5

Rank 0, weight: 0.00012164600775577128, grad: -1.2868744292404699e-08, sync:False, step=6
Rank 1, weight: 0.00012164600775577128, grad: -7.038276628179574e-09, sync:False, step=6

Rank 1, weight: 0.00012164600775577128, grad: -1.0241067976285434e-12, sync:True, step=7
Rank 0, weight: 0.00012164600775577128, grad: -1.0241067976285434e-12, sync:True, step=7

Rank 0, weight: 0.00012166520900791511, grad: -6.624031811952591e-08, sync:False, step=8
Rank 1, weight: 0.00012166520900791511, grad: -5.537489045082111e-08, sync:False, step=8

Rank 0, weight: 0.00012166520900791511, grad: -1.0303240465664443e-12, sync:True, step=9
Rank 1, weight: 0.00012166520900791511, grad: -1.0303240465664443e-12, sync:True, step=9

Rank 1, weight: 0.00012176889867987484, grad: -3.346940502524376e-07, sync:False, step=9
Rank 0, weight: 0.00012176889867987484, grad: -1.471926225349307e-07, sync:False, step=9

Rank 1, weight: 0.00012176889867987484, grad: -5.062247062509462e-12, sync:True, step=10
Rank 0, weight: 0.00012176889867987484, grad: -5.062247062509462e-12, sync:True, step=10
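The log shows the expected behavior: per-rank gradients diverge on accumulation steps (sync:False) and match exactly on update steps (sync:True), after which the weight changes. A diagnostic like this could be reproduced with a sketch along the following lines, using accelerate's `accumulate` context manager and its `sync_gradients` flag; this is not the actual test script, and the model and data are placeholders:

```python
import torch
from accelerate import Accelerator

# Placeholder model/data for illustration only; run with `accelerate launch`.
accelerator = Accelerator(gradient_accumulation_steps=2)
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
model, optimizer = accelerator.prepare(model, optimizer)

for step in range(10):
    with accelerator.accumulate(model):
        # Each rank draws different data, so grads differ until synced.
        x = torch.randn(4, 8, device=accelerator.device)
        loss = model(x).pow(2).mean()
        accelerator.backward(loss)
        p = next(model.parameters())
        # sync_gradients is True only on steps where DDP all-reduces grads.
        print(f"Rank {accelerator.process_index}, "
              f"weight: {p.flatten()[0].item()}, "
              f"grad: {p.grad.flatten()[0].item()}, "
              f"sync:{accelerator.sync_gradients}, step={step}")
        optimizer.step()       # skipped by accelerate on accumulation steps
        optimizer.zero_grad()  # likewise skipped on accumulation steps
```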

KohakuBlueleaf (Contributor, Author)

@Isotr0py Thanks for your test!

kohya-ss (Owner)

Thank you for this PR. It looks good!

kohya-ss merged commit 7a20df5 into kohya-ss:dev on Jan 23, 2024 (1 check passed).