Avoid grad sync on each step even when doing accumulation #1064

KohakuBlueleaf · 2024-01-23T03:53:15Z

Refer to the official document: https://huggingface.co/docs/accelerate/concept_guides/gradient_synchronization

We only need to sync our gradient when we need to update the weight. But the fix by @Isotr0py force gradient sync on each batch even when we don't need to update the weights.

Want to confirm with @Isotr0py if my modification is safe. I tried on my 2gpu machine and it works.

Isotr0py · 2024-01-23T05:32:41Z

Thanks for correction! I forgot to test with gradient accumulation before. I think your modification is safe.

I tested with gradient_accumulation_steps=2 and the outputs seems to be OK:

Rank 1, weight: 0.00012163435894763097, grad: -6.936413110558703e-10, sync:False, step=2
Rank 0, weight: 0.00012163435894763097, grad: 1.6429112292826176e-08, sync:False, step=2

Rank 1, weight: 0.00012163435894763097, grad: -1.8629542353210127e-13, sync:True, step=3
Rank 0, weight: 0.00012163435894763097, grad: -1.8629542353210127e-13, sync:True, step=3

Rank 1, weight: 0.0001216398595715873, grad: -9.085246333029318e-09, sync:False, step=4
Rank 0, weight: 0.0001216398595715873, grad: -2.3792381398379803e-08, sync:False, step=4

Rank 0, weight: 0.0001216398595715873, grad: -3.675208466551172e-13, sync:True, step=5
Rank 1, weight: 0.0001216398595715873, grad: -3.675208466551172e-13, sync:True, step=5

Rank 0, weight: 0.00012164600775577128, grad: -1.2868744292404699e-08, sync:False, step=6
Rank 1, weight: 0.00012164600775577128, grad: -7.038276628179574e-09, sync:False, step=6

Rank 1, weight: 0.00012164600775577128, grad: -1.0241067976285434e-12, sync:True, step=7
Rank 0, weight: 0.00012164600775577128, grad: -1.0241067976285434e-12, sync:True, step=7

Rank 0, weight: 0.00012166520900791511, grad: -6.624031811952591e-08, sync:False, step=8
Rank 1, weight: 0.00012166520900791511, grad: -5.537489045082111e-08, sync:False, step=8

Rank 0, weight: 0.00012166520900791511, grad: -1.0303240465664443e-12, sync:True, step=9
Rank 1, weight: 0.00012166520900791511, grad: -1.0303240465664443e-12, sync:True, step=9

Rank 1, weight: 0.00012176889867987484, grad: -3.346940502524376e-07, sync:False, step=9
Rank 0, weight: 0.00012176889867987484, grad: -1.471926225349307e-07, sync:False, step=9

Rank 1, weight: 0.00012176889867987484, grad: -5.062247062509462e-12, sync:True, step=10
Rank 0, weight: 0.00012176889867987484, grad: -5.062247062509462e-12, sync:True, step=10

KohakuBlueleaf · 2024-01-23T05:51:30Z

@Isotr0py Thx your test!!

kohya-ss · 2024-01-23T11:33:43Z

Thank you for this PR. It looks good!

Avoid always sync

711b40c

kohya-ss merged commit 7a20df5 into kohya-ss:dev Jan 23, 2024
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Avoid grad sync on each step even when doing accumulation #1064

Avoid grad sync on each step even when doing accumulation #1064

KohakuBlueleaf commented Jan 23, 2024

Isotr0py commented Jan 23, 2024 •

edited

Loading

KohakuBlueleaf commented Jan 23, 2024

kohya-ss commented Jan 23, 2024

Avoid grad sync on each step even when doing accumulation #1064

Avoid grad sync on each step even when doing accumulation #1064

Conversation

KohakuBlueleaf commented Jan 23, 2024

Isotr0py commented Jan 23, 2024 • edited Loading

KohakuBlueleaf commented Jan 23, 2024

kohya-ss commented Jan 23, 2024

Isotr0py commented Jan 23, 2024 •

edited

Loading