I have skimmed through the papers but didn't find a detailed explanation of gradient accumulation. Please help me understand. The general simplified flow is:
predicted_output = model(input)
loss = loss_function(predicted_output, ground_truth)
optimizer.zero_grad()
loss.backward()
optimizer.step()
However, in the code, gradients are accumulated for 10 iterations and then reset (see the sketch after this list for how I read that loop). I am wondering what positive or negative impacts it would have if I:
1: reset the gradients on every iteration, i.e. follow the general flow above
2: increase or decrease self.iter_size
3: add support for multi-batching and multi-GPU training
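For concreteness, here is how I currently read the accumulation loop, as a minimal runnable sketch. The toy model, the toy data, and the division of the loss by iter_size are my assumptions for illustration, not taken from the repo:
import torch
import torch.nn as nn

# Minimal sketch of gradient accumulation (my reading, not the repo's
# exact code; the model, data, and scaling below are toy placeholders).
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_function = nn.MSELoss()
iter_size = 10  # number of mini-batches whose gradients are accumulated

optimizer.zero_grad()
for i in range(100):
    input = torch.randn(8, 4)        # toy mini-batch
    ground_truth = torch.randn(8, 2)
    predicted_output = model(input)
    loss = loss_function(predicted_output, ground_truth)
    # Dividing by iter_size makes the accumulated gradient match the
    # gradient of one large batch of 8 * iter_size samples (my assumption;
    # some implementations skip this and fold it into the learning rate).
    (loss / iter_size).backward()    # gradients add up in param.grad
    if (i + 1) % iter_size == 0:
        optimizer.step()             # apply the accumulated gradient
        optimizer.zero_grad()        # reset for the next window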
Many thanks.