split_and_load can now handle num_ctx > num_data. Github Issue #13909 #14607
Conversation
Just an FYI, I made a misleading comment on the code previously, so I fixed it.
I am not sure how the weights are updated during backward for the contexts that received no inputs.
@wkcn For example, suppose we have 3 examples left from the dataset and 5 GPU contexts. Only gradients for GPUs 0, 1, and 2 will be calculated. (Also, I noticed that I had an indentation error in my original PR description, so I edited it.)
@mightydeveloper However, it seems that the code of
LGTM : )
Thanks for your contribution!
Thanks for checking and providing the relevant code link! So, if I want to zero out the gradients, I guess I should use one of these approaches.
Would all three of them work? Which one would be better?
Calling trainer.step(ignore_stale_grad=True) is okay : )
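To make the discussion above concrete, here is a hedged pure-Python sketch (not MXNet's actual implementation) of the idea behind `trainer.step(ignore_stale_grad=True)`: parameters whose gradient was not refreshed in the current iteration are simply skipped instead of being updated with a stale gradient. The `step` function, the parameter dicts, and the `grad_iter` bookkeeping field are all hypothetical names chosen for illustration.

```python
# Hypothetical sketch of the idea behind ignore_stale_grad:
# skip parameters whose gradient is stale (was not written in
# the current iteration), instead of updating them or erroring.
def step(params, lr, current_iter, ignore_stale_grad=False):
    """params: list of dicts with 'value', 'grad', 'grad_iter' keys."""
    for p in params:
        stale = p["grad_iter"] != current_iter
        if stale:
            if ignore_stale_grad:
                continue  # leave the parameter untouched this step
            raise RuntimeError("gradient is stale")
        # Plain SGD update for parameters with a fresh gradient.
        p["value"] -= lr * p["grad"]

params = [
    {"value": 1.0, "grad": 0.5, "grad_iter": 7},  # fresh gradient
    {"value": 2.0, "grad": 0.1, "grad_iter": 6},  # stale: its context got no data
]
step(params, lr=0.1, current_iter=7, ignore_stale_grad=True)
# Only the first parameter is updated; the second keeps its old value.
```

This mirrors the situation in this PR: contexts that received no slice never refresh their gradients, so those gradients are stale at `trainer.step` time.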
@mightydeveloper Thanks for your contributions. @mxnet-label-bot Add [Gluon]
Merged. Thanks for your contribution!
Description
Handles Issue #13909
When the last batch size is smaller than the number of contexts, the utility function gluon.utils.split_and_load previously threw an exception: ValueError: Too many slices for data with shape .... However, we can simply put one data sample per context and ignore the remaining contexts. This integrates nicely with the reproducing example code from the issue (the user does not need to modify their code).
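The behavior described above can be sketched in plain Python (this is an illustrative stand-in, not the actual `split_and_load` source; the helper name `split_for_contexts` is hypothetical): when there are fewer samples than contexts, each remaining sample is assigned to one context and the extra contexts receive nothing.

```python
# Hypothetical sketch of split_and_load's behavior with even_split=False:
# if num_data < num_ctx, return one sample per context and skip the rest.
def split_for_contexts(data, ctx_list):
    """Split `data` (a list of samples) across `ctx_list`.

    Returns at most len(ctx_list) slices; when len(data) < len(ctx_list),
    only the first len(data) contexts receive a (single-sample) slice.
    """
    num_ctx = len(ctx_list)
    num_data = len(data)
    if num_data < num_ctx:
        # One sample per context; remaining contexts get no slice at all.
        return [[sample] for sample in data]
    # Otherwise split as evenly as possible, last slice takes the remainder.
    step = num_data // num_ctx
    slices = [data[i * step:(i + 1) * step] for i in range(num_ctx - 1)]
    slices.append(data[(num_ctx - 1) * step:])
    return slices

# 3 leftover samples, 5 contexts -> only 3 slices come back.
slices = split_for_contexts([10, 20, 30], ["gpu0", "gpu1", "gpu2", "gpu3", "gpu4"])
```

Note that the returned list is shorter than `ctx_list` in this case, which is exactly why downstream code must iterate over the returned slices rather than over the context list.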
Here, data and label will only contain slices for the first few of the given contexts, and the forward/backward pass is only done in the contexts where it is needed (the remaining contexts do not need to do a fake forward/backward pass).
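The loss-averaging concern mentioned below can be shown with plain arithmetic (illustrative numbers only): if a user averages per-context mean losses by dividing by the number of contexts, rather than by the number of slices actually returned, the mean loss for the short last batch is silently deflated.

```python
# Illustrative per-sample losses for a last batch of 3 samples
# spread over 5 contexts (2 contexts received no data at all).
per_ctx_losses = [[0.2], [0.4], [0.9]]  # one sample per active context
num_ctx = 5

# Wrong: dividing by the total number of contexts shrinks the mean,
# because two contexts contributed no loss values at all.
wrong_mean = sum(sum(l) / len(l) for l in per_ctx_losses) / num_ctx

# Right: average over the samples (or slices) that actually exist.
total = sum(sum(l) for l in per_ctx_losses)
count = sum(len(l) for l in per_ctx_losses)
right_mean = total / count
```

With these numbers, the wrong version reports 0.3 while the true mean loss over the three samples is 0.5, which is the kind of silent mistake the documentation note below is meant to prevent.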
My concern is that people can make mistakes when calculating the mean of the losses, so it would be nice to add this behavior to the documentation: split_and_load can output a list with fewer elements than the number of contexts when even_split=False.

Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
-> I believe this PR is just a tiny change.