Significant time spent moving medium-size arrays to GPU, type instability #2414
Comments
Do you have an MWE which only captures the CPU <-> GPU data movement? 95% of the code here is unrelated, so it'd be tricky to determine what the culprit might be.
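A minimal sketch of the kind of MWE being asked for, timing only the host-to-device and device-to-host copies with no model code involved (array sizes here are illustrative, not taken from the original report):

```julia
using CUDA

# Time only the data movement, nothing else.
function copy_timings(n)
    x = rand(Float32, n, n)      # host array
    CUDA.@time xd = cu(x)        # host -> device copy
    CUDA.@time xh = Array(xd)    # device -> host copy
    return nothing
end

copy_timings(512)    # ~1 MB of Float32 data
copy_timings(2048)   # ~16 MB
```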
I trimmed a bit, but I'm not sure I can minimize it much further? This is already quite reduced from the real network. If I replace the body of […]
In that case, it's possible that the allocation or copying required for […]
Hmm okay, thanks. Putting […]. I must not be understanding something about GPUs here. The total size of the training data generated in each iteration in this example is 2 MB, the weights and biases of the network are 3 MB, and the optimizer state is 6 MB. How am I saturating 16 GB of VRAM and 16 GB of shared memory?
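One plausible mechanism (an assumption about this case, not a confirmed diagnosis): a CuArray's device memory is only released when Julia's GC finalizes it, and the host GC has no view of GPU memory pressure, so a few MB of garbage per iteration can pile up over thousands of iterations. A sketch of the pattern:

```julia
using CUDA

# Each iteration allocates a ~2 MB buffer that immediately becomes
# garbage, but its device memory is only freed once the host GC runs.
for i in 1:10_000
    x = CUDA.rand(Float32, 512, 1024)       # ~2 MB
    s = sum(x)                              # use it once, then drop it
    i % 1_000 == 0 && CUDA.memory_status()  # watch usage climb
end

GC.gc()         # finalize the dead CuArrays
CUDA.reclaim()  # hand the freed pool memory back to the driver
```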
Watching Task Manager's record of GPU memory, […]. Using […]

EDIT: And when I check what […]
If I run the GC in every other iteration instead of […], I see 0.001 s or 0.05 s for the line (alternating every other iteration; longer on iterations where the GC runs) and 9 s total, though 41% of that is GC. If I run the GC every iteration, 0.002 s for the line consistently, but 11 s total and 67% GC time.

EDIT: Ah, sweet spot...
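A sketch of that sweet-spot pattern: an incremental (non-full) collection every few iterations. The interval of 4 and the `batches`/`train_step!` names are placeholders, not from the original code:

```julia
for (i, batch) in enumerate(batches)       # `batches` is hypothetical
    train_step!(model, opt_state, batch)   # hypothetical training step
    # Incremental GC every few iterations: frequent enough to keep dead
    # GPU buffers from piling up, rare enough not to dominate runtime.
    i % 4 == 0 && GC.gc(false)
end
```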
Yes, that sounds about right. I neglected to mention […]
Thanks for your help! I'm new to GPU work, so I was primarily relying on the Flux docs; I'll see if there's something to note there. Meanwhile, I saw there's initial work being done over on CUDA.jl to run the GC heuristically, which would be nice.
closing as addressed by #2416, feel free to reopen if needed though |
There are occasions where @profview shows a seemingly inordinate amount of time spent moving data to the GPU given the array size, and possibly excessive GPU memory usage? Not sure what I should be expecting. Could be related to having two array outputs rather than one? I've also seen type instability reported through @code_warntype and Cthulhu that I'm not sure how to resolve.

Using a toy example to show the effect:

[…]
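The toy example's code block did not survive in this copy of the issue, so the following is only a generic sketch of the pattern described above (fresh CPU data moved to the GPU each iteration, a model producing two array outputs); every name, size, and loss is assumed:

```julia
using Flux, CUDA

model = Chain(Dense(256 => 256, relu), Dense(256 => 256)) |> gpu
opt_state = Flux.setup(Adam(), model)

# Hypothetical: a forward pass with two array outputs instead of one.
two_outputs(m, x) = (m(x), m(x) .^ 2)

for iter in 1:1_000
    x = rand(Float32, 256, 1024)    # fresh training data on the CPU
    y = rand(Float32, 256, 1024)
    xg, yg = gpu(x), gpu(y)         # the transfer seen in @profview
    grads = Flux.gradient(model) do m
        a, b = two_outputs(m, xg)
        Flux.mse(a, yg) + sum(abs2, b) / length(b)
    end
    Flux.update!(opt_state, model, grads[1])
end
```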