Fix GPU CUDA out of memory error when workers_per_replica > 1 #853
Conversation
@RobertLucian thank you for looking into this! I updated the code a little and moved the documentation you added.

Also, I am still seeing GPU out of memory issues when I tried running it. The API did become ready without crashing, but when I hit it with concurrent requests, it seemed to crash due to GPU OOM. Perhaps the model, once loaded into the GPU, is too big? Or perhaps limiting the GPU growth isn't working for some reason? Or perhaps there is a GPU memory leak somehow? The error I saw was a `CUDA_ERROR_OUT_OF_MEMORY`.
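For reference, limiting GPU memory growth in TensorFlow 2.x is typically done along these lines. This is a minimal sketch of the general technique, not necessarily the exact change made in this PR:

```python
import tensorflow as tf

# List the physical GPUs visible to this worker process.
gpus = tf.config.list_physical_devices("GPU")
for gpu in gpus:
    # Allocate GPU memory on demand instead of pre-allocating the whole card.
    # This must run before the GPU is initialized (i.e. before any model is loaded).
    tf.config.experimental.set_memory_growth(gpu, True)
```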
@deliahu Apparently, the CRNN API models need cumulatively about 8119 MiB, and the T4 GPU only has 15079 MiB. This means it's not possible to fit 2 workers on a single T4 GPU. And there's no GPU memory leak, nor is there an issue with the GPU growth setting - we're okay in that regard. I looked into ways of reducing the memory needs of the models within Keras so that 2 workers could fit, and without a significant change to the models used (for instance, inside faustomorales/keras-ocr's source code), there isn't an easy way out of this.
I looked into this and found that the memory requirements of loading a model are lower than those of loading a model and running predictions. This explains the situation above.
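One rough way to observe the load-time vs. prediction-time difference is to poll `nvidia-smi` around each step. This is an assumed measurement sketch (not taken from the PR), and it presumes GPU memory growth is enabled so the numbers reflect actual usage rather than pre-allocation; the commented-out model calls are placeholders:

```python
import subprocess

def gpu_memory_used_mib() -> int:
    # Query the currently used memory (in MiB) of GPU 0 via nvidia-smi.
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"]
    )
    return int(out.decode().splitlines()[0])

print("before load:  ", gpu_memory_used_mib(), "MiB")
# pipeline = keras_ocr.pipeline.Pipeline()   # hypothetical: load the CRNN pipeline here
print("after load:   ", gpu_memory_used_mib(), "MiB")
# pipeline.recognize([some_image])           # hypothetical: run a single prediction
print("after predict:", gpu_memory_used_mib(), "MiB")
```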
Yes, I like this. Specifically the headline. It's succinct.
All sounds good, thank you for looking into this!
(cherry picked from commit c0f3d4b)
Fixes the original problem of #845.
When the following conditions are met:

- `workers_per_replica` is set to a value > 1

a `CUDA_ERROR_OUT_OF_MEMORY` error is thrown for all `workers_per_replica - 1` workers that didn't have a chance of "reserving" the GPU's memory. By default, when loading up a model, all of the GPU's memory is pre-allocated. To avoid that, the GPU's memory usage has to be limited - e.g. by limiting how much GPU memory each worker is allowed to allocate (see the sketch after the checklist).

Checklist:
- [ ] `make test` and `make lint`
- [ ] `summary.md` (view in gitbook after merging)
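As a complement to enabling memory growth, another common way to keep several workers on one GPU is to cap the memory each worker process may allocate. This is a hedged sketch using the TensorFlow 2.x API, not necessarily the mechanism this PR uses, and the 4096 MiB limit is an illustrative value only:

```python
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
if gpus:
    # Cap this worker's share of the GPU so multiple workers can coexist on one card.
    # 4096 MiB is an example figure, not a value taken from this PR.
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=4096)],
    )
```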