
Commit c0f3d4b

Fix GPU CUDA out of memory error when workers_per_replica > 1 (#853)
1 parent 47caeeb commit c0f3d4b

File tree

2 files changed: +28 −0 lines


docs/deployments/gpus.md

Lines changed: 23 additions & 0 deletions
@@ -8,3 +8,26 @@ To use GPUs:
2. You may need to [file an AWS support ticket](https://console.aws.amazon.com/support/cases#/create?issueType=service-limit-increase&limitType=ec2-instances) to increase the limit for your desired instance type.
3. Set the instance type to an AWS GPU instance (e.g. g4dn.xlarge) when installing Cortex.
4. Set the `gpu` field in the `compute` configuration for your API. One unit of GPU corresponds to one virtual GPU; fractional requests are not allowed (a config sketch follows below).
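For illustration, here is a minimal sketch of what such an API configuration might look like. The API name and predictor path are hypothetical, and the exact schema may differ between Cortex versions:

```yaml
# cortex.yaml -- a sketch; "my-api" and "predictor.py" are hypothetical
- name: my-api
  predictor:
    type: python
    path: predictor.py
  compute:
    gpu: 1  # one virtual GPU; fractional requests are not allowed
```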

## Pitfalls

### If using `workers_per_replica` > 1, TensorFlow-based models, and the Python Predictor

When using `workers_per_replica` > 1 with TensorFlow-based models (including Keras) in the Python Predictor, loading the model in several processes at the same time will throw a `CUDA_ERROR_OUT_OF_MEMORY: out of memory` error. This happens because the first process to load the model allocates all of the GPU's memory, leaving none for the other processes. To prevent this, limit the per-process GPU memory usage. There are two methods:

1\) Configure the model to allocate only as much memory as it requires, via [tf.config.experimental.set_memory_growth()](https://www.tensorflow.org/api_docs/python/tf/config/experimental/set_memory_growth):

```python
import tensorflow as tf

for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)
```

2\) Impose a hard limit on how much memory the model can use, via [tf.config.set_logical_device_configuration()](https://www.tensorflow.org/api_docs/python/tf/config/set_logical_device_configuration):

```python
import tensorflow as tf

mem_limit_mb = 1024
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.set_logical_device_configuration(gpu, [tf.config.LogicalDeviceConfiguration(memory_limit=mem_limit_mb)])
```

See the [TensorFlow GPU guide](https://www.tensorflow.org/guide/gpu) and this [blog post](https://medium.com/@starriet87/tensorflow-2-0-wanna-limit-gpu-memory-10ad474e2528) for additional information.

examples/tensorflow/license-plate-reader/predictor_crnn.py

Lines changed: 5 additions & 0 deletions
@@ -5,10 +5,15 @@
```python
import keras_ocr
import base64
import pickle
import tensorflow as tf


class PythonPredictor:
    def __init__(self, config):
        # limit memory usage on each process
        for gpu in tf.config.list_physical_devices("GPU"):
            tf.config.experimental.set_memory_growth(gpu, True)

        # keras-ocr will automatically download pretrained
        # weights for the detector and recognizer.
        self.pipeline = keras_ocr.pipeline.Pipeline()
```
