I cannot get simrdwn to train. TensorFlow reports that my CUDA driver version is insufficient for the CUDA runtime version. I know this may not be a problem with this repository specifically, but everything seems to be configured properly on my end, so I am at a loss to explain this behaviour.
I first tried the default repository configuration and received this very same error. The container only has CUDA 9.1 because I changed the first line of the Dockerfile from
nvidia/cuda:9.0-devel-ubuntu16.04
to
nvidia/cuda:9.1-devel-ubuntu16.04
This is the error I get:
Traceback (most recent call last):
  File "/tensorflow/models/research/object_detection/model_main.py", line 109, in <module>
    tf.app.run()
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/tensorflow/models/research/object_detection/model_main.py", line 105, in main
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_specs[0])
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 471, in train_and_evaluate
    return executor.run()
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 611, in run
    return self.run_local()
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 712, in run_local
    saving_listeners=saving_listeners)
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 358, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1124, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model_default
    saving_listeners)
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1403, in _train_with_estimator_spec
    log_step_count_steps=log_step_count_steps) as mon_sess:
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 508, in MonitoredTrainingSession
    stop_grace_period_secs=stop_grace_period_secs)
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 934, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 648, in __init__
    self._sess = _RecoverableSession(self._coordinated_creator)
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1122, in __init__
    _WrappedSession.__init__(self, self._create_session())
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1127, in _create_session
    return self._sess_creator.create_session()
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 805, in create_session
    self.tf_sess = self._session_creator.create_session()
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 571, in create_session
    init_fn=self._scaffold.init_fn)
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/training/session_manager.py", line 281, in prepare_session
    config=config)
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/training/session_manager.py", line 184, in _restore_checkpoint
    sess = session.Session(self._target, graph=self._graph, config=config)
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1551, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 676, in __init__
    self._session = tf_session.TF_NewSessionRef(self._graph._c_graph, opts)
tensorflow.python.framework.errors_impl.InternalError: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version
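The error compares two numbers: the CUDA version the installed driver supports and the version of the CUDA runtime library in use. A minimal diagnostic sketch, assuming the runtime can be loaded as `libcudart.so` inside the container (that file name is an assumption; some environments only ship a versioned name such as `libcudart.so.9.1`), queries both through the CUDA runtime functions `cudaDriverGetVersion` and `cudaRuntimeGetVersion` using Python's `ctypes`. Both functions report a version encoded as `1000 * major + 10 * minor`, and the driver value is 0 when no driver is visible. The library `ctypes` finds is not necessarily the one TensorFlow loads, so treat the result as a hint rather than proof.

```python
import ctypes


def decode_cuda_version(encoded: int) -> str:
    """Decode a CUDA version integer (1000 * major + 10 * minor) into 'major.minor'."""
    major = encoded // 1000
    minor = (encoded % 1000) // 10
    return f"{major}.{minor}"


def query_cuda_versions(lib_name: str = "libcudart.so"):
    """Return (driver_version, runtime_version) as encoded ints.

    Assumes a CUDA runtime library named `lib_name` is on the loader path;
    ctypes raises OSError if it cannot be loaded.
    """
    cudart = ctypes.CDLL(lib_name)
    driver = ctypes.c_int(0)
    runtime = ctypes.c_int(0)
    # Both calls return a cudaError_t; 0 means cudaSuccess.
    if cudart.cudaDriverGetVersion(ctypes.byref(driver)) != 0:
        raise RuntimeError("cudaDriverGetVersion failed")
    if cudart.cudaRuntimeGetVersion(ctypes.byref(runtime)) != 0:
        raise RuntimeError("cudaRuntimeGetVersion failed")
    return driver.value, runtime.value


if __name__ == "__main__":
    try:
        drv, rt = query_cuda_versions()
    except (OSError, RuntimeError) as exc:
        print(f"Could not query CUDA versions: {exc}")
    else:
        # A driver value of 0 means no CUDA driver is visible to this process.
        print("driver supports CUDA", decode_cuda_version(drv))
        print("runtime library is CUDA", decode_cuda_version(rt))
        if drv < rt:
            print("driver is older than the runtime: the condition this error reports")
```

Running it inside the container with the same environment used for training shows whether the driver value reported there really is at least the runtime value.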
This is the output of nvcc --version (run from inside the container):
Cuda compilation tools, release 9.1, V9.1.85
(Again, I know the Dockerfile specifies v9.0, but I was getting the same error with it, which is why I tried bumping the version up.)
This is the output of nvidia-smi (run from outside the container):
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.116                Driver Version: 390.116                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce 845M        Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   52C    P0    N/A /  N/A |    167MiB /  2004MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1748      G   /usr/lib/xorg/Xorg                           166MiB |
+-----------------------------------------------------------------------------+
According to the CUDA Toolkit release notes, these should be compatible:
CUDA Toolkit      | Linux x86_64 Driver Version
CUDA 9.1 (9.1.85) | >= 390.46
Since I have driver version 390.116 and CUDA Toolkit 9.1, I can't explain why the container keeps throwing this error.
Do you have any idea what might be causing it?
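The compatibility argument above boils down to comparing the host driver version against the minimum listed for a toolkit. A sketch of that check, assuming the driver version is read on the host with `nvidia-smi --query-gpu=driver_version --format=csv,noheader`. The table holds only the rows relevant here: CUDA 9.1 from the release-notes row quoted above, and CUDA 9.0 (the Dockerfile default), whose listed Linux x86_64 minimum is 384.81. It only checks the toolkit you name; it cannot tell which runtime the TensorFlow build in the container actually loads.

```python
import subprocess

# Minimum Linux x86_64 driver versions from the CUDA Toolkit release notes.
# Only the two rows relevant to this report are included.
MIN_DRIVER = {
    "9.0": "384.81",
    "9.1": "390.46",
}


def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '390.116' into a comparable tuple (390, 116)."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports(toolkit: str, driver: str) -> bool:
    """Return True if `driver` meets the minimum listed for `toolkit`."""
    return parse_version(driver) >= parse_version(MIN_DRIVER[toolkit])


def host_driver_version() -> str:
    """Read the driver version of the first GPU via nvidia-smi (run on the host)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()[0].strip()


if __name__ == "__main__":
    # The versions from this report: driver 390.116 against CUDA 9.1.
    print(driver_supports("9.1", "390.116"))
```

Comparing tuples of integers rather than the raw strings matters here: as plain strings, "390.116" sorts before "390.46".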