CUDA driver version is insufficient for CUDA runtime version #70

Open
iboates opened this issue Jul 28, 2019 · 3 comments

iboates commented Jul 28, 2019

I cannot get simrdwn to train. It is telling me (via TensorFlow) that my CUDA driver version is insufficient for the CUDA runtime version. I know this seems like it is not a problem with this repository specifically, but everything appears to be configured properly on my end, so I am at a loss to explain this behaviour.

I tried this using the default repository configuration, but I was receiving this very same error. I only have CUDA 9.1 because I changed the first line of the Dockerfile from

nvidia/cuda:9.0-devel-ubuntu16.04

to

nvidia/cuda:9.1-devel-ubuntu16.04

This is the error I get:

Traceback (most recent call last):
  File "/tensorflow/models/research/object_detection/model_main.py", line 109, in <module>
    tf.app.run()
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/tensorflow/models/research/object_detection/model_main.py", line 105, in main
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_specs[0])
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 471, in train_and_evaluate
    return executor.run()
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 611, in run
    return self.run_local()
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 712, in run_local
    saving_listeners=saving_listeners)
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 358, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1124, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model_default
    saving_listeners)
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1403, in _train_with_estimator_spec
    log_step_count_steps=log_step_count_steps) as mon_sess:
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 508, in MonitoredTrainingSession
    stop_grace_period_secs=stop_grace_period_secs)
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 934, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 648, in __init__
    self._sess = _RecoverableSession(self._coordinated_creator)
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1122, in __init__
    _WrappedSession.__init__(self, self._create_session())
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1127, in _create_session
    return self._sess_creator.create_session()
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 805, in create_session
    self.tf_sess = self._session_creator.create_session()
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 571, in create_session
    init_fn=self._scaffold.init_fn)
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/training/session_manager.py", line 281, in prepare_session
    config=config)
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/training/session_manager.py", line 184, in _restore_checkpoint
    sess = session.Session(self._target, graph=self._graph, config=config)
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1551, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 676, in __init__
    self._session = tf_session.TF_NewSessionRef(self._graph._c_graph, opts)
tensorflow.python.framework.errors_impl.InternalError: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version

This is the output of nvcc --version (run from inside the container):

Cuda compilation tools, release 9.1, V9.1.85

(Again, I know the Dockerfile specified v9.0, but I was getting the same error with it, which is why I tried bumping the version up.)
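
To check which versions the container actually sees, a quick probe of the CUDA runtime API via ctypes can help. This is a minimal sketch, assuming the libcudart shared library from the 9.1 toolkit is on the container's library path:

import ctypes

# Load the CUDA runtime library (adjust the name if only a versioned
# file such as libcudart.so.9.1 is present in the container).
libcudart = ctypes.CDLL("libcudart.so")

runtime_version = ctypes.c_int()
driver_version = ctypes.c_int()
libcudart.cudaRuntimeGetVersion(ctypes.byref(runtime_version))
libcudart.cudaDriverGetVersion(ctypes.byref(driver_version))

# Versions are encoded as 1000*major + 10*minor, e.g. 9010 for CUDA 9.1.
# A driver value of 0 means no CUDA driver is visible inside the container,
# which produces exactly this "driver version is insufficient" error.
print("runtime:", runtime_version.value)
print("driver supports up to:", driver_version.value)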

This is the output of nvidia-smi (run from outside the container):

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.116                Driver Version: 390.116                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce 845M        Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   52C    P0    N/A /  N/A |    167MiB /  2004MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1748      G   /usr/lib/xorg/Xorg                           166MiB |
+-----------------------------------------------------------------------------+

And according to the release notes, these should be compatible:

CUDA Toolkit      | Linux x86_64 Driver Version
CUDA 9.1 (9.1.85) | >= 390.46

So since I have driver version 390.116 and CUDA Toolkit version 9.1, I can't explain why the container keeps throwing this error.

Do you have any idea?
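
For reference, the failure can be reproduced in isolation, without a full training run, by asking TensorFlow directly whether it can create a GPU session. A minimal sketch, using the TF 1.x API from the simrdwn conda environment shown in the traceback:

import tensorflow as tf
from tensorflow.python.client import device_lib

# Returns False, or raises the same InternalError, when the driver that
# the runtime needs cannot be loaded; True means a GPU session works.
print(tf.test.is_gpu_available())

# Lists the devices TensorFlow can see; a working setup should include a
# "/device:GPU:0" entry alongside the CPU.
print(device_lib.list_local_devices())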

yooyo-Q commented Jul 30, 2019

You need to upgrade your graphics driver; mine is Driver Version 418.39.
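
A quick way to confirm the installed driver version after an upgrade is to query nvidia-smi on the host. A minimal sketch, assuming nvidia-smi is on the PATH:

import subprocess

# Ask nvidia-smi for just the driver version (run on the host, not inside
# the container); prints e.g. "418.39" once the new driver is active.
out = subprocess.check_output(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"]
)
print("Driver version:", out.decode().strip())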

iboates commented Jul 30, 2019

What is your graphics card model?

yooyo-Q commented Jul 31, 2019

What is your graphics card model?

P100
