Help: cuDNN error #59
Judging from the error message, your CUDA and cuDNN environment has probably not been set up correctly.
Thanks for the reply. After I set config.gpu_options.allow_growth = True and changed batch_size back to 16, that error went away, but now I get this one: OSError: Unable to create file (unable to open file: name = 'model_speech/m251/speech_model251_e_0_step_500.model', errno = 2, error message = 'No such file or directory', flags = 13, o_flags = 242)
To train the m251 model, you need to create a directory named m251 under model_speech/ (for example with mkdir). After that it works.
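The fix above amounts to making sure the checkpoint directory exists before the first save. A minimal Python sketch of doing that from code, for example near the start of the training script; `ensure_parent_dir` is a hypothetical helper for illustration, not part of ASRT, and the file name is the one from the OSError above:

```python
import os
import tempfile


def ensure_parent_dir(model_path):
    """Create the directory that will contain model_path (like `mkdir -p`).

    Hypothetical helper for illustration. Returns the directory path,
    or '' if model_path has no directory component.
    """
    parent = os.path.dirname(model_path)
    if parent:
        # exist_ok=True makes repeated calls harmless once the directory exists
        os.makedirs(parent, exist_ok=True)
    return parent


# Demonstration inside a throwaway directory so nothing is written to the repo;
# in a training script you would pass the real relative path instead,
# e.g. 'model_speech/m251/speech_model251_e_0_step_500.model'.
base = tempfile.mkdtemp()
model_path = os.path.join(base, 'model_speech', 'm251',
                          'speech_model251_e_0_step_500.model')
created = ensure_parent_dir(model_path)
print(os.path.isdir(created))  # True
```

Running `mkdir model_speech/m251` once by hand, as suggested in the reply, has the same effect.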
@nl8590687 thanks, it works
Failed to get convolution algorithm. This is probably because cuDNN failed to initialize
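Hi, I cloned the latest code and, following the guide, trained on the st-cmds and thchs30 data, and got the errors below. Is this a version mismatch or some other problem? Thanks.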
TensorFlow: 1.12.0
CUDA: 9.0
cuDNN: 7.3.1
GPU: Tesla V100-PCIE-16GB
[*Info] Create Model Successful, Compiles Model Successful.
[running] train epoch 0 .
[message] epoch 0 . Have train datas 0+
Epoch 1/1
2018-11-21 15:18:48.028180: E tensorflow/stream_executor/cuda/cuda_dnn.cc:373] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2018-11-21 15:18:48.058536: E tensorflow/stream_executor/cuda/cuda_dnn.cc:373] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
  File "train_mspeech.py", line 47, in <module>
    ms.TrainModel(datapath, epoch = 50, batch_size = 64, save_step = 500)
  File "/data/wujiaxing/workspace/ASR/ASRT_SpeechRecognition/SpeechModel251.py", line 179, in TrainModel
    self._model.fit_generator(yielddatas, save_step)
  File "/usr/local/lib/python3.5/dist-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/keras/engine/training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python3.5/dist-packages/keras/engine/training_generator.py", line 217, in fit_generator
    class_weight=class_weight)
  File "/usr/local/lib/python3.5/dist-packages/keras/engine/training.py", line 1217, in train_on_batch
    outputs = self.train_function(ins)
  File "/usr/local/lib/python3.5/dist-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "/usr/local/lib/python3.5/dist-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1439, in __call__
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node conv2d_1/convolution}} = Conv2D[T=DT_FLOAT, _class=["loc:@training/Adam/gradients/conv2d_1/convolution_grad/Conv2DBackpropFilter"], data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adam/gradients/conv2d_1/convolution_grad/Conv2DBackpropFilter-0-TransposeNHWCToNCHW-LayoutOptimizer, conv2d_1/kernel/read)]]
[[{{node ctc/scan/while/Fill/_267}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_419_ctc/scan/while/Fill", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
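The workaround reported earlier in this thread for the same cuDNN error (allow_growth plus a smaller batch size) is commonly applied as follows with TensorFlow 1.x and standalone Keras, the versions shown in this log. TensorFlow 1.x reserves most of the GPU memory up front by default, and that pre-allocation is a frequently reported trigger of CUDNN_STATUS_INTERNAL_ERROR. This is a minimal sketch, assuming it runs at the top of the training script (e.g. train_mspeech.py) before the model is built; it is not taken from ASRT itself, and it will not help if the CUDA/cuDNN installation is actually broken.

```python
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session

# Allocate GPU memory on demand instead of reserving most of the card up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

# Make Keras run all subsequently built models in a session using this config.
set_session(tf.Session(config=config))
```

Lowering batch_size in the TrainModel call (64 in this traceback, 16 in the earlier reply) reduces peak GPU memory further.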