
paddle.fluid.core.EnforceNotMet: CUDNN_STATUS_NOT_SUPPORTED at [/paddle/paddle/fluid/operators/batch_norm_op.cu.cc:143] #14303

Closed
Angus07 opened this issue Nov 7, 2018 · 7 comments

Angus07 commented Nov 7, 2018

This is very similar to the problem described in https://github.com/PaddlePaddle/Paddle/issues/929.
Training works fine, but prediction fails with:
paddle.fluid.core.EnforceNotMet: CUDNN_STATUS_NOT_SUPPORTED at [/paddle/paddle/fluid/operators/batch_norm_op.cu.cc:143]

My previous network also contained batch norm and ran normally.
I have now modified the network and added one more batch_norm:

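```python
import paddle.fluid as fluid

# emb1 (the sequence embedding output) and hid_dim are defined earlier in the network.
inception1 = fluid.layers.sequence_conv(
    input=emb1,
    num_filters=hid_dim,
    filter_size=1)
# The newly added batch_norm between the two sequence_conv layers.
inception2 = fluid.layers.batch_norm(input=inception1, act='relu')
inception3 = fluid.layers.sequence_conv(
    input=inception2,
    num_filters=hid_dim,
    filter_size=3)
inception_title = fluid.layers.sequence_pool(input=inception3, pool_type='max')
```

With this change, prediction fails.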

NHZlX added the 训练 (training) label Nov 7, 2018

Angus07 commented Nov 9, 2018

I upgraded to Paddle 1.1 with cuDNN 7 (previously cuDNN 6). The error changed to paddle.fluid.core.EnforceNotMet: CUDNN_STATUS_NOT_INITIALIZED at [/paddle/paddle/fluid/platform/device_context.cc:162]. All my other networks run normally; only the one with this structure fails.

NHZlX commented Nov 12, 2018

[screenshot of the error]
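So far the problem has been narrowed down to this: when the model runs in test mode and execution reaches https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/operators/batch_norm_op.cu.cc#L143, the error shown above occurs. @qingqing01 @jacquesqiao, could you please take a look?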
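In test mode, batch_norm normalizes with the stored moving mean and variance rather than per-batch statistics, so each row of the input is transformed independently of the others. The pure-Python sketch below is only an illustration of that formula, not Paddle's CUDA/cuDNN kernel, and its 1024-row chunk size is an assumption taken from the limit mentioned in this thread. It shows that processing the first dimension in bounded chunks gives exactly the same output as processing it all at once, which is why bounding the first dimension at test time is a plausible workaround.

```python
import math


def batch_norm_infer(x, mean, var, gamma, beta, eps=1e-5):
    """Inference-mode batch norm on a 2-D list [N, C] using stored statistics."""
    return [
        [gamma[c] * (row[c] - mean[c]) / math.sqrt(var[c] + eps) + beta[c]
         for c in range(len(row))]
        for row in x
    ]


def batch_norm_infer_chunked(x, mean, var, gamma, beta, eps=1e-5, max_rows=1024):
    """Same computation, but at most max_rows rows per call (1024 is an assumed limit)."""
    out = []
    for start in range(0, len(x), max_rows):
        out.extend(batch_norm_infer(x[start:start + max_rows], mean, var, gamma, beta, eps))
    return out


if __name__ == "__main__":
    n, c = 3000, 4
    x = [[(i * 7 + j) % 13 / 13.0 for j in range(c)] for i in range(n)]
    mean, var = [0.5] * c, [0.1] * c
    gamma, beta = [1.0] * c, [0.0] * c
    # Rows never interact in inference mode, so chunking changes nothing.
    print(batch_norm_infer(x, mean, var, gamma, beta)
          == batch_norm_infer_chunked(x, mean, var, gamma, beta))  # True
```

Training-mode batch norm is different: there the statistics are computed over the whole first dimension, so chunking would change the result.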

NHZlX assigned jacquesqiao and qingqing01 and unassigned jacquesqiao Nov 12, 2018
jacquesqiao commented

With v5 we previously ran into a cuDNN bug at test time when the batch dimension exceeded 1024. Please try reducing batch_size first.


Angus07 commented Nov 12, 2018

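My batch_size is very small, far below 1024; it is the convolution that makes the first dimension of the tensor very large.
I tried reducing batch_size and still get the same error.
#929 says this problem does not occur with cuDNN 6.0 or later. Why do I still get this error with cuDNN 7?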
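With sequence (LoD) input, the first dimension of the tensor that reaches batch_norm is the total number of time steps across all sequences in the batch, not the batch size, so even a batch size of 1 can exceed 1024 when one sequence is long. A minimal sketch, assuming the 1024 threshold reported for the earlier v5 bug also applies here; the helper names are hypothetical and not Paddle APIs:

```python
# Assumed threshold, taken from the v5 cuDNN bug reported in this thread.
CUDNN_BN_TEST_LIMIT = 1024


def lod_first_dim(seq_lens):
    """First dimension of a LoD (sequence) tensor: total time steps in the batch."""
    return sum(seq_lens)


def exceeds_limit(seq_lens, limit=CUDNN_BN_TEST_LIMIT):
    """True if the rows fed to batch_norm would exceed the assumed limit."""
    return lod_first_dim(seq_lens) > limit


if __name__ == "__main__":
    print(lod_first_dim([30, 45, 12]), exceeds_limit([30, 45, 12]))  # 87 False
    print(lod_first_dim([2000]), exceeds_limit([2000]))              # 2000 True
```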


Angus07 commented Nov 12, 2018

The error occurs even with batch size set to 1.

NHZlX commented Nov 12, 2018

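Making the batch smaller does not help on its own: because the data are sequences, the first dimension of the data can still exceed 1024. I suggest testing with smaller data first. @qingqing01, could you please look into this and consider deciding, based on the first dimension of the sequence data, whether to use cuDNN at test time?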
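One way to follow the suggestion to test with smaller data is to group sequences into inference sub-batches whose total length stays within the limit. The greedy sketch below is hypothetical (the function name and the 1024 default are assumptions, not part of Paddle); it returns sequence indices per group and cannot help when a single sequence is itself longer than the limit.

```python
from typing import List


def pack_by_total_length(seq_lens: List[int], limit: int = 1024) -> List[List[int]]:
    """Greedily group sequence indices so each group's total length stays <= limit.

    A single sequence longer than `limit` cannot be split without changing what
    sequence_conv sees, so it ends up alone in its own group and still exceeds the limit.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    total = 0
    for idx, length in enumerate(seq_lens):
        # Start a new group if adding this sequence would overflow the current one.
        if current and total + length > limit:
            groups.append(current)
            current, total = [], 0
        current.append(idx)
        total += length
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    print(pack_by_total_length([600, 500, 300, 2000, 100]))  # [[0], [1, 2], [3], [4]]
```

Each group can then be fed to the test program as its own batch; the 2000-step sequence in the example would still hit the limit.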


lucywsq commented Dec 20, 2018

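Hello, this issue has had no updates in the past month, so we will close it today. If you still need to follow up after it is closed, you can reopen it and we will reply within 24 hours. We apologize for any inconvenience caused by closing it, and thank you for your support of PaddlePaddle.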

lucywsq closed this as completed Dec 20, 2018