BatchNorm and InstanceNormalization tests fail with CUDA 10.1/CUDNN 7.5 #528
Comments
Hi Scott, I tested it on Linux and all tests passed.
$ ctest -C Debug
100% tests passed, 0 tests failed out of 5
CUDA version: 10.1.105-1
My fault - environment error. Whilst it was building and linking against CUDNN 7.5 correctly and as expected, I had the location of CUDNN 7.1 in the PATH environment variable (I forgot to update it when I added 7.5), so it was actually loading that version of cudnn64_7.dll. I guess the vast majority of APIs haven't changed, so this worked most of the time, but not quite all (most likely CUDNN_BN_MIN_EPSILON differs between the versions).
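For anyone hitting the same thing, a quick way to spot a stale entry is to list every PATH directory that contains the DLL, in search order. The following is only a minimal sketch: `find_on_path` is a hypothetical helper, not part of onnxruntime or cuDNN, and it looks at PATH alone. The Windows loader also checks the application directory and the system directories before PATH, so this shows only part of the picture.

```python
import os
from pathlib import Path


def find_on_path(filename, path_value=None):
    """Return every PATH directory that contains `filename`, in PATH order.

    Hypothetical helper for illustration; it inspects PATH only and does
    not reproduce the full Windows DLL search order.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for entry in path_value.split(os.pathsep):
        entry = entry.strip('"')  # PATH entries are sometimes quoted
        if not entry:
            continue
        if (Path(entry) / filename).is_file():
            hits.append(entry)
    return hits


if __name__ == "__main__":
    # More than one line of output means an older copy may shadow the newer one.
    for directory in find_on_path("cudnn64_7.dll"):
        print(directory)
```

If more than one directory is printed, the first one is the likeliest candidate among the PATH entries, which matches the situation described above where the 7.1 location was still listed.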
The following unit tests fail with CUDA 10.1/CUDNN 7.5:
BatchNormTest.PositiveTestCase
BatchNormTest.PositiveTestCaseDefaultEpsilon
BatchNormTest.BatchNorm1d_3d_Pytorch
BatchNormTest.BatchNorm2d_Pytorch
BatchNormTest.BatchNorm3d_Pytorch
BatchNormTest.BatchNorm2d_fp16
InstanceNormalizationOpTest.InstanceNorm
InstanceNormalizationOpTest.InstanceNorm_2
Steps to reproduce: build with CUDA enabled and run the unit tests.
Error messages:
BatchNorm:
2019-02-28 10:34:01.0270819 [E:onnxruntime:Default, cuda_call.cc:93 onnxruntime::CudaCall] CUDNN failure 3: CUDNN_STATUS_BAD_PARAM ; GPU=0 ; hostname=MININT-6NOLPDK ; expr=cudnnBatchNormalizationForwardInference( CudnnHandle(), cudnn_batch_norm_mode_, &alpha, &beta, data_desc, x_data, data_desc, y_data, bn_tensor_desc, f_scale.get(), f_B.get(), f_mean.get(), f_var.get(), epsilon_);
InstanceNormalization:
2019-02-28 10:34:02.0367632 [E:onnxruntime:Default, cuda_call.cc:93 onnxruntime::CudaCall] CUDNN failure 3: CUDNN_STATUS_BAD_PARAM ; GPU=0 ; hostname=MININT-6NOLPDK ; expr=cudnnBatchNormalizationForwardTraining( CudnnHandle(), CUDNN_BATCHNORM_SPATIAL, &one, &zero, data_desc, x_data, data_desc, y_data, stats_desc, unused_scale.get(), unused_bias.get(), 1.0f, mean.get(), variance.get(), 0.0, nullptr, nullptr);
2019-02-28 10:34:02.0370557 [E:onnxruntime:Default, provider_test_utils.cc:315
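One possible cause of CUDNN_STATUS_BAD_PARAM on calls like these is that the cuDNN library loaded at run time is not the version the build used. A rough sketch of a check follows, assuming the cuDNN 7.x encoding of the version integer (major * 1000 + minor * 100 + patch). `decode_cudnn_version` and `versions_match` are hypothetical helper names, and the value 7500 in the usage comment is an assumed patch level for a 7.5 build. `cudnnGetVersion` is the real cuDNN entry point, but `loaded_cudnn_version` only works on a machine where the named library can be loaded.

```python
import ctypes


def decode_cudnn_version(version):
    """Split a cuDNN 7.x version integer (e.g. 7500) into (major, minor, patch)."""
    major, rest = divmod(version, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def versions_match(built, loaded):
    """Compare major and minor only; a patch-level difference is ignored here."""
    return decode_cudnn_version(built)[:2] == decode_cudnn_version(loaded)[:2]


def loaded_cudnn_version(library_name="cudnn64_7.dll"):
    """Ask whichever cuDNN library the loader resolves for its version.

    Requires cuDNN to be installed; the default name is the Windows DLL
    mentioned in this issue.
    """
    lib = ctypes.CDLL(library_name)
    lib.cudnnGetVersion.restype = ctypes.c_size_t
    return lib.cudnnGetVersion()


# Usage on a machine with cuDNN installed (7500 is an assumed build version):
#     if not versions_match(7500, loaded_cudnn_version()):
#         print("loaded cuDNN differs from the version used at build time")
```

The comparison ignores the patch level on purpose, since only a major or minor difference is being looked for here; tighten it if patch releases matter for your build.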