
"cudaSuccess == err (0 vs. 8)" error on v0.8.0b1 #158

Closed
alvations opened this issue Oct 4, 2016 · 1 comment
alvations commented Oct 4, 2016

I have installed paddlepaddle using the .deb file from https://github.com/baidu/Paddle/releases/download/V0.8.0b1/paddle-gpu-0.8.0b1-Linux.deb

I have a GTX 1080 with CUDA 8.0 and cuDNN v5.1 installed, without the NVIDIA Accelerated Graphics Driver:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Sun_Sep__4_22:14:01_CDT_2016
Cuda compilation tools, release 8.0, V8.0.44

I've set the shell variables:

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
export CUDA_HOME=/usr/local/cuda

When I try to run the demo from the Paddle GitHub repo, I get a [hl_gpu_apply_unary_op failed] CUDA error: invalid device function. Is there some way to resolve this?
The errors from #3, #18, and #95 still seem to occur even though the CMake file in the latest release should have been fixed by #107. I'm getting this error:

~/Paddle/demo/image_classification$ bash train.sh 
I1005 13:07:51.456790   894 Util.cpp:151] commandline: /home/ltan/Paddle/binary/bin/../opt/paddle/bin/paddle_trainer --config=vgg_16_cifar.py --dot_period=10 --log_period=100 --test_all_data_in_one_period=1 --use_gpu=1 --trainer_count=1 --num_passes=200 --save_dir=./cifar_vgg_model 
I1005 13:07:55.145606   894 Util.cpp:126] Calling runInitFunctions
I1005 13:07:55.145925   894 Util.cpp:139] Call runInitFunctions done.
[INFO 2016-10-05 13:07:55,313 layers.py:1620] channels=3 size=3072
[INFO 2016-10-05 13:07:55,313 layers.py:1620] output size for __conv_0__ is 32 
[INFO 2016-10-05 13:07:55,315 layers.py:1620] channels=64 size=65536
[INFO 2016-10-05 13:07:55,315 layers.py:1620] output size for __conv_1__ is 32 
[INFO 2016-10-05 13:07:55,316 layers.py:1681] output size for __pool_0__ is 16*16 
[INFO 2016-10-05 13:07:55,317 layers.py:1620] channels=64 size=16384
[INFO 2016-10-05 13:07:55,317 layers.py:1620] output size for __conv_2__ is 16 
[INFO 2016-10-05 13:07:55,319 layers.py:1620] channels=128 size=32768
[INFO 2016-10-05 13:07:55,319 layers.py:1620] output size for __conv_3__ is 16 
[INFO 2016-10-05 13:07:55,320 layers.py:1681] output size for __pool_1__ is 8*8 
[INFO 2016-10-05 13:07:55,321 layers.py:1620] channels=128 size=8192
[INFO 2016-10-05 13:07:55,321 layers.py:1620] output size for __conv_4__ is 8 
[INFO 2016-10-05 13:07:55,323 layers.py:1620] channels=256 size=16384
[INFO 2016-10-05 13:07:55,323 layers.py:1620] output size for __conv_5__ is 8 
[INFO 2016-10-05 13:07:55,324 layers.py:1620] channels=256 size=16384
[INFO 2016-10-05 13:07:55,325 layers.py:1620] output size for __conv_6__ is 8 
[INFO 2016-10-05 13:07:55,326 layers.py:1681] output size for __pool_2__ is 4*4 
[INFO 2016-10-05 13:07:55,327 layers.py:1620] channels=256 size=4096
[INFO 2016-10-05 13:07:55,327 layers.py:1620] output size for __conv_7__ is 4 
[INFO 2016-10-05 13:07:55,328 layers.py:1620] channels=512 size=8192
[INFO 2016-10-05 13:07:55,329 layers.py:1620] output size for __conv_8__ is 4 
[INFO 2016-10-05 13:07:55,330 layers.py:1620] channels=512 size=8192
[INFO 2016-10-05 13:07:55,330 layers.py:1620] output size for __conv_9__ is 4 
[INFO 2016-10-05 13:07:55,332 layers.py:1681] output size for __pool_3__ is 2*2 
[INFO 2016-10-05 13:07:55,332 layers.py:1681] output size for __pool_4__ is 1*1 
[INFO 2016-10-05 13:07:55,335 networks.py:1125] The input order is [image, label]
[INFO 2016-10-05 13:07:55,335 networks.py:1132] The output order is [__cost_0__]
I1005 13:07:55.342417   894 Trainer.cpp:170] trainer mode: Normal
F1005 13:07:55.343267   894 hl_gpu_matrix_kernel.cuh:181] Check failed: cudaSuccess == err (0 vs. 8) [hl_gpu_apply_unary_op failed] CUDA error: invalid device function
*** Check failure stack trace: ***
    @     0x7f1c681cadaa  (unknown)
    @     0x7f1c681cace4  (unknown)
    @     0x7f1c681ca6e6  (unknown)
    @     0x7f1c681cd687  (unknown)
    @           0x78a939  hl_gpu_apply_unary_op<>()
    @           0x7536bf  paddle::BaseMatrixT<>::applyUnary<>()
    @           0x7532a9  paddle::BaseMatrixT<>::applyUnary<>()
    @           0x73d82f  paddle::BaseMatrixT<>::zero()
    @           0x66d2ae  paddle::Parameter::enableType()
    @           0x669acc  paddle::parameterInitNN()
    @           0x66bd13  paddle::NeuralNetwork::init()
    @           0x679ed3  paddle::GradientMachine::create()
    @           0x6a6355  paddle::TrainerInternal::init()
    @           0x6a2697  paddle::Trainer::init()
    @           0x53a1f5  main
    @     0x7f1c673d6f45  (unknown)
    @           0x545ae5  (unknown)
    @              (nil)  (unknown)
/home/ltan/Paddle/binary/bin/paddle: line 81:   894 Aborted                 (core dumped) ${DEBUGGER} $MYDIR/../opt/paddle/bin/paddle_trainer ${@:2}
No data to plot. Exiting!

I have also tried recompiling from source, and the same error occurs. The quick_start demo works, though.

@alvations alvations changed the title Stack smashing detected on v0.8.0b1 "cudaSuccess == err (0 vs. 8)" error on v0.8.0b1 Oct 5, 2016
gangliao commented Oct 8, 2016

gangliao @gangliao Oct 05 15:40
Can you give more error info?

alvations @alvations Oct 05 16:05
Oh sure, sorry I didn't see the conversation.
I have CUDA 8.0 and cuDNN 5.1 installed on a machine with 4 GTX 1080s.
Compiling with the .deb and from source generally works,
and the quick_start demo works on all 4 GPUs,
but when I try to run the image_classification demo, I get an error about an invalid device function:
http://stackoverflow.com/questions/39850309/how-to-resolve-cudasuccess-err-0-vs-8-error-on-paddle-v0-8-0b
Using -DCUDA_ARCH=61 didn't help; the same error occurs, and I would also need to comment out the elseif in flags.cmake at line 84.

gangliao @gangliao Oct 05 16:23
To the best of my knowledge, whenever the CUDA runtime API returns "invalid device function", it means you are running code that wasn't built for the architecture you are trying to run it on.
I guess the problem is still here:
if (CUDA_VERSION VERSION_GREATER "8.0")
    list(APPEND __arch_flags " -gencode arch=compute_60,code=sm_60")
    list(APPEND __arch_flags " -gencode arch=compute_61,code=sm_61")
    list(APPEND __arch_flags " -gencode arch=compute_62,code=sm_62")
endif()
You can add this code at line 99 in flags.cmake.
If it still doesn't work, we will fix it when the national holiday is over.
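For reference, a minimal sketch of the kind of change being suggested (the variable name __arch_flags and the guard are taken from the snippet quoted above; the exact line number in flags.cmake is an assumption and may differ between releases). Note that VERSION_GREATER "8.0" excludes CUDA 8.0 itself, which may be why the Pascal flags are never added for this setup:

```cmake
# Hypothetical patch to flags.cmake: include CUDA 8.0 itself in the
# guard, so the Pascal (sm_60/sm_61) gencode flags are actually emitted.
# VERSION_LESS is used instead of VERSION_GREATER_EQUAL for
# compatibility with older CMake releases.
if(NOT CUDA_VERSION VERSION_LESS "8.0")
    list(APPEND __arch_flags " -gencode arch=compute_60,code=sm_60")
    list(APPEND __arch_flags " -gencode arch=compute_61,code=sm_61")
endif()
```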

alvations @alvations Oct 05 16:28
Thanks! I'll try the list append first =)
Sorry to cause inconvenience over the golden week.

gangliao @gangliao Oct 05 16:29
It's fine.

alvations @alvations Oct 05 16:37
Appending to the list didn't work either. I'll wait for you to fix it next week. Meanwhile, I'll try some other combinations of if-else in flags.cmake, and if any of them work, I'll get back to you =)

alvations @alvations Oct 05 16:56
Oh, I found the correct combination.
For the GTX 1080 (Pascal) on my machine it's
sm_60 + sm_52.
Either one alone won't work; it has to be both, though I have no idea why. I had to remove all the other if-else branches in flags.cmake and hard-code the list appends for 60 and 52.
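A sketch of what the hard-coded version described above might look like (assuming the __arch_flags variable from the snippet quoted earlier in the thread; this replaces the if/else chain rather than extending it):

```cmake
# Hypothetical hard-coded arch list for a GTX 1080 setup where
# sm_52 + sm_60 together were the working combination.
set(__arch_flags "")
list(APPEND __arch_flags " -gencode arch=compute_52,code=sm_52")
list(APPEND __arch_flags " -gencode arch=compute_60,code=sm_60")
```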

alvations @alvations Oct 05 17:02
Thanks for the pointers and helping out!

@gangliao gangliao closed this as completed Oct 8, 2016