
"cudaSuccess == err (0 vs. 8)" error on v0.8.0b1 #158

Closed
alvations opened this issue Oct 4, 2016 · 1 comment
alvations commented Oct 4, 2016

I have installed paddlepaddle using the .deb file from https://github.com/baidu/Paddle/releases/download/V0.8.0b1/paddle-gpu-0.8.0b1-Linux.deb

I have a GTX 1080 with CUDA 8.0 and cuDNN v5.1 installed, without the NVIDIA Accelerated Graphics Driver:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Sun_Sep__4_22:14:01_CDT_2016
Cuda compilation tools, release 8.0, V8.0.44

I've set the shell variables:

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
export CUDA_HOME=/usr/local/cuda

When I try to run the demo from the Paddle GitHub repo, I get a [hl_gpu_apply_unary_op failed] CUDA error: invalid device function. Is there some way to resolve this?
The errors from #3, #18, and #95 still seem to occur even though the CMake file in the latest release should have been fixed by #107. I'm getting this error:

~/Paddle/demo/image_classification$ bash train.sh 
I1005 13:07:51.456790   894 Util.cpp:151] commandline: /home/ltan/Paddle/binary/bin/../opt/paddle/bin/paddle_trainer --config=vgg_16_cifar.py --dot_period=10 --log_period=100 --test_all_data_in_one_period=1 --use_gpu=1 --trainer_count=1 --num_passes=200 --save_dir=./cifar_vgg_model 
I1005 13:07:55.145606   894 Util.cpp:126] Calling runInitFunctions
I1005 13:07:55.145925   894 Util.cpp:139] Call runInitFunctions done.
[INFO 2016-10-05 13:07:55,313 layers.py:1620] channels=3 size=3072
[INFO 2016-10-05 13:07:55,313 layers.py:1620] output size for __conv_0__ is 32 
[INFO 2016-10-05 13:07:55,315 layers.py:1620] channels=64 size=65536
[INFO 2016-10-05 13:07:55,315 layers.py:1620] output size for __conv_1__ is 32 
[INFO 2016-10-05 13:07:55,316 layers.py:1681] output size for __pool_0__ is 16*16 
[INFO 2016-10-05 13:07:55,317 layers.py:1620] channels=64 size=16384
[INFO 2016-10-05 13:07:55,317 layers.py:1620] output size for __conv_2__ is 16 
[INFO 2016-10-05 13:07:55,319 layers.py:1620] channels=128 size=32768
[INFO 2016-10-05 13:07:55,319 layers.py:1620] output size for __conv_3__ is 16 
[INFO 2016-10-05 13:07:55,320 layers.py:1681] output size for __pool_1__ is 8*8 
[INFO 2016-10-05 13:07:55,321 layers.py:1620] channels=128 size=8192
[INFO 2016-10-05 13:07:55,321 layers.py:1620] output size for __conv_4__ is 8 
[INFO 2016-10-05 13:07:55,323 layers.py:1620] channels=256 size=16384
[INFO 2016-10-05 13:07:55,323 layers.py:1620] output size for __conv_5__ is 8 
[INFO 2016-10-05 13:07:55,324 layers.py:1620] channels=256 size=16384
[INFO 2016-10-05 13:07:55,325 layers.py:1620] output size for __conv_6__ is 8 
[INFO 2016-10-05 13:07:55,326 layers.py:1681] output size for __pool_2__ is 4*4 
[INFO 2016-10-05 13:07:55,327 layers.py:1620] channels=256 size=4096
[INFO 2016-10-05 13:07:55,327 layers.py:1620] output size for __conv_7__ is 4 
[INFO 2016-10-05 13:07:55,328 layers.py:1620] channels=512 size=8192
[INFO 2016-10-05 13:07:55,329 layers.py:1620] output size for __conv_8__ is 4 
[INFO 2016-10-05 13:07:55,330 layers.py:1620] channels=512 size=8192
[INFO 2016-10-05 13:07:55,330 layers.py:1620] output size for __conv_9__ is 4 
[INFO 2016-10-05 13:07:55,332 layers.py:1681] output size for __pool_3__ is 2*2 
[INFO 2016-10-05 13:07:55,332 layers.py:1681] output size for __pool_4__ is 1*1 
[INFO 2016-10-05 13:07:55,335 networks.py:1125] The input order is [image, label]
[INFO 2016-10-05 13:07:55,335 networks.py:1132] The output order is [__cost_0__]
I1005 13:07:55.342417   894 Trainer.cpp:170] trainer mode: Normal
F1005 13:07:55.343267   894 hl_gpu_matrix_kernel.cuh:181] Check failed: cudaSuccess == err (0 vs. 8) [hl_gpu_apply_unary_op failed] CUDA error: invalid device function
*** Check failure stack trace: ***
    @     0x7f1c681cadaa  (unknown)
    @     0x7f1c681cace4  (unknown)
    @     0x7f1c681ca6e6  (unknown)
    @     0x7f1c681cd687  (unknown)
    @           0x78a939  hl_gpu_apply_unary_op<>()
    @           0x7536bf  paddle::BaseMatrixT<>::applyUnary<>()
    @           0x7532a9  paddle::BaseMatrixT<>::applyUnary<>()
    @           0x73d82f  paddle::BaseMatrixT<>::zero()
    @           0x66d2ae  paddle::Parameter::enableType()
    @           0x669acc  paddle::parameterInitNN()
    @           0x66bd13  paddle::NeuralNetwork::init()
    @           0x679ed3  paddle::GradientMachine::create()
    @           0x6a6355  paddle::TrainerInternal::init()
    @           0x6a2697  paddle::Trainer::init()
    @           0x53a1f5  main
    @     0x7f1c673d6f45  (unknown)
    @           0x545ae5  (unknown)
    @              (nil)  (unknown)
/home/ltan/Paddle/binary/bin/paddle: line 81:   894 Aborted                 (core dumped) ${DEBUGGER} $MYDIR/../opt/paddle/bin/paddle_trainer ${@:2}
No data to plot. Exiting!

I have also tried recompiling from source, and the same error occurs. The quick_start demo works, though.

@alvations alvations changed the title Stack smashing detected on v0.8.0b1 "cudaSuccess == err (0 vs. 8)" error on v0.8.0b1 Oct 5, 2016
gangliao commented Oct 8, 2016

gangliao @gangliao Oct 05 15:40
Can you give more error info?

alvations @alvations Oct 05 16:05
Oh sure, sorry I didn't see the conversation.
I have CUDA 8.0 and cuDNN 5.1 installed on a machine with 4 GTX 1080s.
Compiling with the .deb and from source generally works,
and the quick_start demo works on all 4 GPUs,
but when I try to run the image_classification demo, I get an error about an invalid device function:
http://stackoverflow.com/questions/39850309/how-to-resolve-cudasuccess-err-0-vs-8-error-on-paddle-v0-8-0b
Using -DCUDA_ARCH=61 didn't help; the same error occurs, and I would also need to comment out the elseif in flags.cmake at line 84.

gangliao @gangliao Oct 05 16:23
To the best of my knowledge, whenever the CUDA runtime API returns "invalid device function", it means you are running code that wasn't built for the architecture you are trying to run it on.
I guess the problem is still here:
if (CUDA_VERSION VERSION_GREATER "8.0")
    list(APPEND __arch_flags " -gencode arch=compute_60,code=sm_60")
    list(APPEND __arch_flags " -gencode arch=compute_61,code=sm_61")
    list(APPEND __arch_flags " -gencode arch=compute_62,code=sm_62")
endif()
You can add this code at line 99 in flags.cmake.
If it still doesn't work, we will fix it when the national holiday is over.
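For reference, a minimal sketch of the kind of change being suggested (the variable name __arch_flags and the guard are taken from the snippet quoted above; the exact line number in flags.cmake is an assumption and may differ between releases). Note that VERSION_GREATER "8.0" excludes CUDA 8.0 itself, which may be why the Pascal flags are never added for this setup:

```cmake
# Hypothetical patch to flags.cmake: include CUDA 8.0 itself in the
# guard, so the Pascal (sm_60/sm_61) gencode flags are actually emitted.
# VERSION_LESS is used instead of VERSION_GREATER_EQUAL for
# compatibility with older CMake releases.
if(NOT CUDA_VERSION VERSION_LESS "8.0")
    list(APPEND __arch_flags " -gencode arch=compute_60,code=sm_60")
    list(APPEND __arch_flags " -gencode arch=compute_61,code=sm_61")
endif()
```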

alvations @alvations Oct 05 16:28
Thanks! I'll try the list append first =)
Sorry to cause inconvenience over the golden week.

gangliao @gangliao Oct 05 16:29
It's fine.

alvations @alvations Oct 05 16:37
Appending to the list didn't work either. I'll wait for you to fix it next week. Meanwhile, I'll try some other combinations of if-else in flags.cmake, and if any of them work, I'll get back to you =)

alvations @alvations Oct 05 16:56
Oh, I found the correct combination.
For the GTX 1080 (Pascal) on my machine it's
sm_60 + sm_52.
Either one alone won't work; it has to be both, though I have no idea why. I had to remove all the other if-else branches in flags.cmake and hard-code the list appends for 60 and 52.
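A sketch of what the hard-coded version described above might look like (assuming the __arch_flags variable from the snippet quoted earlier in the thread; this replaces the if/else chain rather than extending it):

```cmake
# Hypothetical hard-coded arch list for a GTX 1080 setup where
# sm_52 + sm_60 together were the working combination.
set(__arch_flags "")
list(APPEND __arch_flags " -gencode arch=compute_52,code=sm_52")
list(APPEND __arch_flags " -gencode arch=compute_60,code=sm_60")
```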

alvations @alvations Oct 05 17:02
Thanks for the pointers and helping out!

@gangliao gangliao closed this as completed Oct 8, 2016