
Error when running the GPU version on a cloud machine #2931

Closed
WoNiuHu opened this issue Jul 18, 2017 · 10 comments

@WoNiuHu

WoNiuHu commented Jul 18, 2017

Hi, running the GPU version on a cloud machine fails with the following error:
```
root@311257b81a76:/work# bash run.sh lstm_train
I0718 02:20:33.320343 36 Util.cpp:155] commandline: /usr/local/bin/../opt/paddle/bin/paddle_trainer --config=trainer_config.py --save_dir=./model_output --job=train --use_gpu=true --trainer_count=1 --num_passes=2 --log_period=10 --dot_period=20 --show_parameter_stats_period=100 --test_all_data_in_one_period=1
I0718 02:22:57.979029 36 Util.cpp:130] Calling runInitFunctions
I0718 02:22:57.980270 36 Util.cpp:143] Call runInitFunctions done.
[INFO 2017-07-18 02:22:59,474 networks.py:1466] The input order is [word, label]
[INFO 2017-07-18 02:22:59,474 networks.py:1472] The output order is [cost_0]
I0718 02:22:59.617970 36 Trainer.cpp:170] trainer mode: Normal
F0718 02:22:59.624567 36 hl_gpu_matrix_kernel.cuh:181] Check failed: cudaSuccess == err (0 vs. 8) [hl_gpu_apply_unary_op failed] CUDA error: invalid device function
*** Check failure stack trace: ***
```

I have already installed nvidia-docker and the paddledev/paddle:gpu-release-v0.9.0 image.

@hedaoyuan
Contributor

hedaoyuan commented Jul 18, 2017

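@WoNiuHu For a similar problem, see #34.
The paddledev/paddle:gpu-release-v0.9.0 image was probably not compiled for sm_60, so it does not support your environment.
Try switching to 0.10.0-dev; the 0.10 image is built with cuda-8.0 and includes the sm_60 code.

Also, could you reply with the GPU model and driver version of your environment?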
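
To illustrate why a build without sm_60 code can fail on a newer GPU, here is a minimal Python sketch of the CUDA compatibility rule: a compiled binary (`sm_XY`) runs only on devices with the same major compute capability and an equal or higher minor version, while embedded PTX (`compute_XY`) can be JIT-compiled for any device of equal or higher capability. The function names and the architecture lists in the usage note are hypothetical; the actual list of architectures in the v0.9.0 image is not stated in this thread.

```python
def parse_arch(tag):
    """Split a tag such as 'sm_60' or 'compute_60' into (kind, major, minor)."""
    kind, _, digits = tag.partition("_")
    if kind not in ("sm", "compute") or len(digits) < 2 or not digits.isdigit():
        raise ValueError("unrecognised architecture tag: %r" % tag)
    # The last digit is the minor version, everything before it is the major.
    return kind, int(digits[:-1]), int(digits[-1])


def can_run(device_cc, built_archs):
    """Return True if a device with compute capability `device_cc`
    (a (major, minor) tuple) can execute code built for `built_archs`."""
    dev_major, dev_minor = device_cc
    for tag in built_archs:
        kind, major, minor = parse_arch(tag)
        if kind == "sm":
            # Binary code: same major version, device minor >= built minor.
            if major == dev_major and minor <= dev_minor:
                return True
        else:
            # PTX: can be JIT-compiled for any equal or newer capability.
            if (major, minor) <= (dev_major, dev_minor):
                return True
    return False
```

With a hypothetical build list of `["sm_30", "sm_35", "sm_50"]`, `can_run((6, 0), ...)` returns `False`, which corresponds to the kind of mismatch suspected here; adding `"sm_60"` to the list makes it return `True`.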

hedaoyuan self-assigned this Jul 18, 2017
@WoNiuHu
Author

WoNiuHu commented Jul 18, 2017

@hedaoyuan

Cirrus Logic GD 5446
NVIDIA Corporation Device 1b00 (rev a1)
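
The `lspci` lines above identify the card only by its PCI device ID and do not include the driver version. As a sketch, assuming `nvidia-smi` is installed and on the `PATH`, the model name and driver version can be read with its `--query-gpu` option; the helper names `parse_gpu_lines` and `query_gpus` are made up for this example.

```python
import subprocess

# nvidia-smi prints one "name, driver_version" line per GPU in this format.
QUERY = ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"]


def parse_gpu_lines(text):
    """Parse nvidia-smi CSV output into a list of (name, driver_version) tuples."""
    gpus = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The driver version is the last comma-separated field.
        name, driver = line.rsplit(",", 1)
        gpus.append((name.strip(), driver.strip()))
    return gpus


def query_gpus():
    """Run nvidia-smi and return the parsed (name, driver_version) list."""
    result = subprocess.run(QUERY, check=True, capture_output=True, text=True)
    return parse_gpu_lines(result.stdout)
```

Calling `query_gpus()` on the host (or inside a container started through nvidia-docker) returns one tuple per visible GPU, which is the information requested in the previous comment.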

@helinwang
Contributor

helinwang commented Jul 18, 2017

@hedaoyuan My understanding is that v0.10.0 does not support the v1 API (see #2946 (comment) for details), so I will build a v0.9.0 CUDA 8 docker image for @WoNiuHu.

@typhoonzero
Contributor

@helinwang v0.10.0 should support the v1 API. The v2 API only adds a new Python binding entry point on top of v1; the v1 usage has remained available on the develop branch all along.

@WoNiuHu
Author

WoNiuHu commented Jul 19, 2017

@helinwang Thank you for the effort.

@helinwang
Contributor

helinwang commented Jul 19, 2017

@typhoonzero Please see #2946 (comment). I tested paddlepaddle/paddle:0.10.0rc2, and it cannot run the demo at https://github.com/PaddlePaddle/Paddle/tree/develop/v1_api_demo/mnist.

@typhoonzero
Contributor

Replied: #2946 (comment)

This is a bug in the rc2 image; the rc3 and 0.10.0 tags both support it.

Also, I came across this issue: NVIDIA/nvidia-docker#346. "invalid device function" is usually a mismatch between CUDA and the GPU, right?

@WoNiuHu
Author

WoNiuHu commented Jul 19, 2017

@helinwang @typhoonzero So what is the solution to this problem?

@helinwang
Contributor

helinwang commented Jul 19, 2017

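@WoNiuHu Please run with the 0.10.0 docker image; it is used the same way as the 0.9.0 docker image.

@typhoonzero

Understood. I tested it and 0.10.0 does work. Thanks!

> Also, I came across this issue: NVIDIA/nvidia-docker#346. "invalid device function" is usually a mismatch between CUDA and the GPU, right?

Yes, please see #2931 (comment)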

@JiayiFeng
Collaborator

Closing this issue due to inactivity. Please feel free to reopen it if more information is available.
