Error when running the GPU version on a cloud machine #2931

WoNiuHu · 2017-07-18T02:57:34Z

Hi, running the GPU version on a cloud machine fails with the following error:
`root@311257b81a76:/work# bash run.sh lstm_train

I0718 02:20:33.320343 36 Util.cpp:155] commandline: /usr/local/bin/../opt/paddle/bin/paddle_trainer --config=trainer_config.py --save_dir=./model_output --job=train --use_gpu=true --trainer_count=1 --num_passes=2 --log_period=10 --dot_period=20 --show_parameter_stats_period=100 --test_all_data_in_one_period=1

I0718 02:22:57.979029 36 Util.cpp:130] Calling runInitFunctions

I0718 02:22:57.980270 36 Util.cpp:143] Call runInitFunctions done.

[INFO 2017-07-18 02:22:59,474 networks.py:1466] The input order is [word, label]
[INFO 2017-07-18 02:22:59,474 networks.py:1472] The output order is [cost_0]
I0718 02:22:59.617970 36 Trainer.cpp:170] trainer mode: Normal

F0718 02:22:59.624567 36 hl_gpu_matrix_kernel.cuh:181] Check failed: cudaSuccess == err (0 vs. 8) [hl_gpu_apply_unary_op failed] CUDA error: invalid device function
*** Check failure stack trace: ***`

nvidia-docker and the paddledev/paddle:gpu-release-v0.9.0 image are already installed.

hedaoyuan · 2017-07-18T03:05:47Z

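@WoNiuHu For a similar problem, see #34.
The paddledev/paddle:gpu-release-v0.9.0 image may not have been compiled for sm_60, in which case it does not support your environment.
Try switching to 0.10.0-dev. The 0.10 image is compiled with cuda-8.0 and includes the sm_60 code.

Also, could you reply with the GPU model and driver version of your environment?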
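To illustrate why a missing sm_60 target can produce "invalid device function", here is a minimal sketch of the CUDA compatibility rule: native code built for `sm_XY` runs only on devices with the same major version and an equal or newer minor version, while embedded PTX (`compute_XY`) can be JIT-compiled for any equal or newer device. The function names and the architecture lists are hypothetical; the actual targets baked into either Docker image are not stated in this thread.

```python
def parse_arch(tag):
    """Split a tag such as 'sm_60' or 'compute_35' into (kind, major, minor)."""
    kind, digits = tag.split("_")
    return kind, int(digits[:-1]), int(digits[-1])


def can_run(compiled_tags, device_cc):
    """Return True if any compiled target is usable on the device.

    compiled_tags: architecture tags the binary was built with (hypothetical).
    device_cc: the device's compute capability as a (major, minor) tuple.
    """
    dev_major, dev_minor = device_cc
    for tag in compiled_tags:
        kind, major, minor = parse_arch(tag)
        if kind == "sm":
            # Native code: same major version, device minor at least as new.
            if major == dev_major and minor <= dev_minor:
                return True
        elif kind == "compute":
            # PTX: the driver can JIT-compile it for any equal or newer device.
            if (major, minor) <= (dev_major, dev_minor):
                return True
    return False


# Hypothetical build without any Pascal target, on a 6.x device:
print(can_run(["sm_30", "sm_35", "sm_52"], (6, 1)))
# Hypothetical build that includes sm_60:
print(can_run(["sm_52", "sm_60"], (6, 1)))
```

Under these assumptions the first call reports that no kernel image matches the device, which is the situation the error message describes, and the second call finds a usable target.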

WoNiuHu · 2017-07-18T03:26:03Z

@hedaoyuan

Cirrus Logic GD 5446
NVIDIA Corporation Device 1b00 (rev a1)

helinwang · 2017-07-18T21:14:57Z

@hedaoyuan My understanding is that v0.10.0 does not support the v1 API (for details see #2946 (comment)), so let me build a v0.9.0 CUDA 8 Docker image for @WoNiuHu.

typhoonzero · 2017-07-19T01:51:50Z

@helinwang v0.10.0 should support the v1 API. The v2 API only adds a new Python binding entry point on top of v1, and the v1 usage has remained available on the develop branch all along.

WoNiuHu · 2017-07-19T01:56:41Z

@helinwang Thanks for your hard work.

helinwang · 2017-07-19T06:05:34Z

@typhoonzero Please see #2946 (comment). I tested it, and paddlepaddle/paddle:0.10.0rc2 cannot run the demo at https://github.com/PaddlePaddle/Paddle/tree/develop/v1_api_demo/mnist.

typhoonzero · 2017-07-19T06:14:11Z

Replied: #2946 (comment)

It is a bug in the rc2 image. Tags such as rc3 and 0.10.0 do support it.

Also, I came across this issue: NVIDIA/nvidia-docker#346. "invalid device function" is usually a mismatch between CUDA and the GPU, right?
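When triaging reports like this one, it can help to pull the failing operation and the CUDA error out of the fatal log line automatically. The following is a rough sketch that parses the "Check failed" line from the log at the top of this issue. The regular expression is inferred from that single line, so the format is an assumption rather than a documented one, and `parse_cuda_check` is a hypothetical helper name.

```python
import re

# Pattern inferred from the one fatal line in this issue (an assumption).
CHECK_FAILED = re.compile(
    r"Check failed: cudaSuccess == err "
    r"\((?P<expected>\d+) vs\. (?P<actual>\d+)\) "
    r"\[(?P<op>[^\]]+)\] CUDA error: (?P<message>.+)$"
)


def parse_cuda_check(line):
    """Return the error details from a failed CUDA check line, or None."""
    match = CHECK_FAILED.search(line)
    if match is None:
        return None
    return {
        "expected": int(match.group("expected")),
        "actual": int(match.group("actual")),
        "op": match.group("op"),
        "message": match.group("message").strip(),
    }


# The fatal line copied from the log in the original report.
LINE = (
    "F0718 02:22:59.624567 36 hl_gpu_matrix_kernel.cuh:181] "
    "Check failed: cudaSuccess == err (0 vs. 8) "
    "[hl_gpu_apply_unary_op failed] CUDA error: invalid device function"
)

print(parse_cuda_check(LINE))
```

For the line above, the parsed message is "invalid device function" with numeric code 8, which is the symptom discussed in NVIDIA/nvidia-docker#346.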

WoNiuHu · 2017-07-19T09:22:42Z

@helinwang @typhoonzero So what is the solution to this problem?

helinwang · 2017-07-19T19:10:04Z

@WoNiuHu Please run with the 0.10.0 Docker image. The steps are the same as for the 0.9.0 Docker image.

@typhoonzero

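Understood. I tested it, and 0.10.0 does work. Thanks!

> Also, I came across this issue: NVIDIA/nvidia-docker#346. "invalid device function" is usually a mismatch between CUDA and the GPU, right?

Yes, please see #2931 (comment).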

JiayiFeng · 2017-08-30T00:06:47Z

Closing this issue due to inactivity. Please feel free to reopen it if more information is available.

hedaoyuan self-assigned this Jul 18, 2017

helinwang mentioned this issue Jul 18, 2017

Running the v1 API of paddle 0.10.0 on a cloud GPU machine #2946

Closed

JiayiFeng closed this as completed Aug 30, 2017

guoshengCS mentioned this issue Nov 15, 2017

Abnormal behavior when running after installing the GPU version #5674

Closed
