Compatiblity issue with CUDA 8.0 #18

stoneyang · 2016-09-01T11:30:41Z

Hi, there,

I can successfully build Paddle on my machine installed Linux 14.04 LTS and CUDA 8.0 as the official guide. And for sure, the CPU version runs well except the speed....

When I ran the image classification demo with the script train.sh in GPU mode (see train.sh for more details), it unfortunately failed and threw out the following info:

I0901 19:18:02.916951 31272 Util.cpp:144] commandline: ../../build/paddle/trainer/paddle_trainer --config=vgg_16_cifar.py --dot_period=10 --log_period=100 --test_all_data_in_one_period=1 --use_gpu=1 --gpu_id=0 --trainer_count=1 --num_passes=200 --save_dir=./cifar_vgg_model
I0901 19:18:09.213749 31272 Util.cpp:113] Calling runInitFunctions
I0901 19:18:09.428228 31272 Util.cpp:126] Call runInitFunctions done.
[INFO 2016-09-01 19:18:09,580 layers.py:1430] channels=3 size=3072
[INFO 2016-09-01 19:18:09,580 layers.py:1430] output size for __conv_0__ is 32
[INFO 2016-09-01 19:18:09,583 layers.py:1430] channels=64 size=65536
[INFO 2016-09-01 19:18:09,583 layers.py:1430] output size for __conv_1__ is 32
[INFO 2016-09-01 19:18:09,586 layers.py:1490] output size for __pool_0__ is 16*16
[INFO 2016-09-01 19:18:09,587 layers.py:1430] channels=64 size=16384
[INFO 2016-09-01 19:18:09,587 layers.py:1430] output size for __conv_2__ is 16
[INFO 2016-09-01 19:18:09,590 layers.py:1430] channels=128 size=32768
[INFO 2016-09-01 19:18:09,590 layers.py:1430] output size for __conv_3__ is 16
[INFO 2016-09-01 19:18:09,592 layers.py:1490] output size for __pool_1__ is 8*8
[INFO 2016-09-01 19:18:09,593 layers.py:1430] channels=128 size=8192
[INFO 2016-09-01 19:18:09,594 layers.py:1430] output size for __conv_4__ is 8
[INFO 2016-09-01 19:18:09,596 layers.py:1430] channels=256 size=16384
[INFO 2016-09-01 19:18:09,597 layers.py:1430] output size for __conv_5__ is 8
[INFO 2016-09-01 19:18:09,599 layers.py:1430] channels=256 size=16384
[INFO 2016-09-01 19:18:09,599 layers.py:1430] output size for __conv_6__ is 8
[INFO 2016-09-01 19:18:09,601 layers.py:1490] output size for __pool_2__ is 4*4
[INFO 2016-09-01 19:18:09,602 layers.py:1430] channels=256 size=4096
[INFO 2016-09-01 19:18:09,603 layers.py:1430] output size for __conv_7__ is 4
[INFO 2016-09-01 19:18:09,605 layers.py:1430] channels=512 size=8192
[INFO 2016-09-01 19:18:09,605 layers.py:1430] output size for __conv_8__ is 4
[INFO 2016-09-01 19:18:09,608 layers.py:1430] channels=512 size=8192
[INFO 2016-09-01 19:18:09,608 layers.py:1430] output size for __conv_9__ is 4
[INFO 2016-09-01 19:18:09,610 layers.py:1490] output size for __pool_3__ is 2*2
[INFO 2016-09-01 19:18:09,611 layers.py:1490] output size for __pool_4__ is 1*1
[INFO 2016-09-01 19:18:09,615 networks.py:960] The input order is [image, label]
[INFO 2016-09-01 19:18:09,615 networks.py:963] The output order is [__cost_0__]
I0901 19:18:09.653937 31272 Trainer.cpp:169] trainer mode: Normal
F0901 19:18:09.658243 31272 hl_gpu_matrix_kernel.cuh:181] Check failed: cudaSuccess == err (0 vs. 8) [hl_gpu_apply_unary_op failed] CUDA error: invalid device function
*** Check failure stack trace: ***
    @     0x7efd1b172daa  (unknown)
    @     0x7efd1b172ce4  (unknown)
    @     0x7efd1b1726e6  (unknown)
    @     0x7efd1b175687  (unknown)
    @           0x78b159  hl_gpu_apply_unary_op<>()
    @           0x753edf  paddle::BaseMatrixT<>::applyUnary<>()
    @           0x753ac9  paddle::BaseMatrixT<>::applyUnary<>()
    @           0x73e04f  paddle::BaseMatrixT<>::zero()
    @           0x62af8e  paddle::Parameter::enableType()
    @           0x6272ec  paddle::parameterInitNN()
    @           0x62975b  paddle::NeuralNetwork::init()
    @           0x62eda3  paddle::GradientMachine::create()
    @           0x6a84e5  paddle::TrainerInternal::init()
    @           0x6a4907  paddle::Trainer::init()
    @           0x543935  main
    @     0x7efd1a37ef45  (unknown)
    @           0x54efd5  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)
No data to plot. Exiting!

It seems that Paddle still does not support the latest version of CUDA....

Appended my train.sh as a clue:

(omitted the original copyright info)
#!/bin/bash

set -e
config=vgg_16_cifar.py
output=./cifar_vgg_model
log=train.log

../../build/paddle/trainer/paddle_trainer \
--config=$config \
--dot_period=10 \
--log_period=100 \
--test_all_data_in_one_period=1 \
--use_gpu=1 \
--gpu_id=0 \
--trainer_count=1 \
--num_passes=200 \
--save_dir=$output \
2>&1 | tee $log

python -m paddle.utils.plotcurve -i $log > plot.png

The text was updated successfully, but these errors were encountered:

reyoung · 2016-09-02T02:55:42Z

duplicated with #3 .

CUDA 8.0 support issue is under resolving.

reyoung · 2016-09-02T02:56:09Z

Please track #3 for recent progresses.

stoneyang · 2016-09-02T06:03:08Z

Hi, @reyoung

The problem I posted here is an issue on using PaddlePaddle AFTER SUCCESSFULLY BUILDING THE CODE, which is different from #3 you try to referenced. As far as I know, guy who reported an issue of the building process with CUDA 8.0! That issue can be easily shot down by my merged PR #15 .

So I don't think the two issue can be treated like the same problem.

Appended is my nvcc version for your reference:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Wed_May__4_21:01:56_CDT_2016
Cuda compilation tools, release 8.0, V8.0.26

Hope it helps :)

gangliao · 2016-09-03T01:22:32Z

Hi, invalid device function indicates that you have a CUDA / GPU incompatibility.
Since you use GTX 1080, you can modify CMake to fix it.

open cmake/flags.cmake and add following code:
if (CUDA_VERSION VERSION_GREATER "8.0")

list(APPEND __arch_flags " -gencode arch=compute_60,code=sm_60")

endif()

then, rebuild the project

stoneyang · 2016-09-03T02:49:12Z

Thanks for your reply @gangliao !
I'll try it later.

stoneyang · 2016-09-05T04:07:28Z

@gangliao same errors when running train.sh as my original post ....

I appended the lines at the end of flags.cmake under cmake directory as:

if (CUDA_VERSION VERSION_GREATER "7.0")
    list(APPEND __arch_flags " -gencode arch=compute_52,code=sm_52")
endif()

if(CUDA_VERSION VERSION_GREATER "8.0")
    list(APPEND __arch_flags " -gencode arch=compute_60,code=sm_52")
endif()

set(CUDA_NVCC_FLAGS ${__arch_flags} ${CUDA_NVCC_FLAGS})

and run the same building steps and it did success.

P.S.: Same error when directly replaced if() ... endif() for 7.0 check and configure with 8.0:

# if (CUDA_VERSION VERSION_GREATER "7.0")
#     list(APPEND __arch_flags " -gencode arch=compute_52,code=sm_52")
# endif()

if(CUDA_VERSION VERSION_GREATER "8.0")
    list(APPEND __arch_flags " -gencode arch=compute_60,code=sm_52")
endif()

set(CUDA_NVCC_FLAGS ${__arch_flags} ${CUDA_NVCC_FLAGS})

gangliao · 2016-09-05T05:10:37Z

@stoneyang Why do you set sm_52? how about -gencode arch=compute_60,code=sm_60 ??

if (CUDA_VERSION VERSION_GREATER "7.0")
    list(APPEND __arch_flags " -gencode arch=compute_52,code=sm_52")
endif()

if(CUDA_VERSION VERSION_GREATER "8.0")
    list(APPEND __arch_flags " -gencode arch=compute_60,code=sm_60")
endif()

set(CUDA_NVCC_FLAGS ${__arch_flags} ${CUDA_NVCC_FLAGS})

stoneyang · 2016-09-05T06:54:15Z

@gangliao sorry for the typo ....

I changed those lines (see PR #40). And the new CUDA code generation configuration works fine, which is different from your suggestion. :)

foreach(capability 30 35 50 52 60)
    list(APPEND __arch_flags " -gencode arch=compute_${capability},code=sm_${capability}")
endforeach()

Please note that the former if() ... endif()-s for CUDA 7.0 or later are commented out since it is invalid (they keep producing errors in my original post). This might be the cmake-thing, which I do not have enough time to verify it for now.

But new error occurs when running:

$ sh train.sh

The log:

I0905 13:50:01.279917 20299 Util.cpp:144] commandline: /path/to/Paddle/build/paddle/trainer/paddle_trainer --config=vgg_16_cifar.py --dot_period=10 --log_period=100 --test_all_data_in_one_period=1 --use_gpu=1 --trainer_count=1 --num_passes=200 --save_dir=./cifar_vgg_model
I0905 13:50:02.728484 20299 Util.cpp:113] Calling runInitFunctions
I0905 13:50:02.728711 20299 Util.cpp:126] Call runInitFunctions done.
[INFO 2016-09-05 13:50:02,782 layers.py:1438] channels=3 size=3072
[INFO 2016-09-05 13:50:02,782 layers.py:1438] output size for __conv_0__ is 32
[INFO 2016-09-05 13:50:02,784 layers.py:1438] channels=64 size=65536
[INFO 2016-09-05 13:50:02,784 layers.py:1438] output size for __conv_1__ is 32
[INFO 2016-09-05 13:50:02,785 layers.py:1499] output size for __pool_0__ is 16*16
[INFO 2016-09-05 13:50:02,786 layers.py:1438] channels=64 size=16384
[INFO 2016-09-05 13:50:02,786 layers.py:1438] output size for __conv_2__ is 16
[INFO 2016-09-05 13:50:02,787 layers.py:1438] channels=128 size=32768
[INFO 2016-09-05 13:50:02,787 layers.py:1438] output size for __conv_3__ is 16
[INFO 2016-09-05 13:50:02,788 layers.py:1499] output size for __pool_1__ is 8*8
[INFO 2016-09-05 13:50:02,789 layers.py:1438] channels=128 size=8192
[INFO 2016-09-05 13:50:02,789 layers.py:1438] output size for __conv_4__ is 8
[INFO 2016-09-05 13:50:02,790 layers.py:1438] channels=256 size=16384
[INFO 2016-09-05 13:50:02,790 layers.py:1438] output size for __conv_5__ is 8
[INFO 2016-09-05 13:50:02,791 layers.py:1438] channels=256 size=16384
[INFO 2016-09-05 13:50:02,792 layers.py:1438] output size for __conv_6__ is 8
[INFO 2016-09-05 13:50:02,793 layers.py:1499] output size for __pool_2__ is 4*4
[INFO 2016-09-05 13:50:02,793 layers.py:1438] channels=256 size=4096
[INFO 2016-09-05 13:50:02,793 layers.py:1438] output size for __conv_7__ is 4
[INFO 2016-09-05 13:50:02,794 layers.py:1438] channels=512 size=8192
[INFO 2016-09-05 13:50:02,795 layers.py:1438] output size for __conv_8__ is 4
[INFO 2016-09-05 13:50:02,796 layers.py:1438] channels=512 size=8192
[INFO 2016-09-05 13:50:02,796 layers.py:1438] output size for __conv_9__ is 4
[INFO 2016-09-05 13:50:02,797 layers.py:1499] output size for __pool_3__ is 2*2
[INFO 2016-09-05 13:50:02,797 layers.py:1499] output size for __pool_4__ is 1*1
[INFO 2016-09-05 13:50:02,799 networks.py:1122] The input order is [image, label]
[INFO 2016-09-05 13:50:02,799 networks.py:1129] The output order is [__cost_0__]
I0905 13:50:02.806649 20299 Trainer.cpp:169] trainer mode: Normal
I0905 13:50:02.840154 20299 PyDataProvider2.cpp:219] loading dataprovider image_provider::processData
[INFO 2016-09-05 13:50:03,068 image_provider.py:52] Image size: 32
[INFO 2016-09-05 13:50:03,069 image_provider.py:53] Meta path: data/cifar-out/batches/batches.meta
[INFO 2016-09-05 13:50:03,069 image_provider.py:58] DataProvider Initialization finished
I0905 13:50:03.069367 20299 PyDataProvider2.cpp:219] loading dataprovider image_provider::processData
[INFO 2016-09-05 13:50:03,069 image_provider.py:52] Image size: 32
[INFO 2016-09-05 13:50:03,069 image_provider.py:53] Meta path: data/cifar-out/batches/batches.meta
[INFO 2016-09-05 13:50:03,070 image_provider.py:58] DataProvider Initialization finished
I0905 13:50:03.070163 20299 GradientMachine.cpp:134] Initing parameters..
I0905 13:50:03.706565 20299 GradientMachine.cpp:141] Init parameters done.
.........
I0905 13:50:44.860791 20299 TrainerInternal.cpp:162]  Batch=100 samples=12800 AvgCost=2.34658 CurrentCost=2.34658 Eval: classification_error_evaluator=0.825234  CurrentEval: classification_error_evaluator=0.825234
.........
I0905 13:50:52.310623 20299 TrainerInternal.cpp:162]  Batch=200 samples=25600 AvgCost=2.16351 CurrentCost=1.98044 Eval: classification_error_evaluator=0.782852  CurrentEval: classification_error_evaluator=0.740469
.........
I0905 13:50:59.720386 20299 TrainerInternal.cpp:162]  Batch=300 samples=38400 AvgCost=2.01217 CurrentCost=1.70949 Eval: classification_error_evaluator=0.743021  CurrentEval: classification_error_evaluator=0.663359
.........I0905 13:51:06.456202 20299 TrainerInternal.cpp:179]  Pass=0 Batch=391 samples=50048 AvgCost=1.89668 Eval: classification_error_evaluator=0.705
F0905 13:51:08.651224 20299 hl_cuda_cudnn.cc:779] Check failed: CUDNN_STATUS_SUCCESS == cudnnStat (0 vs. 5) Cudnn Error: CUDNN_STATUS_INVALID_VALUE
*** Check failure stack trace: ***
    @     0x7fc6006fddaa  (unknown)
    @     0x7fc6006fdce4  (unknown)
    @     0x7fc6006fd6e6  (unknown)
    @     0x7fc600700687  (unknown)
    @           0x8b4594  hl_convolution_forward()
    @           0x5b3fac  paddle::CudnnConvLayer::forward()
    @           0x62a01c  paddle::NeuralNetwork::forward()
    @           0x6bdaff  paddle::Tester::testOneBatch()
    @           0x6be412  paddle::Tester::testOnePeriod()
    @           0x6a28d4  paddle::Trainer::trainOnePass()
    @           0x6a5cc7  paddle::Trainer::train()
    @           0x5439f3  main
    @     0x7fc5ff909f45  (unknown)
    @           0x54efd5  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)

Looks like an cuDNN issue about the function cudnnConvolutionForward() defined in Nvidia's cuDNN library. Might it is a wrapper issue?

gangliao · 2016-09-05T07:17:52Z

GPU K20/40 works fine on this demo. Currently, this bug only appeared on GPU GTX, one of my colleagues already reproduced it and will solve it soon. Thanks.

stoneyang · 2016-09-05T07:43:34Z

Thanks for reopening this issue! @gangliao

Appended is the version of cuDNN: 5.0.5.

Hope it is helpful.

hedaoyuan · 2016-09-29T14:01:21Z

Fixed #107, and closed.

stoneyang · 2016-09-30T10:27:19Z

Same issue as the original report still persists after new commits were fetched and re-make-ed. Any further suggestions? @gangliao @hedaoyuan

hedaoyuan · 2016-09-30T17:15:54Z

@stoneyang Can you try adding GLOG_v=3 option on the command line(like this GLOG_v=3 paddle ...). This will print out more debug information.

stoneyang · 2016-10-04T02:39:24Z

@hedaoyuan , Hi, thanks for your reply! Do you mean add GLOG_v=3 in train.sh under demo\image_classification? I tried this and Paddle gives me the following error log (seems a little different from the one in the original post):

I1004 10:36:08.486361  8402 Util.cpp:151] commandline: ../../build/paddle/trainer/paddle_trainer --config=vgg_16_cifar.py --dot_period=10 --log_period=100 --test_all_data_in_one_period=1 --use_gpu=1 --trainer_count=1 --num_passes=200 --save_dir=./cifar_vgg_model
I1004 10:36:14.866709  8402 Util.cpp:126] Calling runInitFunctions
I1004 10:36:14.866930  8402 Util.cpp:139] Call runInitFunctions done.
I1004 10:36:14.875442  8402 TrainerConfigHelper.cpp:55] Parsing trainer config vgg_16_cifar.py
[INFO 2016-10-04 10:36:15,093 layers.py:1620] channels=3 size=3072
[INFO 2016-10-04 10:36:15,093 layers.py:1620] output size for __conv_0__ is 32
[INFO 2016-10-04 10:36:15,096 layers.py:1620] channels=64 size=65536
[INFO 2016-10-04 10:36:15,097 layers.py:1620] output size for __conv_1__ is 32
[INFO 2016-10-04 10:36:15,099 layers.py:1681] output size for __pool_0__ is 16*16
[INFO 2016-10-04 10:36:15,100 layers.py:1620] channels=64 size=16384
[INFO 2016-10-04 10:36:15,100 layers.py:1620] output size for __conv_2__ is 16
[INFO 2016-10-04 10:36:15,101 layers.py:1620] channels=128 size=32768
[INFO 2016-10-04 10:36:15,102 layers.py:1620] output size for __conv_3__ is 16
[INFO 2016-10-04 10:36:15,103 layers.py:1681] output size for __pool_1__ is 8*8
[INFO 2016-10-04 10:36:15,103 layers.py:1620] channels=128 size=8192
[INFO 2016-10-04 10:36:15,104 layers.py:1620] output size for __conv_4__ is 8
[INFO 2016-10-04 10:36:15,105 layers.py:1620] channels=256 size=16384
[INFO 2016-10-04 10:36:15,105 layers.py:1620] output size for __conv_5__ is 8
[INFO 2016-10-04 10:36:15,106 layers.py:1620] channels=256 size=16384
[INFO 2016-10-04 10:36:15,106 layers.py:1620] output size for __conv_6__ is 8
[INFO 2016-10-04 10:36:15,108 layers.py:1681] output size for __pool_2__ is 4*4
[INFO 2016-10-04 10:36:15,108 layers.py:1620] channels=256 size=4096
[INFO 2016-10-04 10:36:15,108 layers.py:1620] output size for __conv_7__ is 4
[INFO 2016-10-04 10:36:15,110 layers.py:1620] channels=512 size=8192
[INFO 2016-10-04 10:36:15,110 layers.py:1620] output size for __conv_8__ is 4
[INFO 2016-10-04 10:36:15,111 layers.py:1620] channels=512 size=8192
[INFO 2016-10-04 10:36:15,111 layers.py:1620] output size for __conv_9__ is 4
[INFO 2016-10-04 10:36:15,112 layers.py:1681] output size for __pool_3__ is 2*2
[INFO 2016-10-04 10:36:15,113 layers.py:1681] output size for __pool_4__ is 1*1
[INFO 2016-10-04 10:36:15,115 networks.py:1125] The input order is [image, label]
[INFO 2016-10-04 10:36:15,115 networks.py:1132] The output order is [__cost_0__]
I1004 10:36:15.138290  8402 Trainer.cpp:170] trainer mode: Normal
F1004 10:36:15.139696  8402 hl_gpu_matrix_kernel.cuh:181] Check failed: cudaSuccess == err (0 vs. 8) [hl_gpu_apply_unary_op failed] CUDA error: invalid device function
*** Check failure stack trace: ***
    @     0x7fdd6ccaedaa  (unknown)
    @     0x7fdd6ccaece4  (unknown)
    @     0x7fdd6ccae6e6  (unknown)
    @     0x7fdd6ccb1687  (unknown)
    @           0x78ffa9  hl_gpu_apply_unary_op<>()
    @           0x758d2f  paddle::BaseMatrixT<>::applyUnary<>()
    @           0x758919  paddle::BaseMatrixT<>::applyUnary<>()
    @           0x742e9f  paddle::BaseMatrixT<>::zero()
    @           0x62c94e  paddle::Parameter::enableType()
    @           0x628e0c  paddle::parameterInitNN()
    @           0x62b27b  paddle::NeuralNetwork::init()
    @           0x630e93  paddle::GradientMachine::create()
    @           0x6ad345  paddle::TrainerInternal::init()
    @           0x6a9727  paddle::Trainer::init()
    @           0x542a65  main
    @     0x7fdd6bebaf45  (unknown)
    @           0x54e355  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)
No data to plot. Exiting!

gangliao · 2016-10-08T02:03:13Z

@stoneyang fixed Fix CUDA_VERSION Comparsion #165

stoneyang · 2016-10-08T03:16:30Z

@gangliao , seems perfect now!

merge to local

add cluster quickstart

Add models api

…backward Merge pull request PaddlePaddle#18 from joey12300/add-lstm-cudnn_cpu_backward

speed up graph neighbors sampling

* compiler test pr * Init compiler dir. * init version for piano_op_search_and_replace_pass * save status, compile ok * add some test for piano_search_and_replace_pass * basic test case is ok * pass unittest * piano_search_and_replace_pass to piano_op_search_and_replace_pass * add some comments for piano_op_search_and_replace_pass * add links between cluster_inputs/cluster_outputs and compiled_cluster * use const& cluster_inputs instead of * p_cluster_inputs * add some comments and encapsulate some sub functions * rename piano_op_search_and_replace_pass to piano_compile_pass * a small modify in comments Co-authored-by: Zhen Wang <wangzhen31@baidu.com>

add tile transform

add google news word embedding

add trt int8 tutorial and demos

* Optimize memory overhead for gpugraph. * Optimize memory overhead for gpugraph.

fix dataset read ins pipe

delete update notes from README.md

[DRR] Refactor ther drr pattern class

update xdnn version

[MTAI-484] fix(build): change commit id in eigen.cmake

Chinese tutorial

Slightly improve installation process

Update to v2.0.8

add gatherwithgrad && delete extra comments

emailweixu assigned hedaoyuan and unassigned hedaoyuan Sep 1, 2016

reyoung added the duplicate label Sep 2, 2016

reyoung closed this as completed Sep 2, 2016

stoneyang mentioned this issue Sep 2, 2016

安装失败 #12

Closed

gangliao mentioned this issue Sep 3, 2016

Sentiment analysis failures: invalid device function #34

Closed

stoneyang mentioned this issue Sep 5, 2016

added configurations of nvcc code generation for modern GPUs #40

Closed

gangliao added the Bug label Sep 5, 2016

gangliao reopened this Sep 5, 2016

hedaoyuan assigned gangliao Sep 12, 2016

hedaoyuan closed this as completed Sep 29, 2016

hedaoyuan assigned qingqing01 Sep 30, 2016

alvations mentioned this issue Oct 5, 2016

"cudaSuccess == err (0 vs. 8)" error on v0.8.0b1 #158

Closed

Haichao-Zhang mentioned this issue Nov 18, 2016

LstmLayer unit test in test_LayerGrad failed #534

Closed

junjun315 added a commit that referenced this issue Apr 3, 2019

Merge pull request #18 from PaddlePaddle/develop

67cddc2

merge to local

kangzhangqi mentioned this issue Jun 26, 2019

c++多线程预测给压力之后，输入越界，出core，报warning。 #18337

Closed

songyiting mentioned this issue Aug 22, 2019

多线程环境下使用fluid在线预估库，释放clone的predictor出core #19361

Closed

zhhsplendid pushed a commit to zhhsplendid/Paddle that referenced this issue Sep 25, 2019

Merge pull request PaddlePaddle#18 from Yancey1989/cluster_quick_start

6a4c9b9

add cluster quickstart

xiuechen mentioned this issue Oct 28, 2019

预测出core，能帮忙看下啥原因不？paddle训练和预测的版本都是v1.3.0 #20859

Closed

lost-little-lamb mentioned this issue Dec 2, 2019

训练好的模型，c++预测接口出core #21472

Closed

Angus07 mentioned this issue Mar 11, 2020

预测出core #22948

Closed

qingqing01 pushed a commit to qingqing01/Paddle that referenced this issue Apr 30, 2020

Merge pull request PaddlePaddle#18 from LielinJiang/cv-models

e252e28

Add models api

PengheLiu mentioned this issue Oct 27, 2020

C++预测时出 core，不知道原因在哪，同样的模型使用 pathon 预测 api 无问题 #28268

Closed

joey12300 pushed a commit to joey12300/Paddle that referenced this issue Nov 2, 2020

Merge pull request PaddlePaddle#18 from joey12300/add-lstm-cudnn_cpu_…

c77c610

…backward Merge pull request PaddlePaddle#18 from joey12300/add-lstm-cudnn_cpu_backward

DemoMoon mentioned this issue Mar 24, 2021

oneDNN 如何能提升DeepSpeech的语音处理性能 #31838

Closed

paddle-bot-old bot pushed a commit that referenced this issue Jul 11, 2021

Merge pull request #18 from Liwb5/engine2.0

fc74f83

speed up graph neighbors sampling

thisjiang pushed a commit to thisjiang/Paddle that referenced this issue Oct 28, 2021

Merge pull request PaddlePaddle#18 from Superjomn/refine-tile

bedf2e4

add tile transform

wangxicoding pushed a commit to wangxicoding/Paddle that referenced this issue Dec 9, 2021

Merge pull request PaddlePaddle#18 from joey12300/add_google_news

5f0821c

add google news word embedding

zhoutianzi666 pushed a commit to zhoutianzi666/Paddle that referenced this issue May 23, 2022

Merge pull request PaddlePaddle#18 from cryoco/add-trt-int8-doc

22001a1

add trt int8 tutorial and demos

Thunderbrook pushed a commit to Thunderbrook/Paddle that referenced this issue Jun 7, 2022

Optimize memory overhead for gpugraph. (PaddlePaddle#18)

d7eae03

* Optimize memory overhead for gpugraph. * Optimize memory overhead for gpugraph.

jack603047588 referenced this issue in jack603047588/Paddle Nov 9, 2022

Opt for datafeed (Mingqing) (#18)

b3ac26c

jack603047588 referenced this issue in jack603047588/Paddle Nov 9, 2022

Merge pull request #18 from qingshui/paddlebox

a8439c3

fix dataset read ins pipe

marsbzp mentioned this issue Jan 11, 2023

多线程调用C++推理库进行RNN算子崩溃问题！！！！ #49737

Open

qizhaoaoe pushed a commit to qizhaoaoe/Paddle that referenced this issue Mar 3, 2023

delete update notes (PaddlePaddle#18)

533c239

delete update notes from README.md

chlyzzo mentioned this issue Mar 29, 2023

paddle/fluid/core_avx.so paddle::memory::allocation::MemoryMapFdSet::Clear() #52269

Closed

zyfncg pushed a commit to zyfncg/Paddle that referenced this issue Aug 22, 2023

Merge pull request PaddlePaddle#18 from zyfncg/drr_pass

f0751d6

[DRR] Refactor ther drr pattern class

zmxdream pushed a commit to zmxdream/Paddle that referenced this issue Nov 4, 2023

Merge pull request PaddlePaddle#18 from xymyeah/update_xdnn_version

6479cdd

update xdnn version

lizexu123 pushed a commit to lizexu123/Paddle that referenced this issue Feb 23, 2024

update nas (PaddlePaddle#18)

fdb09f0

hanhaowen-mt pushed a commit to hanhaowen-mt/Paddle that referenced this issue Feb 29, 2024

Merge pull request PaddlePaddle#18 from mthreads/change_eigen3_id

e21c198

[MTAI-484] fix(build): change commit id in eigen.cmake

NKNaN pushed a commit to NKNaN/Paddle that referenced this issue Mar 3, 2024

Merge pull request PaddlePaddle#18 from ekelsen/master

6041ac1

Chinese tutorial

wwbitejotunn pushed a commit to wwbitejotunn/Paddle that referenced this issue Jun 21, 2024

Merge pull request PaddlePaddle#18 from gahdritz/main

01947bc

Slightly improve installation process

wwbitejotunn pushed a commit to wwbitejotunn/Paddle that referenced this issue Jun 21, 2024

Merge pull request PaddlePaddle#18 from Xreki/update_to_v2.0.8

2e133ca

Update to v2.0.8

WAYKEN-TSE pushed a commit to WAYKEN-TSE/Paddle that referenced this issue Dec 6, 2024

Merge pull request PaddlePaddle#18 from zhiboniu/develop

e7f446b

add gatherwithgrad && delete extra comments

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Compatiblity issue with CUDA 8.0 #18

Compatiblity issue with CUDA 8.0 #18

stoneyang commented Sep 1, 2016

reyoung commented Sep 2, 2016

reyoung commented Sep 2, 2016

stoneyang commented Sep 2, 2016 •

edited

Loading

gangliao commented Sep 3, 2016

stoneyang commented Sep 3, 2016

stoneyang commented Sep 5, 2016 •

edited

Loading

gangliao commented Sep 5, 2016 •

edited

Loading

stoneyang commented Sep 5, 2016 •

edited

Loading

gangliao commented Sep 5, 2016

stoneyang commented Sep 5, 2016 •

edited

Loading

hedaoyuan commented Sep 29, 2016

stoneyang commented Sep 30, 2016 •

edited

Loading

hedaoyuan commented Sep 30, 2016

stoneyang commented Oct 4, 2016

gangliao commented Oct 8, 2016

stoneyang commented Oct 8, 2016

Compatiblity issue with CUDA 8.0 #18

Compatiblity issue with CUDA 8.0 #18

Comments

stoneyang commented Sep 1, 2016

reyoung commented Sep 2, 2016

reyoung commented Sep 2, 2016

stoneyang commented Sep 2, 2016 • edited Loading

gangliao commented Sep 3, 2016

stoneyang commented Sep 3, 2016

stoneyang commented Sep 5, 2016 • edited Loading

gangliao commented Sep 5, 2016 • edited Loading

stoneyang commented Sep 5, 2016 • edited Loading

gangliao commented Sep 5, 2016

stoneyang commented Sep 5, 2016 • edited Loading

hedaoyuan commented Sep 29, 2016

stoneyang commented Sep 30, 2016 • edited Loading

hedaoyuan commented Sep 30, 2016

stoneyang commented Oct 4, 2016

gangliao commented Oct 8, 2016

stoneyang commented Oct 8, 2016

stoneyang commented Sep 2, 2016 •

edited

Loading

stoneyang commented Sep 5, 2016 •

edited

Loading

gangliao commented Sep 5, 2016 •

edited

Loading

stoneyang commented Sep 5, 2016 •

edited

Loading

stoneyang commented Sep 5, 2016 •

edited

Loading

stoneyang commented Sep 30, 2016 •

edited

Loading