[Compile Bug] PR #60551 breaks building the inference library from source on Linux: 18 errors detected in the compilation of "/Paddle/paddle/phi/kernels/funcs/gather_scatter_functor.cu". #60673

Closed
EmmonsCurse opened this issue Jan 9, 2024 · 11 comments

@EmmonsCurse

Describe the Bug

1. Error message

The changes in #60551 break bare-metal source builds of the inference library on Linux (combined GPU & XPU build). Part of the error log follows:

[00:55:13]	/home/disk1/teamcity/work/35d35f7d19925f81/Paddle/paddle/phi/kernels/funcs/gather_scatter_functor.cu(385): error: identifier "cudaMallocAsync" is undefined
[00:55:13]	          detected during instantiation of "void phi::funcs::gpu_gather_kernel<tensor_t,index_t>(phi::DenseTensor, int, const phi::DenseTensor &, phi::DenseTensor, __nv_bool, const phi::DeviceContext &) [with tensor_t=int, index_t=int]" 
[00:55:13]	(1317): here
[00:55:13]	
[00:55:13]	/home/disk1/teamcity/work/35d35f7d19925f81/Paddle/paddle/phi/kernels/funcs/gather_scatter_functor.cu(402): error: identifier "cudaFreeAsync" is undefined
[00:55:13]	          detected during instantiation of "void phi::funcs::gpu_gather_kernel<tensor_t,index_t>(phi::DenseTensor, int, const phi::DenseTensor &, phi::DenseTensor, __nv_bool, const phi::DeviceContext &) [with tensor_t=int, index_t=int]" 
[00:55:13]	(1317): here
[00:55:13]	
[00:55:13]	/home/disk1/teamcity/work/35d35f7d19925f81/Paddle/paddle/phi/kernels/funcs/gather_scatter_functor.cu(406): error: identifier "cudaMallocAsync" is undefined
[00:55:13]	          detected during instantiation of "void phi::funcs::gpu_gather_kernel<tensor_t,index_t>(phi::DenseTensor, int, const phi::DenseTensor &, phi::DenseTensor, __nv_bool, const phi::DeviceContext &) [with tensor_t=int, index_t=int]" 
[00:55:13]	(1317): here
[00:55:13]	
[00:55:13]	/home/disk1/teamcity/work/35d35f7d19925f81/Paddle/paddle/phi/kernels/funcs/gather_scatter_functor.cu(424): error: identifier "cudaFreeAsync" is undefined
[00:55:13]	          detected during instantiation of "void phi::funcs::gpu_gather_kernel<tensor_t,index_t>(phi::DenseTensor, int, const phi::DenseTensor &, phi::DenseTensor, __nv_bool, const phi::DeviceContext &) [with tensor_t=int, index_t=int]" 
[00:55:13]	(1317): here
[00:55:13]	
[00:55:13]	/home/disk1/teamcity/work/35d35f7d19925f81/Paddle/paddle/phi/kernels/funcs/gather_scatter_functor.cu(430): error: identifier "cudaMallocAsync" is undefined
[00:55:13]	          detected during instantiation of "void phi::funcs::gpu_gather_kernel<tensor_t,index_t>(phi::DenseTensor, int, const phi::DenseTensor &, phi::DenseTensor, __nv_bool, const phi::DeviceContext &) [with tensor_t=int, index_t=int]" 
[00:55:13]	(1317): here
[00:55:13]	
[00:55:13]	/home/disk1/teamcity/work/35d35f7d19925f81/Paddle/paddle/phi/kernels/funcs/gather_scatter_functor.cu(450): error: identifier "cudaFreeAsync" is undefined
[00:55:13]	          detected during instantiation of "void phi::funcs::gpu_gather_kernel<tensor_t,index_t>(phi::DenseTensor, int, const phi::DenseTensor &, phi::DenseTensor, __nv_bool, const phi::DeviceContext &) [with tensor_t=int, index_t=int]" 
[00:55:13]	(1317): here
[00:55:13]	
[00:55:14]	/home/disk1/teamcity/work/35d35f7d19925f81/Paddle/paddle/phi/kernels/funcs/gather_scatter_functor.cu(976): error: identifier "cudaMallocAsync" is undefined
[00:55:14]	
[00:55:14]	/home/disk1/teamcity/work/35d35f7d19925f81/Paddle/paddle/phi/kernels/funcs/gather_scatter_functor.cu(992): error: identifier "cudaFreeAsync" is undefined
[00:55:14]	
[00:55:14]	/home/disk1/teamcity/work/35d35f7d19925f81/Paddle/paddle/phi/kernels/funcs/gather_scatter_functor.cu(763): error: identifier "cudaMallocAsync" is undefined
[00:55:14]	
[00:55:14]	/home/disk1/teamcity/work/35d35f7d19925f81/Paddle/paddle/phi/kernels/funcs/gather_scatter_functor.cu(778): error: identifier "cudaFreeAsync" is undefined
[00:55:14]	
[00:55:14]	/home/disk1/teamcity/work/35d35f7d19925f81/Paddle/paddle/phi/kernels/funcs/gather_scatter_functor.cu(782): error: identifier "cudaMallocAsync" is undefined
[00:55:14]	
[00:55:14]	/home/disk1/teamcity/work/35d35f7d19925f81/Paddle/paddle/phi/kernels/funcs/gather_scatter_functor.cu(802): error: identifier "cudaFreeAsync" is undefined
[00:55:14]	
[00:55:14]	/home/disk1/teamcity/work/35d35f7d19925f81/Paddle/paddle/phi/kernels/funcs/gather_scatter_functor.cu(880): error: identifier "cudaMallocAsync" is undefined
[00:55:14]	
[00:55:14]	/home/disk1/teamcity/work/35d35f7d19925f81/Paddle/paddle/phi/kernels/funcs/gather_scatter_functor.cu(893): error: identifier "cudaFreeAsync" is undefined
[00:55:14]	
[00:55:14]	/home/disk1/teamcity/work/35d35f7d19925f81/Paddle/paddle/phi/kernels/funcs/gather_scatter_functor.cu(1115): error: identifier "cudaMallocAsync" is undefined
[00:55:14]	
[00:55:14]	/home/disk1/teamcity/work/35d35f7d19925f81/Paddle/paddle/phi/kernels/funcs/gather_scatter_functor.cu(1132): error: identifier "cudaFreeAsync" is undefined
[00:55:14]	
[00:55:14]	/home/disk1/teamcity/work/35d35f7d19925f81/Paddle/paddle/phi/kernels/funcs/gather_scatter_functor.cu(1293): error: identifier "cudaMallocAsync" is undefined
[00:55:14]	
[00:55:14]	/home/disk1/teamcity/work/35d35f7d19925f81/Paddle/paddle/phi/kernels/funcs/gather_scatter_functor.cu(1313): error: identifier "cudaFreeAsync" is undefined
[00:55:14]	
[00:55:14]	18 errors detected in the compilation of "/home/disk1/teamcity/work/35d35f7d19925f81/Paddle/paddle/phi/kernels/funcs/gather_scatter_functor.cu".
[00:55:14]	make64[2]: *** [paddle/phi/CMakeFiles/phi.dir/kernels/funcs/gather_scatter_functor.cu.o] Error 1
[00:55:14]	make64[2]: *** Waiting for unfinished jobs....

2. Reproduction command

cmake command:

cmake .. '-DCMAKE_CXX_FLAGS=-Wl,--rpath=/opt/compiler/gcc-8.2/lib,--dynamic-linker=/opt/compiler/gcc-8.2/lib/ld-linux-x86-64.so.2 -Wno-error -w' -DON_INFER=ON -DWITH_NCCL=OFF -DWITH_PYTHON=OFF -DPY_VERSION=3.8 -DWITH_RCCL=OFF -DWITH_MKL=ON -DWITH_AVX=OFF -DWITH_MKLDNN=ON -DWITH_GPU=ON -DCMAKE_CUDA_COMPILER=/home/paddle/wangye19/nvidia/cuda-11.1/bin/nvcc -DCUDA_TOOLKIT_ROOT_DIR=/home/paddle/wangye19/nvidia/cuda-11.1 -DCUDNN_ROOT=/home/paddle/wangye19/nvidia/cudnn_v8.2.1/cuda -DWITH_TENSORRT=ON -DTENSORRT_ROOT=/home/paddle/wangye19/nvidia/TensorRT-8.4.0.6 -DWITH_TESTING=OFF -DWITH_INFERENCE_API_TEST=OFF -DWITH_DISTRIBUTE=OFF -DWITH_STRIP=ON -DWITH_CINN=OFF -DWITH_ONNXRUNTIME=OFF -DCUDA_ARCH_NAME=Manual '-DCUDA_ARCH_BIN=75 80 86' -DWITH_XPU=OFF -DWITH_LITE=ON -DLITE_WITH_XPU=ON -DLITE_GIT_TAG=d06a1f36ec564fb618d555b342ca1076623d8b94 -DWITH_BDCENTOS=ON -DWITH_SHARED_IR=OFF -DPYTHON_EXECUTABLE=/home/disk1/python_env/bin/python

@YibinLiu666 @luotao1 @zhwesky2010 could you please take a look? Thanks.

Additional Supplementary Information

No response

@YibinLiu666
Contributor

Are you using an AMD GPU?

@EmmonsCurse
Author

EmmonsCurse commented Jan 9, 2024

> Are you using an AMD GPU?

@YibinLiu666 The build machine is a CPU-only Linux machine (Linux version 5.10.0-1.0.0.30 (gcc (GCC) 8.3.1 20191121 (Red Hat 8.3.1-5), GNU ld version 2.30-79.el8)).

@YibinLiu666
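Contributor

> > Are you using an AMD GPU?
>
> @YibinLiu666 The build machine is a CPU-only Linux machine (Linux version 5.10.0-1.0.0.30 (gcc (GCC) 8.3.1 20191121 (Red Hat 8.3.1-5), GNU ld version 2.30-79.el8)).

That does make a difference. On an AMD GPU the CUDA functions would not be recognized; I will fix that in a follow-up. But if the machine has no GPU, the build should not need WITH_GPU at all, should it?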

@EmmonsCurse
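Author

> That does make a difference. On an AMD GPU the CUDA functions would not be recognized; I will fix that in a follow-up. But if the machine has no GPU, the build should not need WITH_GPU at all, should it?

First, the resulting inference library is meant to run on machines equipped with a GPU or XPU, not on the build machine. Second, once the target architectures are specified manually (-DCUDA_ARCH_NAME=Manual '-DCUDA_ARCH_BIN=75 80 86'), it makes no difference to the resulting package whether the build machine itself has a GPU. CUDA, cuDNN, and the other dependencies are likewise specified by path; see the cmake command given above for details.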

@heavyrain-lzy
Contributor

cudaMallocAsync has a minimum CUDA version requirement; it should need CUDA >= 11.2. See: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#id65
So the related PR, #60551, needs a CUDA version check.
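
The version check suggested here could look roughly like the sketch below. It is an illustration only, not code from Paddle or from either PR: the `Demo*` names are hypothetical, and the fallback to `cudaMalloc`/`cudaFree` is an assumption about what an older toolkit would have to use. The only facts it relies on are that `CUDART_VERSION` is defined by the CUDA runtime headers as major * 1000 + minor * 10 (so CUDA 11.2 is 11020) and that `cudaMallocAsync`/`cudaFreeAsync` are absent from the CUDA 11.1 toolkit used in the failing build.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helpers for illustration; these are NOT Paddle APIs.
// Stream-ordered allocation is only declared by CUDA 11.2+ headers.
#if defined(CUDART_VERSION) && CUDART_VERSION >= 11020
#define DEMO_HAS_STREAM_ORDERED_ALLOC 1
#else
#define DEMO_HAS_STREAM_ORDERED_ALLOC 0
#endif

// Allocate a temporary device buffer for work queued on `stream`.
inline cudaError_t DemoTempAlloc(void** ptr, size_t bytes,
                                 cudaStream_t stream) {
#if DEMO_HAS_STREAM_ORDERED_ALLOC
  return cudaMallocAsync(ptr, bytes, stream);
#else
  (void)stream;  // unused with older toolkits
  return cudaMalloc(ptr, bytes);
#endif
}

// Release a buffer obtained from DemoTempAlloc.
inline cudaError_t DemoTempFree(void* ptr, cudaStream_t stream) {
#if DEMO_HAS_STREAM_ORDERED_ALLOC
  return cudaFreeAsync(ptr, stream);
#else
  // cudaFree is not stream-ordered, so wait for work already queued on
  // `stream` before releasing the buffer.
  cudaError_t err = cudaStreamSynchronize(stream);
  if (err != cudaSuccess) return err;
  return cudaFree(ptr);
#endif
}

// Small round trip: allocate, clear, and free a buffer on a fresh stream.
inline cudaError_t DemoRoundTrip(size_t bytes) {
  cudaStream_t stream;
  cudaError_t err = cudaStreamCreate(&stream);
  if (err != cudaSuccess) return err;

  void* buf = nullptr;
  err = DemoTempAlloc(&buf, bytes, stream);
  if (err == cudaSuccess) {
    err = cudaMemsetAsync(buf, 0, bytes, stream);
    cudaError_t free_err = DemoTempFree(buf, stream);
    if (err == cudaSuccess) err = free_err;
  }

  cudaError_t sync_err = cudaStreamSynchronize(stream);
  if (err == cudaSuccess) err = sync_err;
  cudaStreamDestroy(stream);
  return err;
}
```

Note that the guard is compile-time only. Even with a 11.2+ toolkit, `cudaMallocAsync` can still fail at run time on a driver or device without memory pool support, so callers should check the returned `cudaError_t`.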

@paddle-bot paddle-bot bot removed the status/new-issue 新建 label Jan 10, 2024
@zhwesky2010
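Contributor

@YibinLiu666 For allocating the temporary GPU memory buffer, please switch to an implementation like the one linked here. It uses an interface the framework already wraps, which is more stable and avoids the risk of introducing a third-party API: https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/phi/kernels/funcs/sparse/sparse_blas_impl.cu.h#L342-L358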

@YibinLiu666
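Contributor

> @YibinLiu666 For allocating the temporary GPU memory buffer, please switch to an implementation like the one linked here. It uses an interface the framework already wraps, which is more stable and avoids the risk of introducing a third-party API: https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/phi/kernels/funcs/sparse/sparse_blas_impl.cu.h#L342-L358

OK.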

@paddle-bot paddle-bot bot added status/developing 开发中 and removed status/following-up 跟进中 labels Jan 10, 2024
@EmmonsCurse
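Author

> > @YibinLiu666 For allocating the temporary GPU memory buffer, please switch to an implementation like the one linked here. It uses an interface the framework already wraps, which is more stable and avoids the risk of introducing a third-party API: https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/phi/kernels/funcs/sparse/sparse_blas_impl.cu.h#L342-L358
>
> OK.

@YibinLiu666 Is there a fix for the reported problem yet?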

@danleifeng
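Contributor

@YibinLiu666 Please make sure that both cudaMallocAsync and cudaFreeAsync are guarded by a CUDA version check, or replace them with the wrapped interface. My build fails under CUDA 11.0; both functions were only introduced in CUDA 11.2.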

@zhwesky2010
Contributor

zhwesky2010 commented Jan 18, 2024

@YibinLiu666 #60934

@EmmonsCurse
Author

> @YibinLiu666 #60934

@zhwesky2010 When can this PR be merged?

@paddle-bot paddle-bot bot added status/testing and removed status/developing 开发中 labels Jan 23, 2024
@paddle-bot paddle-bot bot added status/close 已关闭 and removed status/testing labels Jan 23, 2024