CD Release Pipeline libmxnet.so Symbol Error #19917
Comments
Yeah, this comes from the fact that the CUDA 11.2 image ships the 11.2 version of nvml.h, while the actual libnvml library is part of the driver (and the driver on that machine is probably older and does not have that version of the function). If you look at the nvml.h version history here: https://github.com/NVIDIA/nvidia-settings/blob/master/src/nvml.h - the [...] As a workaround you can just add the option [...]
The other approach (I believe the recommended one) is to use dlopen to load the nvml library at runtime, so that the additional symbols from nvml.h and the libnvidia-ml stub library in the build image do not contaminate the resulting binary (especially since that function is not even used by MXNet).
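To illustrate the runtime-loading idea, here is a minimal sketch in Python, where `ctypes.CDLL` calls dlopen on POSIX systems and attribute lookup resolves the symbol. It is not MXNet's code (a C++ version would call dlopen/dlsym directly), and `find_symbol` is a hypothetical helper. The library name `libnvidia-ml.so.1` and the symbol `nvmlInit_v2` are used only as illustrative inputs; they are not necessarily the symbol that failed in this pipeline.

```python
import ctypes


def find_symbol(library_name, symbol_name):
    """Return the named symbol from a shared library, or None.

    Hypothetical helper: the library is opened at runtime, so a missing
    library or a missing symbol is reported here instead of failing when
    the program itself is loaded.
    """
    try:
        lib = ctypes.CDLL(library_name)  # dlopen() on POSIX
    except OSError:
        return None  # library is not installed on this machine
    try:
        return getattr(lib, symbol_name)  # symbol lookup, like dlsym()
    except AttributeError:
        return None  # library exists but does not export this symbol


# Illustrative inputs only (assumed names, see the note above).
nvml_init = find_symbol("libnvidia-ml.so.1", "nvmlInit_v2")
if nvml_init is None:
    print("NVML symbol unavailable; skipping the NVML-dependent path")
else:
    print("NVML symbol found in the installed driver library")
```

The same helper can be pointed at whichever symbol the loader complains about, to check whether the installed driver's library actually exports it.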
@ptrendx Thanks for the info! I believe we use 450.51.05, while cu111 and cu112 require 450.80.02 according to https://docs.nvidia.com/deploy/cuda-compatibility/index.html. I think the best way might be to update the NVIDIA driver on the GPU machines. This error only happens in CD, probably because we do [...]
Fixed by #19939.
After #19870, the master CD cu112 job was able to build. However, we now have this symbol error in the test stage:
https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/restricted-mxnet-cd%2Fmxnet-cd-release-job/detail/mxnet-cd-release-job/2523/pipeline/401