This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
CentOS GPU tests failing in master #16951
Comments
@mxnet-label-bot add [CI]
For more info, I've made a change to print the cuBLAS error message:
The error type is
I don't think the error message says much. I suspect the driver inside the Docker image is causing the problem; at least, I have seen NVIDIA engineers acknowledge such an issue. In one of my PRs the failure goes away, but some jobs require the CUDA libs in the container.
Fixed by #16968
Description
CentOS GPU tests are failing in master:
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fcentos-gpu/detail/master/1341/
I couldn't reproduce this on a p3 instance running Ubuntu 18.04. Trying the CI AMI now.
Seems to be a problem in the base AMI, reproduced by running the following commands:
Failure is:
A possible solution is to update the AMI.
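Since the comments suspect a driver problem, one way to sanity-check a host or AMI is to compare its NVIDIA driver version against the minimum that the CUDA toolkit in the container needs. The following is only a rough sketch, not something taken from the CI scripts: the helper names are made up for illustration, and the `(418, 39)` threshold in the usage note is a placeholder that should be replaced with the real minimum for the CUDA version in use.

```python
import re
import subprocess


def parse_driver_version(text):
    """Parse a driver version string such as '418.87.01' into a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no driver version found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def driver_at_least(found, required):
    """Return True if version tuple `found` is >= `required`.

    Shorter tuples are padded with zeros so (418, 39) == (418, 39, 0).
    """
    width = max(len(found), len(required))
    pad = lambda v: v + (0,) * (width - len(v))
    return pad(found) >= pad(required)


def query_driver_version():
    """Ask nvidia-smi for the driver version of the first GPU.

    Requires nvidia-smi on PATH; raises if the command fails.
    """
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return parse_driver_version(out.splitlines()[0])
```

On a GPU machine, `driver_at_least(query_driver_version(), (418, 39))` would report whether the installed driver meets the placeholder threshold. Running it both on the host and inside the container could show whether the two see different driver versions.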