what's wrong with PR test? #19373
Comments
It's a CI issue that needs to be fixed. It's unrelated to your PR.
I think there are two issues today. First, I found an expired GPG key that was preventing R packages from being installed on Ubuntu 16.04. I created PR #19377 for this and am waiting for it to pass before backporting. Second, I see errors when trying to uncompress the Packages file from https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/ It looks like a new file was recently pushed, so maybe this has been fixed by NVIDIA.
@josephevans thank you for looking into the issue. But please note that the issues you mention are unrelated as they only affect the v1.x branch, whereas the issue described here affects the master branch.
The NVIDIA issue may affect more than the Ubuntu 16.04 repository mentioned above. NVIDIA/nvidia-docker#1402 contains some more info (though that issue was closed because it wasn't directed to the correct owner).
Ok, I believe I finally found the culprit. The AMIs used for our Jenkins slaves have auto-update turned on, and based on the logfiles of the slave instances, it looks like docker was being auto-updated and restarted, which was killing the log output of the containers (and therefore the Jenkins jobs). I've created a new AMI for mxnetlinux_cpu hosts with updated software versions, which also adds an option to the docker config to hopefully prevent this in the future. See https://docs.docker.com/config/containers/live-restore/ - Thanks @leezu for the recommendation.
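For readers who want to try the same mitigation, here is a minimal sketch of turning on the `live-restore` option described at the link above. The exact config shipped in the new AMI is not shown in this thread, so the helper name `enable_live_restore` and the sample `log-driver` key are hypothetical; on Linux the daemon config usually lives at `/etc/docker/daemon.json`.

```python
import json


def enable_live_restore(config_text: str) -> str:
    """Return daemon.json text with "live-restore" enabled.

    Hypothetical helper for illustration: existing keys are preserved,
    and an empty or missing config is treated as an empty JSON object.
    """
    config = json.loads(config_text) if config_text.strip() else {}
    # "live-restore" keeps containers running while the docker daemon
    # is stopped or restarted (for example during a package update).
    config["live-restore"] = True
    return json.dumps(config, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Example input only; a real daemon.json may contain other settings.
    print(enable_live_restore('{"log-driver": "json-file"}'))
```

The resulting text would be written back to the daemon config file, after which the docker daemon has to reload its configuration for the setting to take effect.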
CI seems much more stable today with the new AMI. I also released an updated AMI to address ARMv8 test failures due to qemu installation. We should no longer be seeing the docker connection issues (or unexpected EOF errors), and 2 of 3 PRs to fix the other CI issue (expired GPG key) have been merged. This issue can be closed.
Thank you so much Joe.
It seems the CI works now, so I'm closing this issue.
Currently, many PRs are failing due to bad network connections.
This affects not only my PR but many others as well.
These failures cannot be resolved by changes in any PR.
I re-ran the tests at least 5 times, and most of them failed.
What's wrong?