chore: Upgrade nvidia drivers to 450.80.02 #4376
💖 Thanks for opening your first pull request! 💖 We use semantic commit messages to streamline the release process. Before your pull request can be merged, you should make sure your first commit and PR title start with a semantic prefix. Examples of commit messages with semantic prefixes: -
Thanks @gdippolito, is there a newer version of basic CUDA validation we can run in E2E that you know of? We're still running the v0.1 image from 4+ years ago, see:
Thanks for your comment @jackfrancis. I have updated the e2e test with a more recent image, which uses CUDA 11.1. I ran a few tests locally, and this sample image does not seem to work with the driver version I picked (it works on 460.x but not on 450.80.02). I'm a bit confused, since according to the Nvidia documentation this combination is expected to work.
@gdippolito thanks! I've tested your new image, and the E2E tests pass fine. So I think we're good here. cc @Michael-Sinz @jadarsie are there any concerns with the following:
To be clear, these driver changes are specific to docker-backed clusters. If you're using containerd you can't use these drivers, and are probably using the gpu-operator already if you've figured out how to make this work with aks-engine-created clusters.
FYI @penggu
/lgtm
/azp run pr-e2e
Azure Pipelines successfully started running 1 pipeline(s).
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: gdippolito, jackfrancis
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Codecov Report
@@           Coverage Diff           @@
##           master    #4376   +/-   ##
=======================================
  Coverage   72.04%   72.04%
=======================================
  Files         141      141
  Lines       21631    21631
=======================================
  Hits        15584    15584
  Misses       5096     5096
  Partials      951      951
Looks good to me.
Shall we 🚢 it?
Congrats on merging your first pull request! 🎉🎉🎉
Reason for Change:
Upgrade the Nvidia drivers for GPU machines to 450.80.02.
This is the same driver version that AzureML-based images have used since April 2021. The change adds compatibility with newer versions of CUDA (11.1, 11.2, 11.3). Reference: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
Issue Fixed:
None. This change adds support for running Docker images based on CUDA >= 11.1.
Credit Where Due:
Does this change contain code from or inspired by another project?
If "Yes," did you notify that project's maintainers and provide attribution?
Requirements:
Notes:
Should be pretty safe since it is a minor version upgrade.
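To confirm that a GPU node actually picked up the upgraded driver, a small check like the following could be used. This is a minimal sketch, not part of the PR: the helper names (`parse_driver_version`, `driver_at_least`, `installed_driver_version`) are hypothetical, and it only compares driver version strings. It does not decide whether a given CUDA image will run on that driver.

```python
import subprocess
from typing import Tuple

# Driver version this PR installs (taken from the PR description).
TARGET_DRIVER = "450.80.02"


def parse_driver_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted driver version such as '450.80.02' into (450, 80, 2)."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_at_least(installed: str, minimum: str = TARGET_DRIVER) -> bool:
    """Return True when `installed` is the same as or newer than `minimum`."""
    return parse_driver_version(installed) >= parse_driver_version(minimum)


def installed_driver_version() -> str:
    """Ask nvidia-smi for the driver version of the first GPU.

    Assumption: this runs on a node where nvidia-smi is on the PATH.
    """
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return out.splitlines()[0].strip()
```

On a GPU node, `driver_at_least(installed_driver_version())` would report whether the node is on 450.80.02 or newer. The comparison itself can be tried anywhere with literal version strings, for example `driver_at_least("460.32.03")`.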