This repository has been archived by the owner on Oct 24, 2023. It is now read-only.

feat: don't install nvidia drivers if nvidia-device-plugin is disabled #4358

Merged (5 commits) on Apr 12, 2021

Member

@jackfrancis jackfrancis commented Apr 7, 2021

Reason for Change:

This PR skips nvidia driver installation when the nvidia-device-plugin addon is disabled.

Additionally, because the known-working drivers that CSE installs don't work with containerd, we change the addons default flow so that the nvidia-device-plugin addon is disabled if containerd is the CRI.

In practice, you may use containerd with N series VM SKUs, but no nvidia drivers will be installed. The nvidia gpu-operator, which supports containerd, fills that gap; see:

https://developer.nvidia.com/blog/announcing-containerd-support-for-the-nvidia-gpu-operator/
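For readers who want the decision flow in one place, here is a minimal Go sketch of the logic described above. It is illustrative only: the `Cluster` type and both function names are hypothetical and are not taken from the aks-engine code base, and enabling the addon by default for N series SKUs on non-containerd runtimes is an assumption.

```go
package main

import "fmt"

// Cluster is a hypothetical, simplified stand-in for a cluster definition.
// It is NOT the aks-engine API model.
type Cluster struct {
	ContainerRuntime string // e.g. "docker" or "containerd"
	HasNSeriesSKU    bool   // true if any node pool uses an N series VM SKU
}

// devicePluginEnabledByDefault sketches the addon default: the
// nvidia-device-plugin addon is disabled when containerd is the CRI.
// Defaulting to enabled for N series SKUs otherwise is an assumption.
func devicePluginEnabledByDefault(c Cluster) bool {
	return c.HasNSeriesSKU && c.ContainerRuntime != "containerd"
}

// shouldInstallNvidiaDrivers sketches the driver rule: drivers are
// installed only when the nvidia-device-plugin addon is enabled.
func shouldInstallNvidiaDrivers(c Cluster, devicePluginEnabled bool) bool {
	return c.HasNSeriesSKU && devicePluginEnabled
}

func main() {
	clusters := []Cluster{
		{ContainerRuntime: "docker", HasNSeriesSKU: true},
		{ContainerRuntime: "containerd", HasNSeriesSKU: true},
	}
	for _, c := range clusters {
		enabled := devicePluginEnabledByDefault(c)
		fmt.Printf("runtime=%s addon=%t drivers=%t\n",
			c.ContainerRuntime, enabled, shouldInstallNvidiaDrivers(c, enabled))
	}
}
```

Under these assumptions, an N series cluster on containerd ends up with neither the addon nor the drivers, which is the case the gpu-operator is meant to cover.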

A new E2E scenario has been added for N series + containerd configurations, which installs the nvidia-curated gpu-operator helm chart, and then validates using the existing CUDA job.

Issue Fixed:

Credit Where Due:

Does this change contain code from or inspired by another project?

  • No
  • Yes

If "Yes," did you notify that project's maintainers and provide attribution?

  • No
  • Yes

Requirements:

Notes:

@codecov

codecov bot commented Apr 7, 2021

Codecov Report

Merging #4358 (63a1407) into master (706dedb) will increase coverage by 0.02%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #4358      +/-   ##
==========================================
+ Coverage   72.07%   72.09%   +0.02%     
==========================================
  Files         141      141              
  Lines       21640    21665      +25     
==========================================
+ Hits        15596    15619      +23     
- Misses       5093     5094       +1     
- Partials      951      952       +1     
| Impacted Files | Coverage Δ |
| --- | --- |
| pkg/engine/templates_generated.go | 43.56% <ø> (ø) |
| pkg/api/addons.go | 98.04% <100.00%> (ø) |
| pkg/engine/template_generator.go | 68.54% <100.00%> (+0.33%) ⬆️ |
| pkg/api/defaults.go | 93.15% <0.00%> (-0.30%) ⬇️ |
| pkg/api/types.go | 92.85% <0.00%> (ø) |
| pkg/api/vlabs/types.go | 73.04% <0.00%> (ø) |
| pkg/api/converterfromapi.go | 95.70% <0.00%> (+<0.01%) ⬆️ |
| pkg/api/convertertoapi.go | 94.02% <0.00%> (+0.01%) ⬆️ |
| ... and 1 more | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 706dedb...63a1407.

mboersma
mboersma previously approved these changes Apr 7, 2021
Member

@mboersma mboersma left a comment

/lgtm

@acs-bot

acs-bot commented Apr 7, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis, mboersma

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [jackfrancis,mboersma]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Collaborator

@Michael-Sinz Michael-Sinz left a comment

There needs to be some documentation stating that the GPU driver is not installed when containerd is the container runtime, and that it is then up to the customer to install the NVIDIA GPU Operator.

4 participants