generated from kubernetes/kubernetes-template-project
Commit
feat: Enable ability to build GPU drivers during image build
This addition also creates a new s3 additional_component that can be used for other S3-related interactions. NVIDIA drivers can optionally be installed using the added role.

Because NVIDIA does not make the GRID drivers publicly available, this role requires an S3 endpoint, as that is probably the option most readily available to most users. Users can create an S3 endpoint with a variety of tools, be it AWS, Cloudflare, MinIO, or one of the many other options. The endpoint can also be secured, so the driver is never made public and no license agreement with NVIDIA is broken.

Users should store their .run driver file and .tok file on the S3 endpoint. The gridd.conf file is generated from the feature flag passed in.
1 parent c46443b, commit 5a25506
Showing 9 changed files with 237 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -21,3 +21,6 @@ | |
- import_tasks: url.yml | ||
when: additional_url_images | bool | ||
|
||
- import_tasks: s3.yml | ||
when: additional_s3 | bool | ||
|
24 changes: 24 additions & 0 deletions
images/capi/ansible/roles/load_additional_components/tasks/s3.yml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
# Copyright 2023 The Kubernetes Authors. | ||
|
||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
|
||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
|
||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
--- | ||
- name: Download additional from S3 | ||
  amazon.aws.s3_object: | ||
    endpoint_url: "{{ additional_s3_endpoint }}" | ||
    access_key: "{{ additional_s3_access }}" | ||
    secret_key: "{{ additional_s3_secret }}" | ||
    bucket: "{{ additional_s3_bucket }}" | ||
    object: "{{ additional_s3_object }}" | ||
    dest: "{{ additional_s3_destination_path }}" | ||
    mode: get | ||
    ceph: "{{ additional_s3_ceph }}" |
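The task above takes all of its settings from `additional_s3_*` variables. A minimal sketch of how they might be set follows. Only the variable names come from the diff; every value is a hypothetical placeholder.

```yaml
# Hypothetical values; only the variable names appear in the commit.
additional_s3: true
additional_s3_endpoint: "https://s3.example.internal"
additional_s3_access: "ACCESS_KEY"
additional_s3_secret: "SECRET_KEY"
additional_s3_bucket: "components"
additional_s3_object: "path/to/archive.tar.gz"
additional_s3_destination_path: "/tmp/archive.tar.gz"
# Set to true only when the endpoint is a Ceph RADOS Gateway.
additional_s3_ceph: false
```

Because the import is guarded by `when: additional_s3 | bool`, leaving `additional_s3` false skips the download entirely.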
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
# NVIDIA GPU driver installation | ||
|
||
To install the NVIDIA GPU driver as part of the image build process, you must have a `.run` driver installer and a `.tok` license token file from NVIDIA, both available from an S3 endpoint. | ||
|
||
Then reference those files in your Packer variables file. | ||
|
||
The example below shows the required fields. Review the values and change them to match your environment. | ||
|
||
```json | ||
{ | ||
  "ansible_user_vars": "nvidia_s3_url=https://s3-endpoint nvidia_bucket=nvidia nvidia_bucket_access=ACCESS_KEY nvidia_bucket_secret=SECRET_KEY nvidia_installer_location=NVIDIA-Linux-x86_64-525.85.05-grid.run nvidia_tok_location=client_configuration_token.tok gridd_feature_type=4", | ||
  "node_custom_roles_pre": "nvidia" | ||
} | ||
|
||
``` | ||
|
||
The role must be installed via the `node_custom_roles_pre` option to avoid a known issue: if a dist-upgrade installs a new kernel before the driver is installed, | ||
the DKMS hook never runs for that kernel, so the driver does not work with it when the image is booted. | ||
To get around this, the driver is installed first, so the DKMS hook runs when the new kernel is installed. | ||
|
||
The `nvidia` custom role uses the `s3` tasks of the `load_additional_components` role to fetch the required files from an S3 endpoint. | ||
|
||
An S3 endpoint is required because NVIDIA will soon (July 2023) no longer support a licensing server hosted internally by a customer. | ||
|
||
As a result, NVIDIA now requires a `.tok` file to be available for licensing via its cloud services. | ||
This file contains sensitive information and is unique to the company/license to which it is provided. |
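The README assumes the two files are already in the bucket. One way to put them there is sketched below as an Ansible play run from a workstation that has the `amazon.aws` collection and `boto3` installed. The play itself, the local paths, the endpoint, and the credentials are assumptions for illustration; the bucket and object names match the README example.

```yaml
# Sketch only, not part of the commit: uploads the driver installer and the
# license token to the bucket named in the README example.
- name: Upload NVIDIA files to an S3 endpoint
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Upload driver installer and license token
      amazon.aws.s3_object:
        endpoint_url: "https://s3-endpoint"   # placeholder endpoint
        access_key: "ACCESS_KEY"              # placeholder credential
        secret_key: "SECRET_KEY"              # placeholder credential
        bucket: nvidia
        object: "{{ item }}"
        src: "./{{ item }}"                   # assumes files sit in the current directory
        mode: put
      loop:
        - NVIDIA-Linux-x86_64-525.85.05-grid.run
        - client_configuration_token.tok
```

Since the `.tok` file is sensitive, the bucket should stay private and be readable only with the access key passed to the image build.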
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,120 @@ | ||
# Copyright 2023 The Kubernetes Authors. | ||
|
||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
|
||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
|
||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
--- | ||
|
||
- name: unload nouveau | ||
  modprobe: | ||
    name: nouveau | ||
    state: absent | ||
  ignore_errors: true | ||
|
||
- name: Add NVIDIA package signing key | ||
  ansible.builtin.apt_key: | ||
    url: https://nvidia.github.io/libnvidia-container/gpgkey | ||
  when: ansible_distribution == "Debian" | ||
|
||
- name: perform a cache update | ||
  apt: | ||
    force_apt_get: True | ||
    update_cache: True | ||
  register: apt_lock_status | ||
  until: apt_lock_status is not failed | ||
  retries: 5 | ||
  delay: 10 | ||
  when: ansible_distribution == "Debian" | ||
|
||
- name: Install packages for interacting with s3 endpoint & building NVIDIA driver kernel module | ||
  become: true | ||
  ansible.builtin.apt: | ||
    pkg: | ||
      - python3-boto3 | ||
      - python3-botocore | ||
      - build-essential | ||
      - wget | ||
      - dkms | ||
  when: ansible_distribution == "Debian" | ||
|
||
- name: Make /etc/nvidia/ClientConfigToken directory | ||
  become: true | ||
  file: | ||
    path: /etc/nvidia/ClientConfigToken | ||
    state: directory | ||
    owner: root | ||
    group: root | ||
    mode: 0755 | ||
|
||
- name: Download NVIDIA License Token | ||
  ansible.builtin.include_role: | ||
    name: load_additional_components | ||
  vars: | ||
    additional_s3: true | ||
    additional_s3_endpoint: "{{ nvidia_s3_url }}" | ||
    additional_s3_access: "{{ nvidia_bucket_access }}" | ||
    additional_s3_secret: "{{ nvidia_bucket_secret }}" | ||
    additional_s3_bucket: "{{ nvidia_bucket }}" | ||
    additional_s3_ceph: "{{ nvidia_ceph }}" | ||
    additional_s3_object: "{{ nvidia_tok_location }}" | ||
    additional_s3_destination_path: /etc/nvidia/ClientConfigToken/client_configuration_token.tok | ||
|
||
- name: Set Permissions of NVIDIA License Token | ||
  file: | ||
    path: /etc/nvidia/ClientConfigToken/client_configuration_token.tok | ||
    state: file | ||
    owner: root | ||
    group: root | ||
    mode: 0744 | ||
|
||
- name: Create GRIDD licensing config | ||
  become: true | ||
  template: | ||
    src: templates/gridd.conf.j2 | ||
    dest: /etc/nvidia/gridd.conf | ||
    mode: 0644 | ||
|
||
- name: Download NVIDIA driver | ||
  ansible.builtin.include_role: | ||
    name: load_additional_components | ||
  vars: | ||
    additional_s3: true | ||
    additional_s3_endpoint: "{{ nvidia_s3_url }}" | ||
    additional_s3_access: "{{ nvidia_bucket_access }}" | ||
    additional_s3_secret: "{{ nvidia_bucket_secret }}" | ||
    additional_s3_bucket: "{{ nvidia_bucket }}" | ||
    additional_s3_ceph: "{{ nvidia_ceph }}" | ||
    additional_s3_object: "{{ nvidia_installer_location }}" | ||
    additional_s3_destination_path: /tmp/NVIDIA-Linux-gridd.run | ||
|
||
- name: Set Permissions of NVIDIA driver | ||
  file: | ||
    path: /tmp/NVIDIA-Linux-gridd.run | ||
    state: file | ||
    owner: root | ||
    group: root | ||
    mode: 0755 | ||
|
||
- name: Install NVIDIA driver | ||
  become: true | ||
  ansible.builtin.command: | ||
    cmd: "/tmp/NVIDIA-Linux-gridd.run -s --dkms --no-cc-version-check" | ||
|
||
- name: Cleanup packages for interacting with s3 endpoint | ||
  become: true | ||
  ansible.builtin.apt: | ||
    state: absent | ||
    purge: true | ||
    pkg: | ||
      - python3-boto3 | ||
      - python3-botocore | ||
  when: ansible_distribution == "Debian" |
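The tasks above install the driver with `--dkms` but do not check the result. A follow-up task along these lines could confirm that the module was registered; it is a sketch, not part of the commit, and it assumes the installer registers the module with DKMS under a name containing `nvidia`.

```yaml
# Sketch only, not part of the commit.
# Assumption: the DKMS module name contains "nvidia".
- name: Check that the NVIDIA module is registered with DKMS
  become: true
  ansible.builtin.command:
    cmd: dkms status
  register: dkms_status
  changed_when: false
  failed_when: "'nvidia' not in dkms_status.stdout"
```

`dkms status` is used rather than `nvidia-smi` because the build host may have no GPU attached, in which case the driver cannot be queried at build time.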
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
# Copyright 2023 The Kubernetes Authors. | ||
|
||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
|
||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
|
||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
FeatureType={{ gridd_feature_type }} |
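With `gridd_feature_type=4`, as in the README example, the template renders a single `FeatureType=4` line. The template accepts any value, so a guard task could reject unexpected ones before the file is written. The sketch below is not part of the commit, and the accepted set (0, 1, 2, 4) is an assumption based on NVIDIA's licensing documentation that should be checked against the driver version in use.

```yaml
# Sketch only, not part of the commit.
# Assumption: valid FeatureType values are 0, 1, 2 and 4.
- name: Validate gridd_feature_type before rendering gridd.conf
  ansible.builtin.assert:
    that:
      - gridd_feature_type is defined
      - gridd_feature_type | int in [0, 1, 2, 4]
    fail_msg: "gridd_feature_type must be one of 0, 1, 2 or 4"
```

The `| int` filter matters because values passed through `ansible_user_vars` as `key=value` pairs arrive as strings.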
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
#!/usr/bin/env bash | ||
|
||
# Copyright 2023 The Kubernetes Authors. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
set -o errexit | ||
set -o nounset | ||
set -o pipefail | ||
|
||
[[ -n ${DEBUG:-} ]] && set -o xtrace | ||
|
||
source hack/utils.sh | ||
|
||
# Change directories to the parent directory of the one in which this | ||
# script is located. | ||
cd "$(dirname "${BASH_SOURCE[0]}")/.." | ||
|
||
# Disable pip's version check and root user warning | ||
export PIP_DISABLE_PIP_VERSION_CHECK=1 PIP_ROOT_USER_ACTION=ignore | ||
|
||
# S3 interaction requires the following galaxy collection | ||
ansible-galaxy collection install amazon.aws |
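The script installs whichever `amazon.aws` release is current at build time. Where reproducible builds matter, a requirements file is one alternative. This is a sketch; the file name `requirements.yml` is an assumption and the file is not part of this commit.

```yaml
# requirements.yml (hypothetical file, not part of this commit)
collections:
  - name: amazon.aws
    # Add a `version:` constraint here to pin the collection.
```

It would be installed with `ansible-galaxy collection install -r requirements.yml` in place of the direct install.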