Linux Operating System Bootable containers enabled for AI Training

To run accelerated AI workloads, we've prepared bootc container images for the major AI accelerator platforms: AMD, Intel Habana Labs and NVIDIA.

Makefile targets

Target	Description
amd	Create bootable container for AMD platform
deepspeed	DeepSpeed container for deep learning optimization
disk-amd	Create disk image from bootable container for AMD platform
disk-intel	Create disk image from bootable container for Intel platform
disk-nvidia	Create disk image from bootable container for NVIDIA platform
instruct-amd	Create InstructLab image for bootable container for AMD platform
instruct-intel	Create InstructLab image for bootable container for Intel platform
instruct-nvidia	Create InstructLab image for bootable container for NVIDIA platform
intel	Create bootable container for Intel Habana Labs platform
nvidia	Create bootable container for NVIDIA platform
vllm	Containerized inference/serving engine for LLMs

Makefile variables

Variable	Description	Default
FROM	Overrides the base image for the Containerfiles	`quay.io/centos-bootc/centos-bootc:stream9`
REGISTRY	Container registry for storing container images	`quay.io`
REGISTRY_ORG	Container registry organization	`ai-lab`
IMAGE_NAME	Container image name	platform name (e.g. `amd`)
IMAGE_TAG	Container image tag	`latest`
CONTAINER_TOOL	Container tool used for the build	`podman`
CONTAINER_TOOL_EXTRA_ARGS	Extra arguments passed to the container tool
VENDOR	Container image vendor label

Note: AI content is very large; building it requires more than 200 GB of free disk space.
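Since a build that runs out of space fails late, a pre-flight check can save time. The following is a minimal sketch, not part of this repository; the function name check_free_space is hypothetical. It reads the available-space column from POSIX `df -Pk` output and treats the threshold as GiB, which is slightly stricter than 200 GB.

```shell
# Hypothetical helper (not part of the repository): succeed only if the
# filesystem holding DIR has at least MIN_GIB GiB available.
check_free_space() {
    dir="${1:-.}"
    min_gib="${2:-200}"
    # With -P and -k, line 2 field 4 of df output is the available space in KiB.
    avail_kib=$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }')
    if [ -z "$avail_kib" ]; then
        echo "cannot determine free space for ${dir}" >&2
        return 1
    fi
    avail_gib=$((avail_kib / 1024 / 1024))
    if [ "$avail_gib" -lt "$min_gib" ]; then
        echo "only ${avail_gib} GiB free in ${dir}; at least ${min_gib} GiB recommended" >&2
        return 1
    fi
    return 0
}
```

For example, `check_free_space /var/lib/containers 200 && make nvidia` checks the volume holding podman's default rootful image store before building; the same check applies to any directory you plan to use as TMPDIR.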

How to build InstructLab containers

To do AI training, you need to build the InstructLab container images.

Run make instruct-<platform>. For example:

make instruct-amd
make instruct-intel
make instruct-nvidia
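When building for more than one platform, the commands above can be wrapped in a loop. This is a minimal sketch, not part of the repository: build_instruct_images is a hypothetical function, and the MAKE override exists only so the loop can be dry-run with MAKE=echo.

```shell
# Hypothetical wrapper (not part of the repository): build the InstructLab
# image for each platform given as an argument, stopping at the first failure.
# Set MAKE=echo to print the targets instead of building them.
build_instruct_images() {
    for platform in "$@"; do
        case "$platform" in
            amd|intel|nvidia) ;;
            *) echo "unknown platform: ${platform}" >&2; return 1 ;;
        esac
        ${MAKE:-make} "instruct-${platform}" || return 1
    done
}
```

For example, `MAKE=echo build_instruct_images amd nvidia` prints instruct-amd and instruct-nvidia without building anything; drop the override to run the real builds.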

Once these container images are built, the next step is to build vLLM.

How to build the vLLM inference engine

make vllm

How to build the DeepSpeed container

make deepspeed

How to build bootc container images

To build the images (based on CentOS Stream by default), run make <platform>. For example, to build the NVIDIA, AMD and Intel bootc containers, respectively:

make nvidia
make amd
make intel
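Putting the previous sections together, the images for one platform can be built in sequence. The sketch below is hypothetical (build_training_stack is not part of the repository) and simply follows the order in which this README presents the targets; it assumes, rather than confirms, that this order satisfies any dependencies between them. Makefile variables are deliberately not forwarded, because a value such as IMAGE_NAME would otherwise be applied to every image.

```shell
# Hypothetical wrapper (not part of the repository): build the InstructLab,
# vLLM, DeepSpeed and bootc images for one platform, in README order.
# Set MAKE=echo for a dry run.
build_training_stack() {
    case "${1:-}" in
        amd|intel|nvidia) ;;
        *) echo "usage: build_training_stack amd|intel|nvidia" >&2; return 2 ;;
    esac
    platform="$1"
    for target in "instruct-${platform}" vllm deepspeed "$platform"; do
        echo ">>> make ${target}" >&2
        ${MAKE:-make} "$target" || return 1
    done
}
```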

How to build bootc container images based on Red Hat Enterprise Linux

To build the training images on top of Red Hat Enterprise Linux bootc images, set FROM to the appropriate base container image and run the build on an entitled Red Hat Enterprise Linux 9.x host with a valid subscription.

For example:

make nvidia FROM=registry.redhat.io/rhel9/rhel-bootc:9.4
make amd FROM=registry.redhat.io/rhel9/rhel-bootc:9.4
make intel FROM=registry.redhat.io/rhel9/rhel-bootc:9.4

Of course, the other Makefile variables are still available, so the following is a valid build command:

make nvidia REGISTRY=myregistry.com REGISTRY_ORG=ai-training IMAGE_NAME=nvidia IMAGE_TAG=v1 FROM=registry.redhat.io/rhel9/rhel-bootc:9.4
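Because the RHEL-based build must run on a RHEL 9.x host, a quick host check can fail fast on the wrong machine. This is a minimal sketch with a hypothetical function name, is_rhel9; it relies on the standard ID and VERSION_ID fields of os-release, a file designed to be sourced by a shell. It does not verify entitlement; subscription-manager status reports that separately.

```shell
# Hypothetical pre-flight check (not part of the repository): succeed only if
# the os-release file (default /etc/os-release) describes a RHEL 9.x host.
is_rhel9() {
    os_release="${1:-/etc/os-release}"
    [ -r "$os_release" ] || return 1
    # Source the file in a subshell so the caller's variables are untouched.
    (
        unset ID VERSION_ID
        . "$os_release"
        [ "${ID:-}" = "rhel" ] || exit 1
        case "${VERSION_ID:-}" in
            9|9.*) exit 0 ;;
            *) exit 1 ;;
        esac
    )
}
```

For example: `is_rhel9 && make nvidia FROM=registry.redhat.io/rhel9/rhel-bootc:9.4`.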

How to build disk images

bootc-image-builder produces disk images using a bootable container as input. Disk images can be used to provision a host directly. The process writes the disk image to -bootc/build.

IMPORTANT: The osbuild-selinux package must be installed for bootc-image-builder to work on an SELinux-enabled host.

To invoke bootc-image-builder, run the disk target for your platform (disk-amd, disk-intel or disk-nvidia). For example:

make disk-nvidia

or

make disk-nvidia DISK_TYPE=ami BOOTC_IMAGE=quay.io/ai-lab/nvidia-bootc-custom:latest

In addition to the variables common to all targets, a few extra variables can be defined to customize disk image creation:

Variable	Description	Default
BOOTC_IMAGE	Image to use as input	`$REGISTRY/$REGISTRY_ORG/$IMAGE_NAME:$IMAGE_TAG`
DISK_TYPE	Type of image to build	`qcow2`
IMAGE_BUILDER_CONFIG	Path to a build-config file	`EMPTY`

The build-config file format is documented in the bootc-image-builder README.
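The BOOTC_IMAGE default in the table above is composed from the common variables. The sketch below reproduces that composition in shell to show which image a disk target would take as input; it is an illustration, not the Makefile's code, and resolve_bootc_image is a hypothetical name. Following the variables table, IMAGE_NAME falls back to the platform name.

```shell
# Illustrative only (not the Makefile's code): resolve the input image the way
# the table describes, $REGISTRY/$REGISTRY_ORG/$IMAGE_NAME:$IMAGE_TAG, unless
# BOOTC_IMAGE is set explicitly. Unset or empty variables use the defaults.
resolve_bootc_image() {
    platform="${1:-nvidia}"
    default="${REGISTRY:-quay.io}/${REGISTRY_ORG:-ai-lab}/${IMAGE_NAME:-$platform}:${IMAGE_TAG:-latest}"
    echo "${BOOTC_IMAGE:-$default}"
}
```

With REGISTRY=myregistry.com, REGISTRY_ORG=ai-training and IMAGE_TAG=v1, `resolve_bootc_image nvidia` prints myregistry.com/ai-training/nvidia:v1.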

The following disk image types are currently available:

Disk type	Target environment
`ami`	Amazon Machine Image
`qcow2` (default)	QEMU
`vmdk`	VMDK usable in vSphere, among others
`anaconda-iso`	An unattended Anaconda installer that installs to the first disk found.
`raw`	Unformatted raw disk.
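Because a mistyped DISK_TYPE only surfaces after the build has started, the requested values can be checked against the list above first. A minimal sketch, not part of the repository: is_valid_disk_type and build_disks are hypothetical helpers, and the accepted values are copied from the table, so they may lag behind what bootc-image-builder itself supports.

```shell
# Hypothetical helper (not part of the repository): accept only the disk
# types listed in this README.
is_valid_disk_type() {
    case "$1" in
        ami|qcow2|vmdk|anaconda-iso|raw) return 0 ;;
        *) return 1 ;;
    esac
}

# Build one disk image per requested type, e.g. build_disks nvidia qcow2 raw.
# All types are validated before any build starts. Set MAKE=echo for a dry run.
build_disks() {
    if [ "$#" -lt 2 ]; then
        echo "usage: build_disks <platform> <disk-type>..." >&2
        return 2
    fi
    platform="$1"
    shift
    for disk_type in "$@"; do
        if ! is_valid_disk_type "$disk_type"; then
            echo "unsupported disk type: ${disk_type}" >&2
            return 1
        fi
    done
    for disk_type in "$@"; do
        ${MAKE:-make} "disk-${platform}" "DISK_TYPE=${disk_type}" || return 1
    done
}
```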

Images customized for cloud providers

To build images customized for each supported cloud provider, read the cloud providers section.

Troubleshooting

After an interrupted build, you may want to restart the process from scratch. In that case, instruct podman to discard the cached layers by passing the --no-cache parameter to the build through the CONTAINER_TOOL_EXTRA_ARGS variable:

make <platform> CONTAINER_TOOL_EXTRA_ARGS="--no-cache"

Building accelerated images requires a lot of temporary disk space. To use a specific directory for temporary storage, set the TMPDIR environment variable:

make <platform> TMPDIR=/path/to/tmp
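The two troubleshooting options can be combined with a fresh scratch directory that is removed afterwards. This is a minimal sketch; build_with_scratch is hypothetical and not part of the repository. It creates the directory with mktemp -d under a base path you choose (ideally on a large volume) and passes it as TMPDIR on the make command line, as shown above.

```shell
# Hypothetical wrapper (not part of the repository): build PLATFORM with a
# private TMPDIR created under BASE, then delete that directory.
# Usage: build_with_scratch <platform> <base-dir> [extra-container-tool-args]
# Set MAKE=echo for a dry run.
build_with_scratch() {
    if [ "$#" -lt 2 ]; then
        echo "usage: build_with_scratch <platform> <base-dir> [extra-args]" >&2
        return 2
    fi
    platform="$1"
    base="$2"
    extra="${3:-}"
    scratch=$(mktemp -d "${base}/ai-build.XXXXXX") || return 1
    if [ -n "$extra" ]; then
        ${MAKE:-make} "$platform" "TMPDIR=${scratch}" "CONTAINER_TOOL_EXTRA_ARGS=${extra}"
    else
        ${MAKE:-make} "$platform" "TMPDIR=${scratch}"
    fi
    status=$?
    rm -rf "$scratch"
    return "$status"
}
```

For example, `build_with_scratch nvidia /data --no-cache` rebuilds the NVIDIA image without cached layers, using a temporary directory under /data.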
