Before relocating to Switzerland, I faced the challenge of determining how to manage the extensive IT equipment I had accumulated over years in the tech industry. My goal was to consolidate my homelab into portable yet powerful and durable devices capable of handling workloads similar to those in my original setup.
After thorough consideration, I opted for two fully maxed-out ThinkPad P13 Gen 1 laptops. These systems replaced a homelab environment that previously consisted of multiple servers running a wide range of self-hosted applications, including virtualized firewalls, WordPress servers, Kubernetes clusters, and more.
Transitioning from a highly virtualized TrueNAS system operating within a hyper-converged infrastructure on vSphere to a pair of laptops proved to be a significant challenge. The shift became even more complex as I decided to delve fully into the realm of AI and large language model (LLM) training. However, with two laptops equipped with powerful GPUs that were underutilized, I developed a robust infrastructure tailored to my LLM training requirements.
Despite the initial difficulties, the results have been extraordinary. The streamlined setup not only meets but exceeds my expectations, demonstrating what modern portable hardware can do for high-performance workloads. The solution was to combine my two laptops, each with a powerful GPU (NVIDIA RTX 5000), into a single cluster using Docker. Below are the steps to accomplish that:
- **Enable WSL.** Open PowerShell as Administrator and run:

  ```powershell
  wsl --install
  ```
- **Set Ubuntu as the default distribution** (if not already installed):

  ```powershell
  wsl --set-default Ubuntu
  ```
- **Install Docker Desktop.**
  - Download and install Docker Desktop from docker.com.
  - In Docker Desktop settings, enable WSL 2 integration for Ubuntu.
- **Install NVIDIA drivers and the Container Toolkit.**
  - Ensure the latest NVIDIA drivers are installed.
  - Follow the NVIDIA Container Toolkit for WSL setup guide (it adds NVIDIA's apt repository), then install the toolkit and restart Docker:

    ```bash
    sudo apt update
    sudo apt install -y nvidia-container-toolkit
    sudo systemctl restart docker
    ```
- **Install OpenMPI:**

  ```bash
  sudo apt update
  sudo apt install -y openmpi-bin openmpi-common libopenmpi-dev
  ```
- **Create a Docker image with OpenMPI and an ML framework.**
  - Create a `Dockerfile`:

    ```dockerfile
    FROM nvidia/cuda:12.2.0-base
    RUN apt-get update && apt-get install -y \
        python3 \
        python3-pip \
        openmpi-bin \
        libopenmpi-dev \
        && rm -rf /var/lib/apt/lists/*
    # Horovod is required by the distributed training script below
    RUN pip3 install tensorflow torch horovod
    CMD ["/bin/bash"]
    ```

  - Build the image:

    ```bash
    docker build -t ml-cluster .
    ```
- **Initialize Docker Swarm on Laptop 1:**

  ```bash
  docker swarm init
  ```

  Note the join token provided.
- **Join Laptop 2 to the Swarm:**

  ```bash
  docker swarm join --token <your-swarm-token> <manager-ip>:2377
  ```
- **Prepare a distributed training script** (e.g., TensorFlow with Horovod) and save it as `train.py`:

  ```python
  import tensorflow as tf
  import horovod.tensorflow.keras as hvd

  hvd.init()

  # Pin each worker process to its own GPU
  gpus = tf.config.experimental.list_physical_devices('GPU')
  if gpus:
      tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], 'GPU')

  # Example: simple MNIST training
  (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
  x_train = x_train / 255.0

  model = tf.keras.Sequential([
      tf.keras.layers.Flatten(),
      tf.keras.layers.Dense(128, activation='relu'),
      tf.keras.layers.Dense(10, activation='softmax')
  ])

  # Wrap the optimizer so gradients are averaged across workers
  optimizer = hvd.DistributedOptimizer(tf.keras.optimizers.Adam())
  model.compile(optimizer=optimizer,
                loss='sparse_categorical_crossentropy',
                metrics=['accuracy'])

  # Broadcast initial weights from rank 0 so all workers start in sync
  callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]
  model.fit(x_train, y_train, epochs=5, callbacks=callbacks)
  ```
- **Run the script across both laptops:**

  ```bash
  mpirun -np 2 -H laptop1-ip:1,laptop2-ip:1 python3 train.py
  ```
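Before launching a full training run, it helps to confirm that `mpirun` really starts one process on each laptop. Open MPI exposes each process's rank and world size through environment variables, so a minimal sanity-check script can report them without any ML framework. This is a sketch; the filename `check_ranks.py` is my own choice:

```python
import os
import socket

# Open MPI sets these variables for every process it launches.
# When the script is run directly (without mpirun), fall back to
# a single-process default so it still works standalone.
rank = int(os.environ.get("OMPI_COMM_WORLD_RANK", 0))
size = int(os.environ.get("OMPI_COMM_WORLD_SIZE", 1))

print(f"rank {rank} of {size} on {socket.gethostname()}")
```

Running it with `mpirun -np 2 -H laptop1-ip:1,laptop2-ip:1 python3 check_ranks.py` should print one line per laptop, each with a different rank and hostname.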
- **Monitor and tune.**
  - Use `nvidia-smi` to monitor GPU utilization:

    ```bash
    watch -n 1 nvidia-smi
    ```

  - Optimize batch sizes and communication overhead as needed.
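When tuning batch sizes for synchronous data-parallel training, one common heuristic (popularized by the Horovod documentation) is to keep the per-worker batch size fixed and scale the learning rate linearly with the number of workers, since the effective batch grows with the cluster. A small sketch of that arithmetic, with function and parameter names of my own invention:

```python
def scaled_training_params(per_worker_batch, base_lr, num_workers):
    """Return the effective batch size and a linearly scaled learning
    rate for synchronous data-parallel (Horovod-style) training."""
    effective_batch = per_worker_batch * num_workers  # all workers step together
    scaled_lr = base_lr * num_workers                 # linear scaling heuristic
    return effective_batch, scaled_lr

# Two laptops, one GPU each:
batch, lr = scaled_training_params(per_worker_batch=64, base_lr=0.001, num_workers=2)
print(batch, lr)  # 128 0.002
```

Linear scaling is only a starting point; very large effective batches may also need learning-rate warmup to train stably.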
This setup allowed me to harness the combined GPU power of my two laptops for machine learning tasks.