From 663b2c6bcf780ec161d802901b2329336e4e65ac Mon Sep 17 00:00:00 2001
From: Siddartha Pothapragada
Date: Wed, 15 Oct 2025 08:56:20 -0700
Subject: [PATCH] Add Raspberry Pi tutorial to deploy & infer Llama model
 (#15109)

### Summary

Adds a step-by-step tutorial for cross-compiling ExecuTorch on a Linux host and deploying a Llama model on Raspberry Pi 4/5, and links it from the desktop and embedded documentation sections.

(cherry picked from commit 312267e7a3b8e00a0f4fa9c525f2dbb804fc4991)
---
 docs/source/desktop-section.md             |   5 +
 docs/source/edge-platforms-section.md      |   1 +
 docs/source/embedded-section.md            |   3 +-
 docs/source/raspberry_pi_llama_tutorial.md | 394 +++++++++++++++++++++
 4 files changed, 402 insertions(+), 1 deletion(-)
 create mode 100644 docs/source/raspberry_pi_llama_tutorial.md

diff --git a/docs/source/desktop-section.md b/docs/source/desktop-section.md
index 7afccbe1d4f..bf306e7c43b 100644
--- a/docs/source/desktop-section.md
+++ b/docs/source/desktop-section.md
@@ -12,8 +12,13 @@ Deploy ExecuTorch on Linux, macOS, and Windows with optimized backends for each
 - {doc}`desktop-backends` — Available desktop backends and platform-specific optimization
 
+## Tutorials
+
+- {doc}`raspberry_pi_llama_tutorial` — Cross-compiling ExecuTorch for Raspberry Pi on a Linux host
+
 ```{toctree}
 :hidden:
 
 using-executorch-cpp
 using-executorch-building-from-source
 desktop-backends
+raspberry_pi_llama_tutorial

diff --git a/docs/source/edge-platforms-section.md b/docs/source/edge-platforms-section.md
index 99e44093544..2b9ee2131de 100644
--- a/docs/source/edge-platforms-section.md
+++ b/docs/source/edge-platforms-section.md
@@ -12,6 +12,7 @@ Deploy ExecuTorch on Android devices with hardware acceleration support.
 **→ {doc}`android-section` — Complete Android deployment guide**
 
 Key features:
+
 - Hardware acceleration support (CPU, GPU, NPU)
 - Multiple backend options (XNNPACK, Vulkan, Qualcomm, MediaTek, ARM, Samsung)
 - Comprehensive examples and demos

diff --git a/docs/source/embedded-section.md b/docs/source/embedded-section.md
index 834001afbc3..5636a7546dc 100644
--- a/docs/source/embedded-section.md
+++ b/docs/source/embedded-section.md
@@ -25,7 +25,7 @@ Start here for C++ development with ExecuTorch runtime APIs and essential tutorials
 
 ## Tutorials
 
 - {doc}`tutorial-arm-ethos-u` — Export a simple PyTorch model for the ExecuTorch Ethos-U backend
-
+- {doc}`raspberry_pi_llama_tutorial` — Deploy a Llama model on a Raspberry Pi
 
 ```{toctree}
 :hidden:
@@ -37,3 +37,4 @@ using-executorch-cpp
 using-executorch-building-from-source
 embedded-backends
 tutorial-arm-ethos-u
+raspberry_pi_llama_tutorial

diff --git a/docs/source/raspberry_pi_llama_tutorial.md b/docs/source/raspberry_pi_llama_tutorial.md
new file mode 100644
index 00000000000..e37bbb61c06
--- /dev/null
+++ b/docs/source/raspberry_pi_llama_tutorial.md
@@ -0,0 +1,394 @@
# ExecuTorch on Raspberry Pi

## TL;DR

This tutorial demonstrates how to deploy **Llama models on Raspberry Pi 4/5 devices** using ExecuTorch:

- **Prerequisites**: Linux host machine, Python 3.10-3.12, conda environment, Raspberry Pi 4/5
- **Setup**: Automated cross-compilation using the `setup.sh` script for ARM toolchain installation
- **Export**: Convert Llama models to the optimized `.pte` format, with quantization options
- **Deploy**: Transfer binaries to the Raspberry Pi and configure runtime libraries
- **Optimize**: Apply build optimization and performance tuning techniques
- **Result**: Efficient on-device Llama inference

## Prerequisites and Hardware Requirements

### Host Machine Requirements

**Operating System**: Linux x86_64 (Ubuntu 20.04+ or CentOS Stream 9+)

**Software Dependencies**:

- **Python 3.10-3.12** (ExecuTorch requirement)
- **conda** or **venv** for environment management
- **CMake 3.29.6+**
- **Git** for repository cloning

### Target Device Requirements

**Supported Devices**: **Raspberry Pi 4** and **Raspberry Pi 5** with a **64-bit OS**

**System Requirements**:

- **RAM and storage**: vary by model size and optimization level
- **64-bit Raspberry Pi OS** (Bullseye or newer)

### Verification Commands

Verify your host machine compatibility:

```bash
# Check OS and architecture
uname -s  # Should output: Linux
uname -m  # Should output: x86_64

# Check Python version
python3 --version  # Should be 3.10-3.12

# Check required tools
hash cmake git md5sum 2>/dev/null || echo "Missing required tools"

cmake --version  # Should be 3.29.6 or newer
```

## Development Environment Setup

### Clone ExecuTorch Repository

First, clone the ExecuTorch repository with Raspberry Pi support:

```bash
# Create a project directory and clone ExecuTorch
mkdir -p ~/executorch-rpi && cd ~/executorch-rpi
git clone -b release/1.0 https://github.com/pytorch/executorch.git
cd executorch
```

### Create Conda Environment

```bash
# Create and activate a conda environment
conda create -yn executorch python=3.10.0
conda activate executorch

# Upgrade pip
pip install --upgrade pip
```

### Alternative: Virtual Environment

If you prefer Python's built-in virtual environment:

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
```

Refer to → {doc}`getting-started` for more details.
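Before building anything, it can save a debugging round trip to confirm that your shell is now using the interpreter from the environment you just created. A minimal sanity check (assuming the `executorch` environment name used above):

```bash
# Confirm the active interpreter comes from the new environment
which python3   # should point into the "executorch" conda env (or .venv)

# Confirm the version is in the supported 3.10-3.12 range
python3 -c 'import sys; assert (3, 10) <= sys.version_info[:2] <= (3, 12), sys.version; print("Python OK:", sys.version.split()[0])'
```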
## Cross-Compilation Toolchain Setup

Run the following script on your Linux host machine:

```bash
# Run the Raspberry Pi setup script for Pi 5
examples/raspberry_pi/setup.sh pi5
```

On successful completion, you should see output like the following:

```bash
[100%] Linking CXX executable llama_main
[100%] Built target llama_main
[SUCCESS] LLaMA runner built successfully

==== Verifying Build Outputs ====
[SUCCESS] ✓ llama_main (6.1M)
[SUCCESS] ✓ libllama_runner.so (4.0M)
[SUCCESS] ✓ libextension_module.a (89K) - static library

✓ ExecuTorch cross-compilation setup completed successfully!
```

## Model Preparation and Export

### Download Llama Models

Download the Llama model from Hugging Face or another source, and make sure the following files exist:

- consolidated.00.pth (model weights)
- params.json (model config)
- tokenizer.model (tokenizer)

### Export Llama to ExecuTorch Format

After downloading the Llama model, export it to the ExecuTorch `.pte` format using the provided script:

```bash
# Set these paths to point to the downloaded model files.
# The following example exports a Llama 3.2 model with the XNNPACK SpinQuant config.
LLAMA_QUANTIZED_CHECKPOINT=path/to/consolidated.00.pth
LLAMA_PARAMS=path/to/params.json

python -m extension.llm.export.export_llm \
    --config examples/models/llama/config/llama_xnnpack_spinquant.yaml \
    +base.model_class="llama3_2" \
    +base.checkpoint="${LLAMA_QUANTIZED_CHECKPOINT:?}" \
    +base.params="${LLAMA_PARAMS:?}"
```

The file `llama3_2.pte` will be generated in the directory where you run the command.

- For more details, see [Option A: Download and Export Llama3.2 1B/3B Model](https://github.com/pytorch/executorch/blob/main/examples/models/llama/README.md#option-a-download-and-export-llama32-1b3b-model)
- See also → {doc}`llm/export-llm`

## Raspberry Pi Deployment

### Transfer Binaries to Raspberry Pi

After successful cross-compilation, transfer the required files:

```bash
# Set Raspberry Pi details
export RPI_UN="pi"                  # Your Raspberry Pi username
export RPI_IP="your-rpi-ip-address"

# Create a deployment directory on the Raspberry Pi
ssh $RPI_UN@$RPI_IP 'mkdir -p ~/executorch-deployment'

# Copy the main executable
scp cmake-out/examples/models/llama/llama_main $RPI_UN@$RPI_IP:~/executorch-deployment/

# Copy the runtime library
scp cmake-out/examples/models/llama/runner/libllama_runner.so $RPI_UN@$RPI_IP:~/executorch-deployment/

# Copy the model file and tokenizer
scp llama3_2.pte $RPI_UN@$RPI_IP:~/executorch-deployment/
scp ./tokenizer.model $RPI_UN@$RPI_IP:~/executorch-deployment/
```

### Configure Runtime Libraries on Raspberry Pi

SSH into your Raspberry Pi and configure the runtime:

```bash
cd ~/executorch-deployment

# Set up the library environment
echo 'export LD_LIBRARY_PATH=$(pwd):$LD_LIBRARY_PATH' > setup_env.sh
chmod +x setup_env.sh

# Make the main binary executable
chmod +x llama_main
```

## Dry Run

```bash
source setup_env.sh
./llama_main --help
```

Make sure the output does not contain any GLIBC or other library mismatch errors. If it does, follow the troubleshooting steps below.

## Troubleshooting

### Issue 1: GLIBC Version Mismatch

**Problem:** The binary was compiled against a newer GLIBC version (2.38) than the one available on your Raspberry Pi (2.36).
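You can verify the mismatch on the Pi itself before choosing a fix. A quick check from the deployment directory (if `strings` is missing, it ships with the `binutils` package):

```bash
# Highest GLIBC symbol version the cross-compiled binary requires
strings ./llama_main | grep -o 'GLIBC_[0-9.]*' | sort -Vu | tail -n1

# GLIBC version actually installed on this Pi
ldd --version | head -n1
```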
**Error Symptoms:**

```bash
./llama_main: /lib/aarch64-linux-gnu/libm.so.6: version `GLIBC_2.38' not found (required by ./llama_main)
./llama_main: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.38' not found (required by ./llama_main)
./llama_main: /lib/aarch64-linux-gnu/libstdc++.so.6: version `CXXABI_1.3.15' not found (required by ./llama_main)
./llama_main: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.38' not found (required by /lib/libllama_runner.so)
```

**There are two potential solutions:**

- **Solution A**: Modify the Pi to match the binary (run on the Pi)
- **Solution B**: Modify the binary to match the Pi (run on the host)

#### Solution A: Upgrade GLIBC on Raspberry Pi (Recommended)

1. **Check your current GLIBC version:**

```bash
ldd --version
# Output: ldd (Debian GLIBC 2.36-9+rpt2+deb12u12) 2.36
```

2. **⚠️ Compatibility warning and safety check:**

```bash
# Check and warn only - this does not perform the upgrade
# (requires bc: sudo apt install bc if missing)
current_glibc=$(ldd --version | head -n1 | grep -o '[0-9]\+\.[0-9]\+')
required_glibc="2.38"

echo "Current GLIBC: $current_glibc"
echo "Required GLIBC: $required_glibc"

if [[ $(echo "$current_glibc < $required_glibc" | bc -l) -eq 1 ]]; then
    echo ""
    echo "⚠️ WARNING: Your GLIBC version is too old"
    echo "   You need to upgrade to continue with the next steps"
    echo "   Consider using Solution B (rebuild binary) for better safety"
    echo ""
else
    echo "✅ Your GLIBC version is already compatible"
fi
```

**NOTE:** If the output shows "⚠️ WARNING: Your GLIBC version is too old", proceed with the upgrade in Step 3 below, or use Solution B instead. Otherwise your device is already compatible; skip Step 3 and go directly to Step 4.

3. **Upgrade to a newer GLIBC:**

```bash
# Add the Debian unstable repository
echo "deb http://deb.debian.org/debian sid main contrib non-free" | sudo tee -a /etc/apt/sources.list

# Update package lists
sudo apt update

# Install newer GLIBC packages
sudo apt-get -t sid install libc6 libstdc++6

# Reboot the system
sudo reboot
```

4. **Verify compatibility after reboot:**

```bash
cd ~/executorch-deployment
source setup_env.sh

# Test that the binary works
if ./llama_main --help &>/dev/null; then
    echo "✅ GLIBC upgrade successful - binary is compatible"
else
    echo "❌ GLIBC upgrade failed - binary still incompatible"
    echo "Consider rolling back or refer to documentation for troubleshooting"
fi
```

5. **Test the fix:**

```bash
cd ~/executorch-deployment
source setup_env.sh
./llama_main --model_path ./llama3_2.pte --tokenizer_path ./tokenizer.model --seq_len 128 --prompt "Hello"
```

**Important Notes:**

- Select "Yes" when prompted to restart services
- Press Enter to keep the currently installed version when prompted about configuration files
- Back up important data before upgrading

#### Solution B: Rebuild with Raspberry Pi's GLIBC (Advanced)

If you prefer not to upgrade your Raspberry Pi system:

1. **Copy the Pi's filesystem to the host machine:**

```bash
# On the Raspberry Pi - install rsync
ssh pi@<your-rpi-ip-address>
sudo apt update && sudo apt install rsync
exit

# On the host machine - copy the Pi's filesystem
mkdir -p ~/rpi5-sysroot
rsync -aAXv --exclude={"/proc","/sys","/dev","/run","/tmp","/mnt","/media","/lost+found"} \
    pi@<your-rpi-ip-address>:/ ~/rpi5-sysroot
```
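Before pointing the toolchain at the copied sysroot, it is worth spot-checking that the core runtime libraries actually made it across. A minimal check on the host; the exact paths are an assumption based on the 64-bit Raspberry Pi OS layout and may differ between releases:

```bash
# Spot-check the sysroot copy for the core C/C++ runtime libraries
# (paths assume a 64-bit Raspberry Pi OS layout; adjust if your image differs)
ls ~/rpi5-sysroot/lib/aarch64-linux-gnu/libc.so.6 \
   ~/rpi5-sysroot/usr/lib/aarch64-linux-gnu/libstdc++.so.6 \
  && echo "sysroot looks usable" \
  || echo "sysroot copy is missing core libraries - re-run rsync"
```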
2. **Update the CMake toolchain file:**

```bash
# Edit arm-toolchain-pi5.cmake
# Replace this line:
# set(CMAKE_SYSROOT "${TOOLCHAIN_PATH}/aarch64-none-linux-gnu/libc")

# With these:
set(CMAKE_SYSROOT "/home/yourusername/rpi5-sysroot")
set(CMAKE_FIND_ROOT_PATH "${CMAKE_SYSROOT}")
```

3. **Rebuild the binaries:**

```bash
# Clean and rebuild
rm -rf cmake-out
./examples/raspberry_pi/setup.sh pi5 --force-rebuild

# Verify the GLIBC version the new binary requires
strings ./cmake-out/examples/models/llama/llama_main | grep GLIBC_
# Should show at most GLIBC_2.36 (matching your Pi)
```

---

### Issue 2: Library Not Found

**Problem:** Required libraries are not found at runtime.

**Error Symptoms:**

```bash
./llama_main: error while loading shared libraries: libllama_runner.so: cannot open shared object file
```

**Solution:**

```bash
# Ensure you're in the correct directory and the environment is set
cd ~/executorch-deployment
source setup_env.sh
./llama_main --help
```

**Root Cause:** Either `LD_LIBRARY_PATH` is not set or you are not in the deployment directory.

---

### Issue 3: Tokenizer JSON Parsing Warnings

**Problem:** Warning messages about JSON parsing errors appear after running the `llama_main` binary.

**Error Symptoms:**

```bash
E tokenizers:hf_tokenizer.cpp:60] Error parsing json file: [json.exception.parse_error.101]
```

**Solution:** These warnings can be safely ignored; they do not affect model inference.

---

## Debugging Tools

Enable ExecuTorch logging:

```bash
# Set the log level for debugging
export ET_LOG_LEVEL=Info
./llama_main --model_path ./llama3_2.pte --verbose
```

## Final Run Command

After resolving any issues, run:

```bash
cd ~/executorch-deployment
source setup_env.sh
./llama_main --model_path ./llama3_2.pte --tokenizer_path ./tokenizer.model --seq_len 128 --prompt "What is the meaning of life?"
```

Happy inferencing!