Welcome to Generative AI on ARM, a hands-on course designed to help you optimize generative AI workloads on ARM architectures. Through practical labs and structured lectures, you will learn how to deploy AI models efficiently across different ARM-based environments.
This course consists of three hands-on labs and four lectures.
- Lab 1: Optimizing generative AI on edge devices, such as the Raspberry Pi 5.
- Lab 2: Deploying AI workloads on ARM-based cloud servers, including AWS Graviton.
- Lab 3: Comparing cloud vs. edge inference, analyzing challenges and trade-offs.
Inside the `slides/` folder, you will find four lectures covering the key concepts and challenges in AI inference on ARM:
- Challenges Facing Cloud and Edge GenAI Inference – Understanding the limitations and constraints of AI inference in different environments.
- Generative AI Models – Exploring model architectures, training methodologies, and deployment considerations.
- ML Frameworks and Optimized Libraries – A deep dive into AI software stacks, including PyTorch, ONNX Runtime, and ARM-specific optimizations.
- Optimization for CPU Inference – Techniques such as quantization, pruning, and leveraging SIMD instructions for faster AI performance.
You will learn how to optimize AI inference using ARM-specific techniques such as SIMD (SVE, NEON) and low-bit quantization. The course covers practical strategies for running generative AI efficiently on mobile, edge, and cloud-based ARM platforms. You will also explore the trade-offs between cloud and edge deployment, gaining both theoretical knowledge and hands-on skills.
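As a taste of the low-bit quantization covered in the labs, here is a minimal, framework-free sketch of symmetric int8 quantization. The helper names are illustrative only, not part of the course code, and real deployments would use a library implementation (e.g., the PyTorch quantization APIs discussed in the lectures):

```python
def quantize_int8(values):
    """Symmetric int8 quantization: map floats into integer codes in [-127, 127]."""
    scale = max(abs(v) for v in values) / 127 or 1.0  # guard against all-zero input
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the integer codes."""
    return [qi * scale for qi in q]

weights = [0.02, -1.27, 0.64, 0.9]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# Each recovered weight differs from the original by at most scale / 2,
# since the scale is chosen from the largest magnitude (no clipping occurs).
```

Storing the codes as int8 instead of float32 cuts weight memory by roughly 4x, which is the main lever for fitting and speeding up LLM inference on ARM CPUs.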
By the end of this course, you will have a strong foundation in deploying high-performance AI models on ARM hardware.
- **Run the setup script.** Open a terminal in the project directory and execute the setup script: `./setup.sh`
- **Log in to a Hugging Face account.** Run `huggingface-cli login`.
- **Open the course material.** The course material is provided as Jupyter notebooks. To access the content, activate the environment with `source pi5_env/bin/activate`, then start `jupyter lab`.
- Follow the instructions provided in `lab1.ipynb` to complete the lab.
- **Launch an AWS EC2 instance.**
  - Go to Amazon EC2 and create a new instance.
  - Select key pair: create a key for the SSH connection (e.g., `yourkey.pem`).
  - Choose an AMI: use the Ubuntu 22.04 AMI as the operating system.
  - Instance type: select `m7g.xlarge` (a Graviton-based instance with ARM Neoverse cores).
  - Storage: add 32 GB of root storage.
- **Connect to the instance via SSH.** Use the following command to establish an SSH connection (replace the placeholders with your instance details): `ssh -i "yourkey.pem" -L 8888:localhost:8888 ubuntu@<ec2-public-dns>`. The `-L 8888:localhost:8888` flag forwards the Jupyter port to your local machine.
- **Clone the repository.** Once connected to the instance, run `git clone https://github.com/OliverGrainge/Generative_AI_on_arm.git`.
- **Run the setup script.** Change into the repository directory with `cd Generative_AI_on_arm`, then run `./setup_graviton.sh`.
- **Activate the virtual environment and log in to Hugging Face.** After the setup completes, run `source graviton_env/bin/activate`, then `huggingface-cli login`. (You will need to log in to Hugging Face to download the required large language model.)
- **Launch the lab.** Start Jupyter Lab by running `jupyter lab`. Copy the link provided in the terminal output, open it in your local browser, and follow the instructions in the notebooks.
- Follow the setup steps for lab 1 on your local Raspberry Pi.
- Follow the setup steps for lab 2 on your Raspberry Pi to create and connect to a cloud instance.
- Open `lab3.ipynb` to find the instructions for completing the lab.
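Lab 3's cloud-vs-edge comparison ultimately comes down to measuring inference latency on each platform. A minimal timing helper like the following (an illustrative sketch, not taken from the lab code) can be reused unchanged on both the Raspberry Pi and the Graviton instance:

```python
import time

def time_inference(fn, *args, repeats=10):
    """Return the mean wall-clock latency of fn(*args) in seconds."""
    fn(*args)  # warm-up run so one-off costs (caches, lazy init) don't skew the mean
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - start) / repeats

# Example with a stand-in "model": summing a list of 100k integers.
latency = time_inference(sum, list(range(100_000)))
```

For a real model you would pass its forward/generate call as `fn`; comparing the measured latency against network round-trip time to the cloud instance is one way to frame the edge-vs-cloud trade-off the lab explores.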
- To complete this course you are required to have access to a Raspberry Pi 5; for the cloud sections, AWS can be utilised.
- For Labs 2 and 3, make sure to terminate the EC2 instance when you're done to avoid unnecessary charges.
Happy learning!
Note: The primary content writer for this course is an AI researcher, Oliver Grainge.