This repository quantizes and converts the Llama3-8B-Instruct model weights and deploys the model on Android for on-device inference.
- Colab notebook to quantize and convert the Llama3-8B-Instruct model.
- Hugging Face repository with the converted Llama3-8B-Instruct weights.
- Medium blog post with a step-by-step guide to deploying Llama3-8B-Instruct on Android.
- Medium blog post on setting up the environment on a Google Cloud Platform VM instance.
- Install the APK directly.
@software{mlc-llm,
author = {MLC team},
title = {{MLC-LLM}},
url = {https://github.com/mlc-ai/mlc-llm},
year = {2023}
}