Optimum ExecuTorch enables efficient deployment of transformer models using Meta's ExecuTorch framework. It provides:
- 🔄 Easy conversion of Hugging Face models to ExecuTorch format
- ⚡ Optimized inference through hardware-specific backends such as XNNPACK
- 🤝 Seamless integration with Hugging Face Transformers
- 📱 Efficient deployment on various devices
Install from source:
```bash
git clone https://github.com/huggingface/optimum-executorch.git
cd optimum-executorch
pip install .
```
- 🔜 Installation from PyPI coming soon...
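To check that the installation succeeded, you can try importing the main class (a minimal sanity check, nothing more):

```python
# Minimal sanity check: this import fails if optimum-executorch is not installed
from optimum.executorch import ExecuTorchModelForCausalLM

print(ExecuTorchModelForCausalLM.__name__)
```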
There are two ways to use Optimum ExecuTorch: export a model ahead of time with the CLI and then load the exported artifact, or export it on the fly from the Python API.

To export ahead of time, use the CLI tool to convert your model to ExecuTorch format:
```bash
optimum-cli export executorch \
  --model "meta-llama/Llama-3.2-1B" \
  --task "text-generation" \
  --recipe "xnnpack" \
  --output_dir="meta_llama3_2_1b"
```
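When the export finishes, the serialized ExecuTorch program is written to the directory given by `--output_dir` (here `meta_llama3_2_1b`). The exact artifact names depend on the optimum-executorch version, so the listing below is only a sketch:

```python
from pathlib import Path

# List the files produced by the CLI export; typically this includes
# a serialized ExecuTorch program (a .pte file), but names vary by version.
for artifact in sorted(Path("meta_llama3_2_1b").iterdir()):
    print(artifact.name)
```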
Once exported, load the model and use it for text generation:
```python
from optimum.executorch import ExecuTorchModelForCausalLM
from transformers import AutoTokenizer

# Load the exported model
model = ExecuTorchModelForCausalLM.from_pretrained(
    "./meta_llama3_2_1b",
    export=False
)

# Initialize tokenizer and generate text
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
generated_text = model.text_generation(
    tokenizer=tokenizer,
    prompt="Simply put, the theory of relativity states that",
    max_seq_len=128
)
```
Alternatively, export and load the model in a single step from the Python API:

```python
from optimum.executorch import ExecuTorchModelForCausalLM
from transformers import AutoTokenizer

# Load and export the model in one step
model = ExecuTorchModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",
    export=True,
    recipe="xnnpack"
)

# Generate text right away
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
generated_text = model.text_generation(
    tokenizer=tokenizer,
    prompt="Simply put, the theory of relativity states that",
    max_seq_len=128
)
```
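For convenience, the examples above can be wrapped into a small command-line script. The script below is an illustrative sketch, not part of optimum-executorch; its flags and defaults are made up here, and it only reuses the calls already shown:

```python
import argparse

from optimum.executorch import ExecuTorchModelForCausalLM
from transformers import AutoTokenizer


def main() -> None:
    # Hypothetical helper script: generate text from a directory produced by
    # `optimum-cli export executorch` (see the CLI example above).
    parser = argparse.ArgumentParser(description="Run an exported ExecuTorch model")
    parser.add_argument("--model_dir", default="./meta_llama3_2_1b")
    parser.add_argument("--tokenizer", default="meta-llama/Llama-3.2-1B")
    parser.add_argument("--prompt", default="Simply put, the theory of relativity states that")
    parser.add_argument("--max_seq_len", type=int, default=128)
    args = parser.parse_args()

    model = ExecuTorchModelForCausalLM.from_pretrained(args.model_dir, export=False)
    tokenizer = AutoTokenizer.from_pretrained(args.tokenizer)
    print(
        model.text_generation(
            tokenizer=tokenizer,
            prompt=args.prompt,
            max_seq_len=args.max_seq_len,
        )
    )


if __name__ == "__main__":
    main()
```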
Check out the ExecuTorch GitHub repository for:
- Custom model export configurations
- Performance optimization guides
- Deployment guides for Android, iOS, and embedded devices
- Additional examples
We love your input! We want to make contributing to Optimum ExecuTorch as easy and transparent as possible. You can:

- Report bugs through GitHub Issues

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.