
🤗 Optimum ExecuTorch

Optimize and deploy Hugging Face models with ExecuTorch

Documentation | ExecuTorch | Hugging Face

🚀 Overview

Optimum ExecuTorch enables efficient deployment of transformer models using Meta's ExecuTorch framework. It provides:

  • 🔄 Easy conversion of Hugging Face models to ExecuTorch format
  • ⚡ Optimized inference with hardware-specific backends such as XNNPACK
  • 🤝 Seamless integration with Hugging Face Transformers
  • 📱 Efficient deployment on various devices

⚡ Quick Installation

Install from source:

git clone https://github.com/huggingface/optimum-executorch.git
cd optimum-executorch
pip install .
  • 🔜 Installation from PyPI is coming soon.
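
To verify the installation, check that the package imports cleanly (a minimal sanity check; assumes the steps above completed without errors):

# If this import succeeds, optimum-executorch is installed correctly.
from optimum.executorch import ExecuTorchModelForCausalLM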

🎯 Quick Start

There are two ways to use Optimum ExecuTorch:

Option 1: Export and Load Separately

Step 1: Export your model

Use the CLI tool to convert your model to the ExecuTorch format; the --recipe flag selects the target backend (here, XNNPACK for optimized CPU inference):

optimum-cli export executorch \
    --model "meta-llama/Llama-3.2-1B" \
    --task "text-generation" \
    --recipe "xnnpack" \
    --output_dir "meta_llama3_2_1b"
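
The export writes a serialized ExecuTorch program into the output directory. A quick way to confirm the artifact was produced (a minimal sketch; it assumes the exporter saves a .pte file, ExecuTorch's standard program format):

from pathlib import Path

# List the exported ExecuTorch program(s); ExecuTorch programs
# use the .pte extension.
print(list(Path("meta_llama3_2_1b").glob("*.pte")))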

Step 2: Load and run inference

Use the exported model for text generation:

from optimum.executorch import ExecuTorchModelForCausalLM
from transformers import AutoTokenizer

# Load the exported model
model = ExecuTorchModelForCausalLM.from_pretrained(
    "./meta_llama3_2_1b",
    export=False
)

# Initialize tokenizer and generate text
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
generated_text = model.text_generation(
    tokenizer=tokenizer,
    prompt="Simply put, the theory of relativity states that",
    max_seq_len=128
)
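
The call returns the generated string. To get a rough sense of inference speed, you can wrap it in a simple timer (a sketch using only the standard library, reusing the model and tokenizer from above; absolute numbers depend on your hardware and backend):

import time

start = time.perf_counter()
generated_text = model.text_generation(
    tokenizer=tokenizer,
    prompt="Simply put, the theory of relativity states that",
    max_seq_len=128,
)
print(f"Generated in {time.perf_counter() - start:.2f} s")
print(generated_text)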

Option 2: Export and Load in One Step (Python API)

from optimum.executorch import ExecuTorchModelForCausalLM
from transformers import AutoTokenizer

# Load and export model in one step
model = ExecuTorchModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",
    export=True,
    recipe="xnnpack"
)

# Generate text right away
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
generated_text = model.text_generation(
    tokenizer=tokenizer,
    prompt="Simply put, the theory of relativity states that",
    max_seq_len=128
)
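
Note that Option 2 performs the export at load time, so the first from_pretrained call takes noticeably longer. If you plan to reload the same model repeatedly, export it once with the CLI (Option 1) and load the saved artifact with export=False.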

🛠️ Advanced Usage

See the ExecuTorch GitHub repository for:

  • Custom model export configurations
  • Performance optimization guides
  • Deployment guides for Android, iOS, and embedded devices
  • Additional examples

🤝 Contributing

We love your input! We want to make contributing to Optimum ExecuTorch as easy and transparent as possible. Check out the repository's contributing guidelines to get started.

📝 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

📫 Get in Touch