/ˈɑː.pri.əl/
Apriel-H1 inference: a vLLM plugin for the Apriel-H1 family of hybrid reasoning models.
Apriel-H1-15b-Thinker-SFT is a 15B-parameter hybrid reasoning model combining Transformer attention and Mamba State Space layers for high efficiency and scalability. Derived from Apriel-Nemotron-15B-Thinker through progressive distillation, Apriel-H1 replaces less critical attention layers with linear Mamba blocks, achieving over 2× higher inference throughput in vLLM with minimal loss in reasoning, math, and coding performance.
- Model Size: 15B parameters
- Context Length: 65K (target; runtime dependent)
- Languages: English (best)
- Hybrid Transformer–SSM architecture
- ~2× throughput improvement over the base Thinker model
- Retains strong reasoning, math, and coding capabilities
- Built via efficient distillation; no training from scratch required
Technical report: [Apriel-H1 Report](https://arxiv.org/abs/2511.02651)
Training stack: Fast-LLM
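
The sketch below shows one way the model might be run through vLLM's offline Python API once this plugin is installed. The model ID comes from the model card referenced below; `trust_remote_code` and the sampling settings are illustrative assumptions, not values prescribed by this repository.

```python
# Minimal sketch, assuming the Apriel-H1 plugin is installed and registered
# with vLLM. trust_remote_code and the sampling settings below are
# illustrative assumptions, not prescribed configuration.
from vllm import LLM, SamplingParams

llm = LLM(
    model="ServiceNow-AI/Apriel-H1-15b-Thinker-SFT",  # model ID from the model card
    trust_remote_code=True,  # assumption: the hybrid architecture may require this
)

params = SamplingParams(temperature=0.6, max_tokens=2048)
outputs = llm.generate(
    ["Explain why replacing attention layers with Mamba blocks can speed up decoding."],
    params,
)
print(outputs[0].outputs[0].text)
```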
All models were evaluated against vLLM server endpoints using the FlashInfer attention backend (except AI21-Jamba-Reasoning-3B, which used FlashAttention-2). For NVIDIA-Nemotron-Nano-9B-v2 and AI21-Jamba-Reasoning-3B, the Mamba cache (mamba_cache) was set to fp32.
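
As a hedged illustration of how such a server endpoint is typically queried, the snippet below uses the OpenAI-compatible client that vLLM's server exposes. The port, API key, and prompt are placeholder assumptions; backend choices like FlashInfer and the Mamba cache dtype are configured when launching the server, not by the client.

```python
# Sketch of querying a vLLM server endpoint like those used in evaluation.
# Assumptions: a server is already running locally on port 8000 (e.g. via
# `vllm serve`) and accepts any API key; the prompt is illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="ServiceNow-AI/Apriel-H1-15b-Thinker-SFT",
    messages=[{"role": "user", "content": "Solve step by step: what is 17 * 24?"}],
    temperature=0.6,
    max_tokens=1024,
)
print(response.choices[0].message.content)
```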
@misc{apriel_h1_2025,
title = {Apriel-H1: Towards Efficient Enterprise Reasoning Models},
author = {ServiceNow Language Models Lab},
archivePrefix = {arXiv},
eprint = {2511.02651},
primaryClass = {cs.LG},
url = {https://arxiv.org/abs/2511.02651},
note = {Model available at \url{https://huggingface.co/ServiceNow-AI/Apriel-H1-15b-Thinker-SFT}},
year = {2025}
}
- Model Card: ServiceNow-AI/Apriel-H1-15b-Thinker-SFT


