Supported Models

Chat/Instruct Models

Adept Persimmon (PersimmonForCausalLM)
- Chat-8B
Apriel (AprielForCausalLM)
- Instruct-5B
Aquila (AquilaForCausalLM)
- Chat2-7B, Chat2-34B, Chat2-7B-16K, Chat2-34B-16K
Baichuan (BaichuanForCausalLM, BaichuanM1ForCausalLM)
- Chat-7B, Chat-13B
- M1: Instruct-14B
- Fine-tunings: Med-R1 (Tip: --set chat_template im)
BlueLM (BlueLMForCausalLM)
- Chat-7B, Chat-7B 32K
ChatGLM (ChatGLMModel, Glm4ForCausalLM):
- ~~ChatGLM: 6B~~
- ChatGLM2 family: ChatGLM2 6B, CodeGeeX2 6B, ChatGLM3 6B
  
  Tip on CodeGeeX2: Code completion only, no context. Use system prompt to specify language, e.g. -s "# language: python".
- CharacterGLM: 6B (-a CharacterGLM)
  
  Note: Use additional key-value pair arguments to specify characters, --kv user_name "..." bot_name "..." user_info "..." bot_info "...".
- GLM-4: Chat-9B-128k, Chat-9B-1M
- CodeGeeX4: 9B (-a CodeGeeX4)
- GLM-4-0414: , GLM-Z1-9B-0414, GLM-4-32B-0414, GLM-Z1-32B-0414, GLM-Z1-Rumination-32B-0414
Cohere (CohereForCausalLM)
- C4AI Command-R
- Aya-23-8B, Aya-23-35B (-a Aya-23, fully compatible with Command-R)
- C4AI Command R7B
DeciLM (DeciLMForCausalLM)
- Nemotron: Llama-3.3-Nemotron-Super-49B-v1
DeepSeek (DeepseekForCausalLM, DeepseekV2ForCausalLM, DeepseekV3ForCausalLM)
- v1: Chat-16B
- v2: Chat (💣 not tested), Lite-Chat
- Coder v2: Instruct (💣 not tested), Lite-Instruct
- Moonlight: Instruct-16B (-a Moonlight)
- GigaChat: Instruct-20B (-a GigaChat)
Two optimization modes are defined: speed (default) and memory. See BaseMLAttention.
EXAONE (ExaoneForCausalLM)
- v3.5: Instruct-2.4B, Instruct-7.8B, Instruct-32B
- Deep: 2.4B, 7.8B, 32B
Gemma (GemmaForCausalLM)
- v1.0: Instruct-2B, Instruct-7B
- v1.1: Instruct-2B, Instruct-7B
- CodeGemma v1.1: Instruct-7B
- v2: Instruct-2B, Instruct-9B, Instruct-27B
- v3: Instruct-1B, Instruct-4B, Instruct-12B, Instruct-27B
Note: Only download tokenizer.model, but not tokenizer.json when converting.
Granite (GraniteForCausalLM, GraniteMoeForCausalLM)
- v3.0: Instruct-1B-A400M, Instruct-3B-A800M, Instruct-2B, Instruct-8B
- v3.1: Instruct-1B-A400M, Instruct-3B-A800M, Instruct-2B, Instruct-8B
- v3.2: Instruct-2B, Instruct-2B, Instruct-8B
HunYuan (HunYuanForCausalLM)
- Dense: Instruct-7B
Instella (InstellaForCausalLM)
- Instruct-3B
InternLM (InternLMForCausalLM, InternLM2ForCausalLM)
- v1: Chat-7B, Chat-7B v1.1, Chat-20B
- v2: Chat-1.8B, Chat-7B, Chat-20B, Math-Plus-1.8B, Math-Plus-7B, Math-Plus-20
- v2.5: Chat-1.8B, Chat-7B, Chat-7B-1M, Chat-20B
- v3: Instruct-8B
Ling (BailingMoeForCausalLM)
- Lite, Coder-Lite
LlaMA-like (LlamaForCausalLM, Llama4ForConditionalGeneration):
- All LlaMA-1 models
- LlaMA-2: Chat-7B, etc
- LlaMA-3: Instruct-8B, Instruct-70B, other derivations such as Llama3-8B-Chinese-Chat
- LlaMA-3.1: Instruct-8B, Instruct-70B
- LlaMA-3.2: Instruct-1B, Instruct-3B
- CodeLlaMA: Instruct-7B (-a CodeLlaMA)
- LLM-Compiler: 7B, 7B-FTD, 13B, 13B-FTD
- DeepSeek: Chat-7B (-a DeepSeek) , Coder-6.7B (-a DeepSeekCoder), Coder-Instruct-1.3B (-a DeepSeekCoder) 🔥
- Yi: (-a Yi)
  - v1: Chat-6B, Chat-34B
  - v1.5: Chat-6B, Chat-9B, Chat-34B, Chat-9B-16K, Chat-34B-16K
  - Coder: Chat-1.5B, Chat-9B
- WizardLM: LM 7B (-a WizardLM), LM 13B (-a WizardLM), Coder Python-7B (-a WizardCoder)
- TigerBot: Chat-7B, Chat-13B (-a TigerBot)
- CodeFuse-DeepSeek: 33B (-a CodeFuseDeepSeek)
- MAP-Neo: Instruct-7B (-a MAP-Neo)
- Index: Chat-1.9B, Character-1.9B, Chat-1.9B-32K
- NuminaMath: 7B-TIR
- SmolLM: (-a SmolLM)
  - v1: Instruct-1.7B
  - v2: Instruct-1.7B
- Groq: Llama-3-Groq-8B-Tool-Use (-a Llama-3-Groq-8B-Tool-Use)
- Megrez: Instruct-3B (-a Megrex)
- Falcon: (-a Falcon3)
  - v3: Instruct-1B, Instruct-3B, Instruct-7B, Instruct-10B
- DeepSeek-R1-Distill-LlaMA: 8B, 70B (-a DeepSeek-R1-Distill-LlaMA)
- DeepHermes-3: Llama-3-8B-Preview (Use -s ... to enable thinking)
- Watt-tool: 8B, 70B
- Reke-Flash: Flash-3 (-a Reka-Flash-3)
- Nemotron: Llama-3.1-Nemotron-Nano-8B
- LlaMA-4: Scout-Instruct, Maverick-Instruct
- Seed-Coder: Instruct-8B, Reasoning-8B (--name Seed-Coder)
For other models that using LlamaForCausalLM architecture, for example, aiXcoder-7B, try -a Yi.
MiniCPM (MiniCPMForCausalLM, MiniCPM3ForCausalLM)
- DPO-2B, SFT-2B, SFT-1B🔥
- 2B-128k (Note: --temp 0 is recommended.)
- MoE-8x2B
- 4B
Mistral (MistralForCausalLM, MixtralForCausalLM)
- Mistral: Instruct-7B-v0.2, Instruct-7B-v0.3
- Small: Instruct-24B
- OpenChat: 3.5 (-a OpenChat) 🔥
  
  Tip: Use system prompt to select modes: -s GPT4 (default mode), -s Math (mathematical reasoning mode).
- Starling: 7B-beta (-a Starling)
  
  Note: This is based on OpenChat, and is fully compatible with OpenChat GPT4 mode.
- WizardLM: Math 7B (-a WizardMath)
- Mixtral: Instruct-8x7B 🔥, Instruct-8x22B
  
  Three implementations of sliding-window attention (see SlidingWindowAttentionImpl):
  - Full cache: more RAM is needed.
  - Partial cache: less RAM is needed, and faster than ring cache (default).
  - Ring cache (i.e. rolling cache): least RAM, but current implementation is naive (slow). 💣
  Note: precision of these implementations differs, which causes different results.
- NeuralBeagle14: 7B (-a NeuralBeagle)
- WizardLM-2: WizardLM-2-8x22B (official link is gone) (-a WizardLM-2-MoE)
  
  Note: For MixtralForCausalLM models, --experts ... is supported to select a subset of experts when converting. For example, --experts 0,1,2,3 selects the first 4 experts.
- Codestral: 22B-v0.1
- Mistral-Nemo: Nemo-Instruct-2407
- Small: Instruct-24B
- DeepHermes-3-Mistral: 24B-Preview (-a DeepHermes-3-Mistral. Default: Thinking model.)
- Small-3.1: Instruct-24B
  
  Note: Please download config.json & tokenizer.json from here.
Olm (OlmoeForCausalLM, Olmo2ForCausalLM)
- OLMoE: Instruct-7B
- OLM-2: Instruct-7B, Instruct-13B, Instruct-32B
Orion (OrionForCausalLM)
- Chat-14B
Phi (PhiForCausalLM, Phi3ForCausalLM)
- Phi-2
  
  Tip: --temp 0 is recommended. Don't forget to try --format qa.
- Dolphin Phi-2 (-a DolphinPhi2) 🐬
- Phi-3: Mini-Instruct-4k, Mini-Instruct-128k, Medium-Instruct-4k, Medium-Instruct-128k
- Phi-3.5: Mini-Instruct, MoE-Instruct
- Phi-4: Instruct, Mini-Instruct
QWen (QWenLMHeadModel, Qwen2ForCausalLM, Qwen2MoeForCausalLM)
- v1: Chat-7B, Chat-14B, QAnything-7B
- v1.5: Chat-0.5B, Chat-1.8B, Chat-4B, Chat-7B, Chat-14B, CodeQwen-Chat-7B (-a CodeQwen)
- v1.5 MoE: Chat-A2.7B
- v2: Instruct-0.5B, Instruct-1.5B, Instruct-7B, Instruct-72B
- v2 MoE: Instruct-57B-A14B (💣 not tested)
- v2.5: Instruct-0.5B, Instruct-1.5B, Instruct-7B, Instruct-14B, Instruct-32B, Instruct-72B
- v2.5-Coder: Instruct-1.5B, Instruct-7B
- v2.5-Math: Instruct-1.5B, Instruct-7B, Instruct-72B
- Marco-o1 (-a Marco-o1)
- QwQ: 32B-Preview, 32B (-a QwQ)
- ReaderLM-v2 (-a ReaderLM-v2)
- DeepSeek-R1-Distill-QWen: 1.5B, 7B, 14B, 32B, DeepScaleR-1.5B-Preview (-a DeepSeek-R1-Distill-QWen)
- Skywork-OR1: Math-7B, 7B-Preview, 32B-Preview
- OlympicCoder: 7B, 32B
- v3: 235B-A22B (💣 not tested), 30B-A3B, 32B, 14B, 8B, 4B, 1.7B, 0.6B
- MiMo: 7B-RL
Solor (SolarForCausalLM)
- Pro
TeleChat (TeleChat2ForCausalLM)
- v2: 3B, 7B, 115B
XVERSE (XverseForCausalLM)
- Chat-7B, Chat-13B, Chat-65B
Note: Tokenizer's behavior is not 100% identical.
Zhinao (ZhinaoForCausalLM)
- Chat-7B-4K, Chat-7B-32K, Chat-7B-360K

Base Models

Please use --format completion for these models.

AlphaGeometry-LM (-a AlphaGeometry-LM)
- geometry.757
DeepSeek (DeepseekV2ForCausalLM)
- Coder-V2-Base (💣 not tested), Coder-V2-Lite-Base
Gemma (GemmaForCausalLM)
- CodeGemma v1.1: Base-2B, Base-7B
Grok-1
- Base
  
  About Grok-1.
LlaMA-like (LlamaForCausalLM):
- DeepSeek: Coder-Base-1.3B (-a DeepSeekCoder), Coder-Base-6.7B (-a DeepSeekCoder)
- Seed-Coder: Base-8B (--name Seed-Coder)
Mistral (MistralForCausalLM, MixtralForCausalLM)
- Mistral: Base-7B-v0.1, Base-7B-v0.3
Stable-LM (StableLMEpochModel)
- Code-3B
StarCoder (Starcoder2ForCausalLM)
- Base-3B, Base-7B, Base-15B

RAG Models

Text Embedding (XLMRobertaModel)
- BCE-Embedding
- BGE-M3 (-a BGE-M3)
- MiniCPM-Embedding-Light
  
  Note: Only dense embedding is implemented.
QA Ranking (XLMRobertaForSequenceClassification)
- BCE-ReRanker
- BGE-ReRanker-M3 (-a BGE-Reranker-M3)
- MiniCPM-Reranker-Light

LoRA Models

These LoRA models have been tested:

Llama-3-Chinese-8B-Instruct

Special Models

Meta-AI multi-token prediction models checkpoints

Download at least one multi-token prediction checkpoint (such as 7B_1T_4). Assume it is stored at /path/to/llama-multi-predict/7B_1T_4. Make sure tokenizer.model is downloaded to /path/to/llama-multi-predict.

To convert it with -a llama-multi-token-prediction-ckpt:
```
python convert.py -i /path/to/llama-multi-predict/7B_1T_4 -o llama-multi.bin -a llama-multi-token-prediction-ckpt
```
This is a base model, and remember to use --format completion.

Tip: Use --kv n_future_tokens N to change number of future tokens, N = [1, 4].

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Files

models.md

models.md

Supported Models

Chat/Instruct Models

Base Models

RAG Models

LoRA Models

Special Models

Files

models.md

Latest commit

History

models.md

File metadata and controls

Supported Models

Chat/Instruct Models

Base Models

RAG Models

LoRA Models

Special Models