Xmodel-2-1.2B


✨ Features

  • A 1.2B-parameter large language model, pretrained on ~1.5 trillion tokens.
  • Achieves state-of-the-art performance on complex reasoning and agent tasks.
  • A data mixing law for the warm-stable-decay (WSD) learning rate scheduler.
  • Fully open-source codebase and pretrained models.

🌟 Introduction

Xmodel-2 is a 1.2-billion-parameter large language model designed specifically for reasoning tasks. Its architecture enables different model scales to share a unified set of hyperparameters, allowing for extensive experimentation on smaller models and seamless transfer of optimal configurations to larger models. To maximize training efficiency and stability, Xmodel-2 employs the WSD learning rate scheduler from MiniCPM. Pretrained on 1.5 trillion tokens from diverse sources, Xmodel-2 achieves state-of-the-art performance in complex reasoning and agent-based tasks, while maintaining low training costs. These results highlight the potential of efficient model design and training strategies in advancing reasoning capabilities.
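
For readers unfamiliar with the schedule, the sketch below illustrates the three WSD phases (linear warmup, a long constant-rate stable phase, and a final decay). The phase lengths and learning-rate values are illustrative placeholders, not the settings used to train Xmodel-2; see the technical report for the actual configuration.

```python
# A minimal sketch of a warmup-stable-decay (WSD) learning-rate schedule.
# The peak LR, minimum LR, and phase fractions below are illustrative
# placeholders, not the values used to train Xmodel-2.
def wsd_lr(step, total_steps, peak_lr=1e-2, min_lr=1e-4,
           warmup_frac=0.01, decay_frac=0.1):
    warmup_steps = max(int(total_steps * warmup_frac), 1)
    decay_steps = max(int(total_steps * decay_frac), 1)
    stable_end = total_steps - decay_steps
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to the peak learning rate.
        return peak_lr * step / warmup_steps
    if step < stable_end:
        # Stable: hold the peak learning rate constant.
        return peak_lr
    # Decay: anneal linearly from the peak down to the minimum learning rate.
    progress = min((step - stable_end) / decay_steps, 1.0)
    return peak_lr + (min_lr - peak_lr) * progress
```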

📊 Benchmark

Commonsense Reasoning

| Model | ARC-c | ARC-e | BoolQ | HellaSwag | OpenBookQA | PIQA | SciQ | Winogrande | Avg |
|---|---|---|---|---|---|---|---|---|---|
| MobiLlama-1B | 28.24 | 61.53 | 60.92 | 46.74 | 21.80 | 75.14 | 88.20 | 59.27 | 55.23 |
| TinyLLaMA1.1-1.1B | 30.97 | 61.66 | 55.99 | 46.70 | 25.20 | 72.63 | 89.30 | 59.43 | 55.24 |
| OLMo-1B | 28.67 | 63.34 | 61.74 | 46.97 | 25.00 | 75.03 | 87.00 | 59.98 | 55.97 |
| OpenELM-1.1B | 28.84 | 62.37 | 63.58 | 48.36 | 25.40 | 74.76 | 90.60 | 61.72 | 56.95 |
| Llama-3.2-1B | 31.31 | 65.36 | 63.73 | 47.78 | 26.40 | 74.48 | 91.50 | 61.01 | 57.70 |
| MiniCPM-1.2B | 36.86 | 70.29 | 67.92 | 49.91 | 23.60 | 74.43 | 91.80 | 60.77 | 59.45 |
| Fox-1-1.6B | 34.73 | 69.91 | 71.77 | 46.33 | 24.60 | 75.24 | 93.20 | 60.77 | 59.57 |
| InternLM2.5-1.8B | 35.24 | 66.37 | 79.82 | 46.99 | 22.00 | 73.29 | 94.90 | 62.67 | 60.16 |
| Qwen2-1.5B | 33.11 | 66.41 | 72.60 | 48.57 | 27.00 | 75.57 | 94.60 | 65.75 | 60.45 |
| StableLM-2-zephyr-1.6B | 36.52 | 66.79 | 80.00 | 53.26 | 26.80 | 74.86 | 88.00 | 64.09 | 61.29 |
| SmolLM-1.7B | 43.43 | 76.47 | 65.93 | 49.58 | 30.00 | 75.79 | 93.20 | 60.93 | 61.92 |
| Qwen2.5-1.5B | 41.21 | 75.21 | 72.97 | 50.15 | 31.80 | 75.90 | 94.30 | 63.61 | 63.14 |
| DCLM-1B | 41.30 | 74.79 | 71.41 | 53.59 | 32.20 | 76.93 | 94.00 | 66.22 | 63.81 |
| Phi-1.5-1.3B | 44.80 | 76.22 | 74.95 | 47.96 | 38.60 | 76.66 | 93.30 | 72.93 | 65.68 |
| Xmodel-2-1.2B | 39.16 | 71.55 | 74.65 | 47.45 | 29.20 | 74.81 | 93.60 | 63.93 | 61.79 |

Complex Reasoning

| Model | GSM8K (5-shot) | MATH (4-shot) | BBH (3-shot) | MMLU (0-shot) | HumanEval (pass@1) | MBPP (pass@1) | Avg |
|---|---|---|---|---|---|---|---|
| OpenELM-1.1B | 0.45 | 1.06 | 6.62 | 25.52 | 8.54 | 6.80 | 8.16 |
| OLMo-1B | 2.35 | 1.46 | 25.60 | 24.46 | 5.49 | 0.20 | 9.93 |
| TinyLLaMA1.1-1.1B | 2.50 | 1.48 | 25.57 | 25.35 | 1.83 | 3.40 | 10.02 |
| MobiLlama-1B | 1.97 | 1.54 | 25.76 | 25.26 | 7.93 | 5.40 | 11.31 |
| DCLM-1B | 4.93 | 2.14 | 30.70 | 46.43 | 8.54 | 6.80 | 16.59 |
| Llama-3.2-1B | 6.60 | 1.78 | 31.44 | 36.63 | 14.63 | 22.20 | 18.88 |
| SmolLM-1.7B | 7.51 | 3.18 | 29.21 | 27.73 | 21.34 | 31.80 | 20.13 |
| Fox-1-1.6B | 34.34 | 7.94 | 28.75 | 39.55 | 14.02 | 9.00 | 22.27 |
| StableLM-2-zephyr-1.6B | 41.32 | 10.12 | 32.71 | 41.30 | 25.61 | 19.40 | 28.41 |
| Phi-1.5-1.3B | 32.15 | 3.18 | 28.81 | 41.75 | 36.59 | 35.40 | 29.65 |
| InternLM2.5-1.8B | 27.90 | 16.68 | 41.76 | 46.30 | 27.40 | 29.60 | 31.61 |
| MiniCPM-1.2B | 40.11 | 10.98 | 35.42 | 43.99 | 43.90 | 36.80 | 35.20 |
| Qwen2-1.5B | 57.62 | 22.90 | 33.05 | 55.11 | 20.73 | 30.40 | 36.64 |
| Qwen2.5-1.5B | 62.40 | 28.28 | 43.99 | 59.72 | 5.49 | 40.00 | 39.98 |
| Xmodel-2-1.2B | 55.88 | 25.50 | 48.40 | 48.87 | 29.88 | 29.20 | 39.62 |

Agent Capabilities

| Model | HotpotQA (EM) | FEVER (EM) | AlfWorld (success rate) | WebShop (success rate) | Avg |
|---|---|---|---|---|---|
| Llama-3.2-1B | 3.49 | 17.57 | 3.73 | 0.80 | 6.40 |
| Qwen2.5-1.5B | 12.60 | 11.83 | 3.73 | 0.60 | 7.19 |
| Fox-1-1.6B | 5.98 | 24.32 | 0.00 | 0.20 | 7.62 |
| InternLM2.5-1.8B | 3.67 | 25.66 | 2.99 | 1.20 | 8.38 |
| MiniCPM-1.2B | 13.88 | 26.32 | 2.99 | 0.00 | 10.80 |
| StableLM-2-zephyr-1.6B | 10.66 | 31.72 | 5.97 | 1.60 | 12.49 |
| Xmodel-2-1.2B | 14.23 | 34.84 | 0.00 | 4.80 | 13.47 |

🛠️ Install

  1. Clone this repository and navigate to the XmodelLM-2 folder

    git clone https://github.com/XiaoduoAILab/XmodelLM-2.git
    cd XmodelLM-2
  2. Install the required packages

    pip install -r requirements.txt

🗝️ Quick Start

Download Xmodel-2 model

Our model files are fully open source on Hugging Face; you can download them there. We offer both the pretrained model, Xmodel-2, and the instruction-tuned model, which has been trained exclusively on Chinese and English data.
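
As an illustration, the checkpoint can also be fetched programmatically with huggingface_hub; the `repo_id` below is a placeholder and should be replaced with the actual repository name on Hugging Face.

```python
# A minimal sketch for downloading the model files with huggingface_hub.
# NOTE: the repo_id is a placeholder; substitute the real Hugging Face repo.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="XiaoduoAILab/Xmodel-2",  # placeholder repository id
    local_dir="./Xmodel-2",           # where to store the downloaded files
)
print(local_path)
```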

Example for Xmodel-2 model inference

Download the model files first and save them in a local folder. Then you can run the script below; we recommend passing an absolute path as the model path.

```python
import os

from transformers import AutoModelForCausalLM, AutoTokenizer
model_path = os.path.expanduser("/path/to/Xmodel-2")
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    trust_remote_code=True
)
prompt = "Give me a short introduction to large language model."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer(text, return_tensors="pt").to(model.device)
# Strings that mark turn boundaries; generation halts when one is produced.
stop_tokens = ["<|im_end|>", "<|im_start|>"]
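# Generate a response; stop_strings requires passing the tokenizer so the
# generated text can be matched against the stop strings.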
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=256,
    do_sample=True,
    top_p=0.9,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,
    stop_strings=stop_tokens,
    tokenizer=tokenizer
)
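# Decode only the newly generated tokens, skipping the prompt portion.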
output = tokenizer.decode(
    generated_ids[0][len(model_inputs.input_ids[0]):], 
    skip_special_tokens=True
)
for stop_token in stop_tokens:
    output = output.replace(stop_token, "")
output = output.split("<|im_start|>")[0]
output = output.strip()
print("Generated Response:")
print(output)
```

A possible result generated by this code is:

Generated Response:
Large language models are advanced artificial intelligence systems that are trained on massive amounts of text data to generate human-like text. These models are typically trained on a large corpus of text data, such as books, articles, and websites, and are able to generate text that is coherent and contextually appropriate.
Large language models are often used in natural language processing (NLP) tasks, such as language translation, text summarization, and text generation. They are also used in a variety of other applications, such as chatbots, virtual assistants, and language learning tools.
Large language models are a key component of the field of artificial intelligence and are being used in a variety of industries and applications. They are a powerful tool for generating human-like text and are helping to transform the way that we interact with technology.
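
If you prefer to see tokens as they are produced instead of waiting for the full completion, the snippet below is a minimal streaming variant of the example above. It uses transformers' TextStreamer and assumes `model`, `tokenizer`, and `model_inputs` were already created as shown.

```python
# Minimal streaming variant: prints tokens to stdout as they are generated.
# Assumes `model`, `tokenizer`, and `model_inputs` exist as in the example above.
from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(
    **model_inputs,
    max_new_tokens=256,
    do_sample=True,
    top_p=0.9,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,
    streamer=streamer,
)
```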

✏️ Reference

If you find Xmodel-2 useful in your research or applications, please consider giving a star ⭐ and citing using the following BibTeX:

@misc{qun2024xmodel2technicalreport,
      title={Xmodel-2 Technical Report}, 
      author={Wang Qun and Liu Yang and Lin Qingquan and Qu Zhijiu and Jiang Ling},
      year={2024},
      eprint={2412.19638},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2412.19638}, 
}
