Xmodel-2-1.2B


✨ Features

  • A 1.2B-parameter large language model, pretrained on ~1.5 trillion tokens.
  • Achieves state-of-the-art performance on complex reasoning and agent tasks.
  • A data mixing law for the warm-stable-decay (WSD) learning rate scheduler.
  • Fully open-source codebase and pretrained models.

🌟 Introduction

Xmodel-2 is a 1.2-billion-parameter large language model designed specifically for reasoning tasks. Its architecture enables different model scales to share a unified set of hyperparameters, allowing for extensive experimentation on smaller models and seamless transfer of optimal configurations to larger models. To maximize training efficiency and stability, Xmodel-2 employs the WSD learning rate scheduler from MiniCPM. Pretrained on 1.5 trillion tokens from diverse sources, Xmodel-2 achieves state-of-the-art performance in complex reasoning and agent-based tasks, while maintaining low training costs. These results highlight the potential of efficient model design and training strategies in advancing reasoning capabilities.
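
For readers unfamiliar with the schedule, the sketch below illustrates the three WSD phases (linear warmup, a long constant-rate stable phase, and a final decay). The phase lengths and learning-rate values are illustrative placeholders, not the settings used to train Xmodel-2; see the technical report for the actual configuration.

```python
# A minimal sketch of a warmup-stable-decay (WSD) learning-rate schedule.
# The peak LR, minimum LR, and phase fractions below are illustrative
# placeholders, not the values used to train Xmodel-2.
def wsd_lr(step, total_steps, peak_lr=1e-2, min_lr=1e-4,
           warmup_frac=0.01, decay_frac=0.1):
    warmup_steps = max(int(total_steps * warmup_frac), 1)
    decay_steps = max(int(total_steps * decay_frac), 1)
    stable_end = total_steps - decay_steps
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to the peak learning rate.
        return peak_lr * step / warmup_steps
    if step < stable_end:
        # Stable: hold the peak learning rate constant.
        return peak_lr
    # Decay: anneal linearly from the peak down to the minimum learning rate.
    progress = min((step - stable_end) / decay_steps, 1.0)
    return peak_lr + (min_lr - peak_lr) * progress
```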

📊 Benchmark

Commonsense Reasoning

| Model | ARC-c | ARC-e | BoolQ | HellaSwag | OpenBookQA | PIQA | SciQ | Winogrande | Avg |
|---|---|---|---|---|---|---|---|---|---|
| MobiLlama-1B | 28.24 | 61.53 | 60.92 | 46.74 | 21.80 | 75.14 | 88.20 | 59.27 | 55.23 |
| TinyLLaMA1.1-1.1B | 30.97 | 61.66 | 55.99 | 46.70 | 25.20 | 72.63 | 89.30 | 59.43 | 55.24 |
| OLMo-1B | 28.67 | 63.34 | 61.74 | 46.97 | 25.00 | 75.03 | 87.00 | 59.98 | 55.97 |
| OpenELM-1.1B | 28.84 | 62.37 | 63.58 | 48.36 | 25.40 | 74.76 | 90.60 | 61.72 | 56.95 |
| Llama-3.2-1B | 31.31 | 65.36 | 63.73 | 47.78 | 26.40 | 74.48 | 91.50 | 61.01 | 57.70 |
| MiniCPM-1.2B | 36.86 | 70.29 | 67.92 | 49.91 | 23.60 | 74.43 | 91.80 | 60.77 | 59.45 |
| Fox-1-1.6B | 34.73 | 69.91 | 71.77 | 46.33 | 24.60 | 75.24 | 93.20 | 60.77 | 59.57 |
| InternLM2.5-1.8B | 35.24 | 66.37 | 79.82 | 46.99 | 22.00 | 73.29 | 94.90 | 62.67 | 60.16 |
| Qwen2-1.5B | 33.11 | 66.41 | 72.60 | 48.57 | 27.00 | 75.57 | 94.60 | 65.75 | 60.45 |
| StableLM-2-zephyr-1.6B | 36.52 | 66.79 | 80.00 | 53.26 | 26.80 | 74.86 | 88.00 | 64.09 | 61.29 |
| SmolLM-1.7B | 43.43 | 76.47 | 65.93 | 49.58 | 30.00 | 75.79 | 93.20 | 60.93 | 61.92 |
| Qwen2.5-1.5B | 41.21 | 75.21 | 72.97 | 50.15 | 31.80 | 75.90 | 94.30 | 63.61 | 63.14 |
| DCLM-1B | 41.30 | 74.79 | 71.41 | 53.59 | 32.20 | 76.93 | 94.00 | 66.22 | 63.81 |
| Phi-1.5-1.3B | 44.80 | 76.22 | 74.95 | 47.96 | 38.60 | 76.66 | 93.30 | 72.93 | 65.68 |
| Xmodel-2-1.2B | 39.16 | 71.55 | 74.65 | 47.45 | 29.20 | 74.81 | 93.60 | 63.93 | 61.79 |

Complex Reasoning

| Model | GSM8K (5-shot) | MATH (4-shot) | BBH (3-shot) | MMLU (0-shot) | HumanEval (pass@1) | MBPP (pass@1) | Avg |
|---|---|---|---|---|---|---|---|
| OpenELM-1.1B | 0.45 | 1.06 | 6.62 | 25.52 | 8.54 | 6.80 | 8.16 |
| OLMo-1B | 2.35 | 1.46 | 25.60 | 24.46 | 5.49 | 0.20 | 9.93 |
| TinyLLaMA1.1-1.1B | 2.50 | 1.48 | 25.57 | 25.35 | 1.83 | 3.40 | 10.02 |
| MobiLlama-1B | 1.97 | 1.54 | 25.76 | 25.26 | 7.93 | 5.40 | 11.31 |
| DCLM-1B | 4.93 | 2.14 | 30.70 | 46.43 | 8.54 | 6.80 | 16.59 |
| Llama-3.2-1B | 6.60 | 1.78 | 31.44 | 36.63 | 14.63 | 22.20 | 18.88 |
| SmolLM-1.7B | 7.51 | 3.18 | 29.21 | 27.73 | 21.34 | 31.80 | 20.13 |
| Fox-1-1.6B | 34.34 | 7.94 | 28.75 | 39.55 | 14.02 | 9.00 | 22.27 |
| StableLM-2-zephyr-1.6B | 41.32 | 10.12 | 32.71 | 41.30 | 25.61 | 19.40 | 28.41 |
| Phi-1.5-1.3B | 32.15 | 3.18 | 28.81 | 41.75 | 36.59 | 35.40 | 29.65 |
| InternLM2.5-1.8B | 27.90 | 16.68 | 41.76 | 46.30 | 27.40 | 29.60 | 31.61 |
| MiniCPM-1.2B | 40.11 | 10.98 | 35.42 | 43.99 | 43.90 | 36.80 | 35.20 |
| Qwen2-1.5B | 57.62 | 22.90 | 33.05 | 55.11 | 20.73 | 30.40 | 36.64 |
| Qwen2.5-1.5B | 62.40 | 28.28 | 43.99 | 59.72 | 5.49 | 40.00 | 39.98 |
| Xmodel-2-1.2B | 55.88 | 25.50 | 48.40 | 48.87 | 29.88 | 29.20 | 39.62 |

Agent Capabilities

| Model | HotpotQA (EM) | FEVER (EM) | AlfWorld (success rate) | WebShop (success rate) | Avg |
|---|---|---|---|---|---|
| Llama-3.2-1B | 3.49 | 17.57 | 3.73 | 0.80 | 6.40 |
| Qwen2.5-1.5B | 12.60 | 11.83 | 3.73 | 0.60 | 7.19 |
| Fox-1-1.6B | 5.98 | 24.32 | 0.00 | 0.20 | 7.62 |
| InternLM2.5-1.8B | 3.67 | 25.66 | 2.99 | 1.20 | 8.38 |
| MiniCPM-1.2B | 13.88 | 26.32 | 2.99 | 0.00 | 10.80 |
| StableLM-2-zephyr-1.6B | 10.66 | 31.72 | 5.97 | 1.60 | 12.49 |
| Xmodel-2-1.2B | 14.23 | 34.84 | 0.00 | 4.80 | 13.47 |

🛠️ Install

  1. Clone this repository and navigate to the XmodelLM-2 folder

    git clone https://github.com/XiaoduoAILab/XmodelLM-2.git
    cd XmodelLM-2
  2. Install the required packages

    pip install -r requirements.txt

🗝️ Quick Start

Download Xmodel-2 model

Our model files are fully open source on Hugging Face; you can download them there. We offer both the pretrained model, Xmodel-2, and the instruction-tuned model, which has been trained exclusively on Chinese and English data.
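
As an illustration, the checkpoint can also be fetched programmatically with huggingface_hub; the `repo_id` below is a placeholder and should be replaced with the actual repository name on Hugging Face.

```python
# A minimal sketch for downloading the model files with huggingface_hub.
# NOTE: the repo_id is a placeholder; substitute the real Hugging Face repo.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="XiaoduoAILab/Xmodel-2",  # placeholder repository id
    local_dir="./Xmodel-2",           # where to store the downloaded files
)
print(local_path)
```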

Example for Xmodel-2 model inference

Download the model files first and save them in a local folder. Then you can run the script below; we recommend passing an absolute path as the model path.

```python
import os

from transformers import AutoModelForCausalLM, AutoTokenizer
model_path = os.path.expanduser("/path/to/Xmodel-2")
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    trust_remote_code=True
)
prompt = "Give me a short introduction to large language model."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer(text, return_tensors="pt").to(model.device)
# Strings that mark turn boundaries; generation halts when one is produced.
stop_tokens = ["<|im_end|>", "<|im_start|>"]
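# Generate a response; stop_strings requires passing the tokenizer so the
# generated text can be matched against the stop strings.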
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=256,
    do_sample=True,
    top_p=0.9,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,
    stop_strings=stop_tokens,
    tokenizer=tokenizer
)
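# Decode only the newly generated tokens, skipping the prompt portion.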
output = tokenizer.decode(
    generated_ids[0][len(model_inputs.input_ids[0]):], 
    skip_special_tokens=True
)
for stop_token in stop_tokens:
    output = output.replace(stop_token, "")
output = output.split("<|im_start|>")[0]
output = output.strip()
print("Generated Response:")
print(output)
```

A possible result generated by this code is:

Generated Response:
Large language models are advanced artificial intelligence systems that are trained on massive amounts of text data to generate human-like text. These models are typically trained on a large corpus of text data, such as books, articles, and websites, and are able to generate text that is coherent and contextually appropriate.
Large language models are often used in natural language processing (NLP) tasks, such as language translation, text summarization, and text generation. They are also used in a variety of other applications, such as chatbots, virtual assistants, and language learning tools.
Large language models are a key component of the field of artificial intelligence and are being used in a variety of industries and applications. They are a powerful tool for generating human-like text and are helping to transform the way that we interact with technology.
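
If you prefer to see tokens as they are produced instead of waiting for the full completion, the snippet below is a minimal streaming variant of the example above. It uses transformers' TextStreamer and assumes `model`, `tokenizer`, and `model_inputs` were already created as shown.

```python
# Minimal streaming variant: prints tokens to stdout as they are generated.
# Assumes `model`, `tokenizer`, and `model_inputs` exist as in the example above.
from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(
    **model_inputs,
    max_new_tokens=256,
    do_sample=True,
    top_p=0.9,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,
    streamer=streamer,
)
```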

✏️ Reference

If you find Xmodel-2 useful in your research or applications, please consider giving a star ⭐ and citing using the following BibTeX:

@misc{qun2024xmodel2technicalreport,
      title={Xmodel-2 Technical Report}, 
      author={Wang Qun and Liu Yang and Lin Qingquan and Qu Zhijiu and Jiang Ling},
      year={2024},
      eprint={2412.19638},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2412.19638}, 
}
