diff --git a/README.md b/README.md
index f1b5809..0abc55d 100644
--- a/README.md
+++ b/README.md
@@ -8,7 +8,7 @@

- 🫣 Hugging Face   |   🖥️  official website  |  🕖   HunyuanAPI  |  🐳   Gitee
+ 🫣 Hugging Face   |   🖥️  official website  |  🕖   HunyuanAPI  |  🐳   cnb.cool

Technical Report  |   Demo   |   Tencent Cloud TI   


@@ -21,22 +21,22 @@
       Models
       Huggingface Download URL
-      Tencent Cloud Download URL
+      cnb.cool Download URL
     Hunyuan-A52B-Instruct-FP8
     Hunyuan-A52B-Instruct-FP8
-    Hunyuan-A52B-Instruct-FP8
+    Hunyuan-A52B-Instruct-FP8
     Hunyuan-A52B-Instruct
     Hunyuan-A52B-Instruct
-    Hunyuan-A52B-Instruct
+    Hunyuan-A52B-Instruct
     Hunyuan-A52B-Pretrain
     Hunyuan-A52B-Pretrain
-    Hunyuan-A52B-Pretrain
+    Hunyuan-A52B-Pretrain
@@ -47,10 +47,10 @@
 ## Model Introduction
-With the rapid development of artificial intelligence technology, large language models (LLMs) have made significant progress in fields such as natural language processing, computer vision, and scientific tasks. However, as the scale of these models increases, optimizing resource consumption while maintaining high performance has become a key challenge. To address this challenge, we have explored Mixture of Experts (MoE) models. The currently unveiled Hunyuan-Large (Hunyuan-MoE-A52B) model is the largest open-source Transformer-based MoE model in the industry, featuring a total of 389 billion parameters and 52 billion active parameters. This is currently the largest open-source Transformer-based MoE model in the industry, featuring a total of 389 billion parameters and 52 billion active parameters.
+With the rapid development of artificial intelligence technology, large language models (LLMs) have made significant progress in fields such as natural language processing, computer vision, and scientific tasks. However, as the scale of these models increases, optimizing resource consumption while maintaining high performance has become a key challenge. To address this challenge, we have explored Mixture of Experts (MoE) models. The recently unveiled Hunyuan-Large (Hunyuan-MoE-A52B) model is currently the largest open-source Transformer-based MoE model in the industry, featuring a total of 389 billion parameters and 52 billion active parameters.
 By open-sourcing the Hunyuan-Large model and revealing related technical details, we hope to inspire more researchers with innovative ideas and collectively advance the progress and application of AI technology. We welcome you to join our open-source community to explore and optimize future AI models together!
-
+
 ### Introduction to Technical Advantages
 #### Model
@@ -75,21 +75,21 @@ By open-sourcing the Hunyuan-Large model and revealing related technical details
 ## Related News
 * 2024.11.25 Our self-developed long-context benchmark, i.e., PenguinScrolls, has been officially released! You can explore the project on [GitHub](https://github.com/Penguin-Scrolls/PenguinScrolls) and access the dataset on [Hugging Face](https://huggingface.co/datasets/Penguin-Scrolls/PenguinScrolls).
-* 2024.11.18 **Hunyuan-A52B-Instruct** and **Hunyuan-A52B-Instruct-FP8** model update.
-* 2024.11.5 [TI Platform](https://cloud.tencent.com/product/ti) has integrated Hunyuan-Large model already, you can easily train and deploy it in just a few steps. Visit [Chat with Hunyuan-Large](https://console.cloud.tencent.com/tione/v2/aimarket/detail/hunyuan_series?PublicAlgoGroupId=hunyuan-large-chat&detailTab=demo) to experience real-time conversations with the model, and explore [Hunyuan-Large Best Practice on TI](https://cloud.tencent.com/document/product/851/112032) to create your own customized Hunyuan-Large model.
+* 2024.11.18 **Hunyuan-A52B-Instruct** and **Hunyuan-A52B-Instruct-FP8** model update.
+* 2024.11.5 [TI Platform](https://cloud.tencent.com/product/ti) has already integrated the Hunyuan-Large model; you can easily train and deploy it in just a few steps. Visit [Chat with Hunyuan-Large](https://console.cloud.tencent.com/tione/v2/aimarket/detail/hunyuan_series?PublicAlgoGroupId=hunyuan-large-chat&detailTab=demo) to experience real-time conversations with the model, and explore [Hunyuan-Large Best Practice on TI](https://cloud.tencent.com/document/product/851/112032) to create your own customized Hunyuan-Large model.
 * 2024.11.5 We have open-sourced **Hunyuan-A52B-Pretrain**, **Hunyuan-A52B-Instruct**, and **Hunyuan-A52B-Instruct-FP8** on Hugging Face. We also released a technical report and a training and inference operations manual, providing detailed information on the model's capabilities and the procedures for training and inference.
 
 ## Benchmark Evaluation
-**Hunyuan-Large pre-trained model** achieves the best overall performance compared to both Dense and MoE based
-competitors having similar activated parameter sizes. For aggregated benchmarks such as MMLU, MMLU-Pro, and CMMLU,
+**Hunyuan-Large pre-trained model** achieves the best overall performance compared to both Dense and MoE-based
+competitors having similar activated parameter sizes. For aggregated benchmarks such as MMLU, MMLU-Pro, and CMMLU,
 Hunyuan-Large consistently achieves the best performance, confirming its comprehensive abilities on aggregated tasks.
-Hunyuan-Large also shows superior performance in commonsense understanding and reasoning, and classical NLP tasks
-such as QA and reading comprehension tasks (e.g., CommonsenseQA, PIQA and TriviaQA).
-For the mathematics capability, Hunyuan-Large outperforms all baselines in math datasets of GSM8K and MATH,
-and also gains the best results on CMATH in Chinese.We also observe that Hunyuan-Large achieves the overall
+Hunyuan-Large also shows superior performance in commonsense understanding and reasoning, and classical NLP tasks
+such as QA and reading comprehension (e.g., CommonsenseQA, PIQA and TriviaQA).
+For mathematics, Hunyuan-Large outperforms all baselines on the math datasets GSM8K and MATH,
+and also achieves the best results on the Chinese dataset CMATH. We also observe that Hunyuan-Large achieves the overall
 best performance in all Chinese tasks (e.g., CMMLU, C-Eval).
 
 | Model | LLama3.1-405B | LLama3.1-70B | Mixtral-8x22B | DeepSeek-V2 | Hunyuan-Large |
@@ -114,13 +114,13 @@ best performance in all Chinese tasks (e.g., CMMLU, C-Eval).
 | HumanEval | 61.0 | 58.5 | 53.1 | 48.8 | **71.4** |
 | MBPP | **73.4** | 68.6 | 64.2 | 66.6 | 72.6 |
 
-**Hunyuan-Large-Instruct** achieves consistent improvements on most types of tasks compared to LLMs having similar
-activated parameters, indicating the effectiveness of our post-training. Delving into the model performance
-in different categories of benchmarks, we find that our instruct model achieves the best performance on MMLU and MATH dataset.
-Notably, on the MMLU dataset, our model demonstrates a significant improvement, outperforming the LLama3.1-405B model by 2.6%.
-This enhancement is not just marginal but indicative of the Hunyuan-Large-Instruct’s superior understanding and reasoning
-capabilities across a wide array of language understanding tasks. The model’s prowess is further underscored in its performance
-on the MATH dataset, where it surpasses the LLama3.1-405B by a notable margin of 3.6%.
+**Hunyuan-Large-Instruct** achieves consistent improvements on most types of tasks compared to LLMs having similar
+activated parameters, indicating the effectiveness of our post-training. Delving into the model performance
+in different categories of benchmarks, we find that our instruct model achieves the best performance on the MMLU and MATH datasets.
+Notably, on the MMLU dataset, our model demonstrates a significant improvement, outperforming the LLama3.1-405B model by 2.6%.
+This enhancement is not just marginal but indicative of Hunyuan-Large-Instruct’s superior understanding and reasoning
+capabilities across a wide array of language understanding tasks. The model’s prowess is further underscored in its performance
+on the MATH dataset, where it surpasses the LLama3.1-405B by a notable margin of 3.6%.
 Remarkably, this leap in accuracy is achieved with only 52 billion activated parameters, underscoring the efficiency of our model.
 
 | Model | LLama3.1 405B Inst. | LLama3.1 70B Inst. | Mixtral 8x22B Inst. | DeepSeekV2.5 Chat | Hunyuan-Large Inst. |
@@ -196,7 +196,7 @@ You can quickly get started by referring to the content in the
- 🫣 Hugging Face   |   🖥️  official website  |  🕖   Hunyuan API  |  🐳   Gitee
+ 🫣 Hugging Face   |   🖥️  official website  |  🕖   Hunyuan API  |  🐳   cnb.cool

Technical Report  |   Demo   |   Tencent Cloud TI

+
+**Download Models**
+
+| Models | Huggingface Download URL | cnb.cool Download URL |
+|--------|--------------------------|-----------------------|
+| Hunyuan-A52B-Instruct-FP8 | Hunyuan-A52B-Instruct-FP8 | Hunyuan-A52B-Instruct-FP8 |
+| Hunyuan-A52B-Instruct | Hunyuan-A52B-Instruct | Hunyuan-A52B-Instruct |
+| Hunyuan-A52B-Pretrain | Hunyuan-A52B-Pretrain | Hunyuan-A52B-Pretrain |
+
 ## Model Introduction
@@ -22,7 +53,7 @@
 ### Introduction to Technical Advantages
-#### Model
+#### Model
 - **High-quality synthetic data**: By enhancing training with synthetic data, Hunyuan-Large learns richer representations, handles long-context inputs, and generalizes better to unseen data
 - **KV cache compression**: Grouped-Query Attention (GQA) and Cross-Layer Attention (CLA) strategies significantly reduce the memory footprint and computational overhead of the KV cache, improving inference throughput (a rough memory estimate is sketched after the News section below)
@@ -42,14 +73,14 @@
 
 ## News
-* 2024.11.25 Our self-developed long-context benchmark, PenguinScrolls, has been officially released! See [GitHub](https://github.com/Penguin-Scrolls/PenguinScrolls) and [Hugging Face](https://huggingface.co/datasets/Penguin-Scrolls/PenguinScrolls).
+* 2024.11.25 Our self-developed long-context benchmark, PenguinScrolls, has been officially released! See [GitHub](https://github.com/Penguin-Scrolls/PenguinScrolls) and [Hugging Face](https://huggingface.co/datasets/Penguin-Scrolls/PenguinScrolls).
 * 2024.11.20 **Hunyuan-A52B-Instruct** and **Hunyuan-A52B-Instruct-FP8** model weights updated.
 * 2024.11.5 [TI Platform](https://cloud.tencent.com/product/ti) has already integrated the Hunyuan-Large model; you can easily train and deploy it in just a few steps. Visit [Chat with Hunyuan-Large](https://console.cloud.tencent.com/tione/v2/aimarket/detail/hunyuan_series?PublicAlgoGroupId=hunyuan-large-chat&detailTab=demo) for real-time conversations with the model, and explore [Hunyuan-Large Best Practice on TI](https://cloud.tencent.com/document/product/851/112032) to create your own customized Hunyuan-Large model.
 * 2024.11.5 We open-sourced **Hunyuan-A52B-Pretrain**, **Hunyuan-A52B-Instruct**, and **Hunyuan-A52B-Instruct-FP8** on Hugging Face, and released a technical report and a training/inference operations manual detailing the model's capabilities and the procedures for training and inference.
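To make the KV cache compression bullet above more concrete, here is a rough back-of-the-envelope sketch of how GQA and CLA shrink the KV cache. All architecture numbers in it (layer count, head counts, head dimension, CLA sharing factor, sequence length) are illustrative placeholders, not the published Hunyuan-Large configuration.

```python
# Back-of-the-envelope sketch: how GQA and CLA reduce KV cache memory.
# All numbers below are illustrative placeholders, NOT the published
# Hunyuan-Large configuration.

def kv_cache_gib(n_layers, n_kv_heads, head_dim, seq_len, batch,
                 bytes_per_val=2, cla_share_factor=1):
    """Total K+V cache size in GiB.

    cla_share_factor > 1 models Cross-Layer Attention: groups of adjacent
    layers share one KV cache, so only n_layers / cla_share_factor distinct
    caches have to be stored.
    """
    layers_storing_kv = n_layers / cla_share_factor
    total_bytes = (2 * layers_storing_kv * n_kv_heads * head_dim
                   * seq_len * batch * bytes_per_val)
    return total_bytes / 1024 ** 3

# Hypothetical MHA baseline: every layer stores its own KV for all 64 heads.
baseline = kv_cache_gib(n_layers=64, n_kv_heads=64, head_dim=128,
                        seq_len=32768, batch=1)

# Hypothetical GQA (8 KV heads) + CLA (adjacent layer pairs share one cache).
compressed = kv_cache_gib(n_layers=64, n_kv_heads=8, head_dim=128,
                          seq_len=32768, batch=1, cla_share_factor=2)

print(f"baseline KV cache:  {baseline:.1f} GiB")
print(f"GQA + CLA KV cache: {compressed:.1f} GiB ({baseline / compressed:.0f}x smaller)")
```

With these placeholder numbers, moving from 64 KV heads to 8 and sharing KV between adjacent layer pairs shrinks the cache by roughly 16x at 32K context; the actual saving depends entirely on the real model configuration.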
-## Benchmark Evaluation
+## Benchmark Evaluation
 **Hunyuan-Large pre-trained model** achieves the best overall performance compared with Dense and MoE competitors of similar activated parameter size.
 On benchmarks such as MMLU, MMLU-Pro, and CMMLU, Hunyuan-Large consistently stays at the top level, confirming its comprehensive abilities on aggregated tasks.
@@ -109,7 +140,7 @@ Hunyuan-Large, in commonsense understanding and reasoning as well as classical NLP tasks such as QA and reading
 Hunyuan-Large provides the model-training workflow; in this section you can process the training data into the format required for model training.
-### Training Data Format and Processing
+### Training Data Format and Processing
 Training data is processed into the messages format shown below. The default system prompt for both training and inference is "You are a helpful assistant."; single-turn and multi-turn data samples are given as examples (see also the minimal sketch further below):
@@ -148,7 +179,7 @@ ids = tokenizer.apply_chat_template(messages)
 You can refer to the Quick Start documentation to get started quickly.
-## Model Training
+## Model Training
 To simplify the deployment process, HunyuanLLM provides a pre-built Docker image: [hunyuaninfer/hunyuan-large](https://hub.docker.com/repository/docker/hunyuaninfer/hunyuan-large/general).
@@ -256,7 +287,7 @@ Are you sure you want to continue connecting (yes/no)?
 **Notes:**
 - If you want to resume training from an intermediate checkpoint rather than load pre-trained weights, set `--resume_from_checkpoint` to the path of the checkpoint saved by the earlier run and do not set `--model_name_or_path`, which would load only the weights and not the training state
-- When resuming from a checkpoint, the loss may deviate slightly; this is normal and comes from the randomness of certain non-deterministic algorithms. See: [HuggingFace Transformers Trainer Randomness
+- When resuming from a checkpoint, the loss may deviate slightly; this is normal and comes from the randomness of certain non-deterministic algorithms. See: [HuggingFace Transformers Trainer Randomness
 - When `--model_name_or_path` is in effect, all model-related parameters are ignored
 - Samples within a batch are padded to the longest sample in the batch; each sample is at most max_seq_length long, and anything beyond that is truncated
 - If a warning reports that bias weights were not loaded, it can be ignored; bias is not used in Hunyuan-Large
@@ -284,7 +315,7 @@ Are you sure you want to continue connecting (yes/no)?
 
-## Inference and Deployment
+## Inference and Deployment
 HunyuanLLM supports two deployment options: TRT-LLM and vLLM. This release open-sources the vLLM deployment (see the "Inference with vLLM" section); the TRT-LLM deployment (see the "Inference with TRT-LLM" section) will be released soon.
@@ -352,7 +383,7 @@ ray start --block --head --node-ip-address=${LOCAL_IP} --port=6379
 export VLLM_HOST_IP=${LOCAL_IP}
 export NCCL_SOCKET_IFNAME=bond1
 export GLOO_SOCKET_IFNAME=bond1
-ray start --block --address={head node's $LOCAL_IP}:6379 --node-ip-address=${LOCAL_IP}
+ray start --block --address={head node's $LOCAL_IP}:6379 --node-ip-address=${LOCAL_IP}
 ```
 If starting Ray fails, run `ray stop` and then execute the commands above again.
@@ -503,12 +534,12 @@ The tokenizer used in the HunYuan-Large model balances compression ratio and effectiveness
 ## Hunyuan API
 You can try our hunyuan-large model on Tencent Cloud; for details, see: https://cloud.tencent.com/document/product/1729/97730.
-## Interactive Web Demo
+## Interactive Web Demo
 A web demo of Hunyuan-Large is now available. Visit https://huggingface.co/spaces/tencent/Hunyuan-Large to try the model.
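As a concrete illustration of the "Training Data Format and Processing" notes above (the messages format, the default system prompt, and the `ids = tokenizer.apply_chat_template(messages)` call shown in the diff context), here is a minimal sketch. The local model path and the message contents are placeholders; it assumes only that the downloaded Hunyuan-A52B-Instruct tokenizer ships a chat template loadable with `trust_remote_code=True`.

```python
# Minimal sketch of preparing messages-format data, under the assumptions above.
from transformers import AutoTokenizer

# Placeholder path: point this at your local copy of the downloaded model.
tokenizer = AutoTokenizer.from_pretrained("./Hunyuan-A52B-Instruct",
                                          trust_remote_code=True)

# Single-turn sample: default system prompt plus one user/assistant exchange.
single_turn = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Why is seawater salty?"},
    {"role": "assistant", "content": "Seawater is salty mainly because ..."},
]

# Multi-turn sample: the same format, with alternating user/assistant turns.
multi_turn = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello."},
    {"role": "assistant", "content": "Hi there! How can I help you?"},
    {"role": "user", "content": "Briefly explain what an MoE model is."},
    {"role": "assistant", "content": "A Mixture of Experts model ..."},
]

# apply_chat_template renders the conversation with the model's chat template
# and (by default) tokenizes it, returning the token ids used for training.
for messages in (single_turn, multi_turn):
    ids = tokenizer.apply_chat_template(messages)
    print(len(ids))
```

The exact rendered prompt depends on the chat template shipped with the tokenizer; the sketch only demonstrates the messages structure and the `apply_chat_template` call referenced in the diff.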
-## Training/Inference with TI
+## Training/Inference with TI
 Tencent Cloud's [TI Platform](https://cloud.tencent.com/product/ti) is a comprehensive machine learning platform designed for AI engineers. With the Hunyuan-Large model integrated, you can train and deploy it in just a few steps. Visit the [Chat with Hunyuan-Large](https://console.cloud.tencent.com/tione/v2/aimarket/detail/hunyuan_series?PublicAlgoGroupId=hunyuan-large-chat&detailTab=demo) module to experience real-time conversations with the model, and explore [Hunyuan-Large Best Practice](https://cloud.tencent.com/document/product/851/112032) on TI to create your own customized Hunyuan-Large model.
 
 ## Citation
@@ -516,13 +547,13 @@
 ```
 @misc{sun2024hunyuanlargeopensourcemoemodel,
-      title={Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent},
+      title={Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent},
       author={Xingwu Sun and Yanfeng Chen and Yiqing Huang and Ruobing Xie and Jiaqi Zhu and Kai Zhang and Shuaipeng Li and Zhen Yang and Jonny Han and Xiaobo Shu and Jiahao Bu and Zhongzhi Chen and Xuemeng Huang and Fengzong Lian and Saiyong Yang and Jianfeng Yan and Yuyuan Zeng and Xiaoqin Ren and Chao Yu and Lulu Wu and Yue Mao and Tao Yang and Suncong Zheng and Kan Wu and Dian Jiao and Jinbao Xue and Xipeng Zhang and Decheng Wu and Kai Liu and Dengpeng Wu and Guanghui Xu and Shaohua Chen and Shuang Chen and Xiao Feng and Yigeng Hong and Junqiang Zheng and Chengcheng Xu and Zongwei Li and Xiong Kuang and Jianglu Hu and Yiqi Chen and Yuchi Deng and Guiyang Li and Ao Liu and Chenchen Zhang and Shihui Hu and Zilong Zhao and Zifan Wu and Yao Ding and Weichao Wang and Han Liu and Roberts Wang and Hao Fei and Peijie She and Ze Zhao and Xun Cao and Hai Wang and Fusheng Xiang and Mengyuan Huang and Zhiyuan Xiong and Bin Hu and Xuebin Hou and Lei Jiang and Jiajia Wu and Yaping Deng and Yi Shen and Qian Wang and Weijie Liu and Jie Liu and Meng Chen and Liang Dong and Weiwen Jia and Hu Chen and Feifei Liu and Rui Yuan and Huilin Xu and Zhenxiang Yan and Tengfei Cao and Zhichao Hu and Xinhua Feng and Dong Du and Tinghao She and Yangyu Tao and Feng Zhang and Jianchen Zhu and Chengzhong Xu and Xirui Li and Chong Zha and Wen Ouyang and Yinben Xia and Xiang Li and Zekun He and Rongpeng Chen and Jiawei Song and Ruibin Chen and Fan Jiang and Chongqing Zhao and Bo Wang and Hao Gong and Rong Gan and Winston Hu and Zhanhui Kang and Yong Yang and Yuhong Liu and Di Wang and Jie Jiang},
       year={2024},
       eprint={2411.02265},
       archivePrefix={arXiv},
       primaryClass={cs.CL},
-      url={https://arxiv.org/abs/2411.02265},
+      url={https://arxiv.org/abs/2411.02265},
 }
 ```