</p><p></p>

<p align="center">
🫣&nbsp;<a href="https://huggingface.co/tencent/Tencent-Hunyuan-Large"><b>Hugging Face</b></a>&nbsp;&nbsp; | &nbsp;&nbsp;🖥️&nbsp;&nbsp;<a href="https://llm.hunyuan.tencent.com/" style="color: red;"><b>official website</b></a>&nbsp;&nbsp;|&nbsp;&nbsp;🕖&nbsp;&nbsp; <a href="https://cloud.tencent.com/product/hunyuan" ><b>HunyuanAPI</b></a>&nbsp;&nbsp;|&nbsp;&nbsp;🐳&nbsp;&nbsp; <a href="https://cnb.cool/tencent/hunyuan"><b>cnb.cool</b></a>
</p><p align="center">
<a href="https://arxiv.org/abs/2411.02265" style="color: red;"><b>Technical Report</b></a>&nbsp&nbsp|&nbsp&nbsp <a href="https://huggingface.co/spaces/tencent/Hunyuan-Large"><b>Demo</b></a>&nbsp&nbsp&nbsp|&nbsp&nbsp <a href="https://cloud.tencent.com/document/product/851/112032" style="color: red;"><b>Tencent Cloud TI</b></a>&nbsp&nbsp&nbsp</p>
<p><br></p>
<tr>
<td align="center" style="width: 100px;" >Models</td>
<td align="center" style="width: 500px;">Huggingface Download URL</td>
<td align="center" style="width: 500px;">Tencent Cloud Download URL</td>
<td align="center" style="width: 500px;">cnb.cool Download URL</td>
</tr>
<tr>
<td style="width: 100px;">Hunyuan-A52B-Instruct-FP8</td>
<td style="width: 500px;"><a href="https://huggingface.co/tencent/Tencent-Hunyuan-Large/tree/main/Hunyuan-A52B-Instruct-FP8" style="color: red;">Hunyuan-A52B-Instruct-FP8</a></td>
<td style="width: 500px;"><a href="https://cdn-large-model.hunyuan.tencent.com/Hunyuan-A52B-Instruct-128k-fp8-20241116.zip" style="color: red;">Hunyuan-A52B-Instruct-FP8</a></td>
<td style="width: 500px;"><a href="https://cnb.cool/tencent/hunyuan/Hunyuan-A52B-Instruct-FP8.git" style="color: red;">Hunyuan-A52B-Instruct-FP8</a></td>
</tr>
<tr>
<td style="width: 100px;">Hunyuan-A52B-Instruct</td>
<td style="width: 500px;"><a href="https://huggingface.co/tencent/Tencent-Hunyuan-Large/tree/main/Hunyuan-A52B-Instruct" style="color: red;">Hunyuan-A52B-Instruct</a></td>
<td style="width: 500px;"><a href="https://cdn-large-model.hunyuan.tencent.com/Hunyuan-A52B-Instruct-128k-20241116.zip" style="color: red;">Hunyuan-A52B-Instruct</a></td>
<td style="width: 500px;"><a href="https://cnb.cool/tencent/hunyuan/Hunyuan-A52B-Instruct.git" style="color: red;">Hunyuan-A52B-Instruct</a></td>
</tr>
<tr>
<td style="width: 100px;">Hunyuan-A52B-Pretrain</td>
<td style="width: 500px;"><a href="https://huggingface.co/tencent/Tencent-Hunyuan-Large/tree/main/Hunyuan-A52B-Pretrain" style="color: red;">Hunyuan-A52B-Pretrain</a></td>
<td style="width: 500px;"><a href="https://cdn-large-model.hunyuan.tencent.com/Hunyuan-A52B-Pretrain-256k.zip" style="color: red;">Hunyuan-A52B-Pretrain</a></td>
<td style="width: 500px;"><a href="https://cnb.cool/tencent/hunyuan/Hunyuan-A52B-Pretrain.git" style="color: red;">Hunyuan-A52B-Pretrain</a></td>
</tr>
</tbody>
</table>
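
For programmatic downloads from Hugging Face, a minimal sketch along the following lines should work. It assumes `huggingface_hub` is installed; the `allow_patterns` filter and the local directory name are illustrative choices on our part, not an official layout.

```python
# Sketch: fetch one model variant from the Hugging Face repo listed above.
# Assumptions: `pip install huggingface_hub`; adjust allow_patterns/local_dir
# to the variant and path you actually want.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="tencent/Tencent-Hunyuan-Large",
    allow_patterns=["Hunyuan-A52B-Instruct/*"],  # e.g. only the Instruct weights
    local_dir="./hunyuan-large",                 # hypothetical target directory
)
print(f"Model files downloaded to: {local_path}")
```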

## Model Introduction

With the rapid development of artificial intelligence technology, large language models (LLMs) have made significant progress in fields such as natural language processing, computer vision, and scientific tasks. However, as the scale of these models increases, optimizing resource consumption while maintaining high performance has become a key challenge. To address this challenge, we have explored Mixture of Experts (MoE) models. The currently unveiled Hunyuan-Large (Hunyuan-MoE-A52B) model is the largest open-source Transformer-based MoE model in the industry, featuring a total of 389 billion parameters and 52 billion active parameters.
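
The 389B-total / 52B-activated split follows from the MoE design: a router activates only a small subset of experts for each token, so most parameters stay idle on any given forward pass. The snippet below is a generic top-k routing sketch with assumed toy sizes, included for illustration only; it is not the Hunyuan-Large implementation.

```python
# Illustrative top-k MoE routing (generic sketch, NOT the Hunyuan-Large code).
# All experts' weights exist (total parameters), but each token only flows
# through the top-k experts chosen by the router (activated parameters).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                       # x: (tokens, d_model)
        scores = self.router(x)                 # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):          # send each token to its k experts
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    w = weights[mask, slot].unsqueeze(-1)
                    out[mask] += w * self.experts[e](x[mask])
        return out

tokens = torch.randn(4, 64)
print(TinyMoE()(tokens).shape)                  # torch.Size([4, 64])
```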

By open-sourcing the Hunyuan-Large model and revealing related technical details, we hope to inspire more researchers with innovative ideas and collectively advance the progress and application of AI technology. We welcome you to join our open-source community to explore and optimize future AI models together!

### Introduction to Technical Advantages

#### Model

## Related News
* 2024.11.25 Our self-developed long-context benchmark, PenguinScrolls, has been officially released! You can explore the project on [GitHub](https://github.com/Penguin-Scrolls/PenguinScrolls) and access the dataset on [Hugging Face](https://huggingface.co/datasets/Penguin-Scrolls/PenguinScrolls).
* 2024.11.18 **Hunyuan-A52B-Instruct** and **Hunyuan-A52B-Instruct-FP8** model update.
* 2024.11.5 [TI Platform](https://cloud.tencent.com/product/ti) has already integrated the Hunyuan-Large model, so you can easily train and deploy it in just a few steps. Visit [Chat with Hunyuan-Large](https://console.cloud.tencent.com/tione/v2/aimarket/detail/hunyuan_series?PublicAlgoGroupId=hunyuan-large-chat&detailTab=demo) to experience real-time conversations with the model, and explore [Hunyuan-Large Best Practice on TI](https://cloud.tencent.com/document/product/851/112032) to create your own customized Hunyuan-Large model.
* 2024.11.5 We have open-sourced **Hunyuan-A52B-Pretrain**, **Hunyuan-A52B-Instruct**, and **Hunyuan-A52B-Instruct-FP8** on Hugging Face. We also released a technical report and a training and inference operations manual, providing detailed information on the model's capabilities and the procedures for training and inference.
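
As a rough sketch of what local inference with these open-sourced checkpoints can look like, the snippet below uses the Hugging Face `transformers` API. The local path, dtype, and generation settings are assumptions for illustration; the released operations manual remains the authoritative reference for the supported procedure.

```python
# Rough inference sketch (assumptions: weights already downloaded locally,
# e.g. with the snapshot_download snippet above; enough GPU memory; and that
# you trust the repo's custom modeling code). FP8 weights need their own path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./hunyuan-large/Hunyuan-A52B-Instruct"  # hypothetical local dir

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,  # assumed dtype for the non-FP8 checkpoint
    device_map="auto",           # requires `accelerate`
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Briefly introduce the Hunyuan-Large model."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```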




## Benchmark Evaluation
**Hunyuan-Large pre-trained model** achieves the best overall performance compared to both dense and MoE-based
competitors having similar activated parameter sizes. For aggregated benchmarks such as MMLU, MMLU-Pro, and CMMLU,
Hunyuan-Large consistently achieves the best performance, confirming its comprehensive abilities on aggregated tasks.
Hunyuan-Large also shows superior performance in commonsense understanding and reasoning, and classical NLP tasks
such as QA and reading comprehension tasks (e.g., CommonsenseQA, PIQA and TriviaQA).
For the mathematics capability, Hunyuan-Large outperforms all baselines in math datasets of GSM8K and MATH,
and also gains the best results on CMATH in Chinese. We also observe that Hunyuan-Large achieves the overall
best performance in all Chinese tasks (e.g., CMMLU, C-Eval).

| Model | LLama3.1-405B | LLama3.1-70B | Mixtral-8x22B | DeepSeek-V2 | Hunyuan-Large |
|-------|---------------|--------------|---------------|-------------|---------------|
| HumanEval | 61.0 | 58.5 | 53.1 | 48.8 | **71.4** |
| MBPP | **73.4** | 68.6 | 64.2 | 66.6 | 72.6 |

**Hunyuan-Large-Instruct** achieves consistent improvements on most types of tasks compared to LLMs having similar
activated parameters, indicating the effectiveness of our post-training. Delving into the model performance
in different categories of benchmarks, we find that our instruct model achieves the best performance on the MMLU and MATH datasets.
Notably, on the MMLU dataset, our model demonstrates a significant improvement, outperforming the LLama3.1-405B model by 2.6%.
This enhancement is not just marginal but indicative of the Hunyuan-Large-Instruct’s superior understanding and reasoning
capabilities across a wide array of language understanding tasks. The model’s prowess is further underscored in its performance
on the MATH dataset, where it surpasses the LLama3.1-405B by a notable margin of 3.6%.
Remarkably, this leap in accuracy is achieved with only 52 billion activated parameters, underscoring the efficiency of our model.

| Model | LLama3.1 405B Inst. | LLama3.1 70B Inst. | Mixtral 8x22B Inst. | DeepSeekV2.5 Chat | Hunyuan-Large Inst. |

You can quickly get started by referring to the content in the <a href="examples

To simplify the training process, HunyuanLLM provides a pre-built Docker image:

[hunyuaninfer/hunyuan-large](https://hub.docker.com/repository/docker/hunyuaninfer/hunyuan-large/general).

### Hardware Requirements


You can experience our Hunyuan-Large model on Tencent Cloud. For details, please
The Hunyuan-Large web demo is now open. Visit https://huggingface.co/spaces/tencent/Hunyuan-Large to easily experience our model.

## Training/Inference on TI
Tencent Cloud's [TI Platform](https://cloud.tencent.com/product/ti) is a comprehensive machine learning platform tailored for AI engineers. With the Hunyuan-Large model already integrated, you can easily train and deploy it in just a few steps. Visit [Chat with Hunyuan-Large](https://console.cloud.tencent.com/tione/v2/aimarket/detail/hunyuan_series?PublicAlgoGroupId=hunyuan-large-chat&detailTab=demo) to experience real-time conversations with the model, and explore [Hunyuan-Large Best Practice on TI](https://cloud.tencent.com/document/product/851/112032) to create your own customized Hunyuan-Large model.


## Citation
If you find our work helpful, feel free to cite it.

```
@misc{sun2024hunyuanlargeopensourcemoemodel,
title={Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent},
author={Xingwu Sun and Yanfeng Chen and Yiqing Huang and Ruobing Xie and Jiaqi Zhu and Kai Zhang and Shuaipeng Li and Zhen Yang and Jonny Han and Xiaobo Shu and Jiahao Bu and Zhongzhi Chen and Xuemeng Huang and Fengzong Lian and Saiyong Yang and Jianfeng Yan and Yuyuan Zeng and Xiaoqin Ren and Chao Yu and Lulu Wu and Yue Mao and Tao Yang and Suncong Zheng and Kan Wu and Dian Jiao and Jinbao Xue and Xipeng Zhang and Decheng Wu and Kai Liu and Dengpeng Wu and Guanghui Xu and Shaohua Chen and Shuang Chen and Xiao Feng and Yigeng Hong and Junqiang Zheng and Chengcheng Xu and Zongwei Li and Xiong Kuang and Jianglu Hu and Yiqi Chen and Yuchi Deng and Guiyang Li and Ao Liu and Chenchen Zhang and Shihui Hu and Zilong Zhao and Zifan Wu and Yao Ding and Weichao Wang and Han Liu and Roberts Wang and Hao Fei and Peijie She and Ze Zhao and Xun Cao and Hai Wang and Fusheng Xiang and Mengyuan Huang and Zhiyuan Xiong and Bin Hu and Xuebin Hou and Lei Jiang and Jiajia Wu and Yaping Deng and Yi Shen and Qian Wang and Weijie Liu and Jie Liu and Meng Chen and Liang Dong and Weiwen Jia and Hu Chen and Feifei Liu and Rui Yuan and Huilin Xu and Zhenxiang Yan and Tengfei Cao and Zhichao Hu and Xinhua Feng and Dong Du and Tinghao She and Yangyu Tao and Feng Zhang and Jianchen Zhu and Chengzhong Xu and Xirui Li and Chong Zha and Wen Ouyang and Yinben Xia and Xiang Li and Zekun He and Rongpeng Chen and Jiawei Song and Ruibin Chen and Fan Jiang and Chongqing Zhao and Bo Wang and Hao Gong and Rong Gan and Winston Hu and Zhanhui Kang and Yong Yang and Yuhong Liu and Di Wang and Jie Jiang},
year={2024},
eprint={2411.02265},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2411.02265},
}
```
<br>

## Contact Us

If you would like to leave a message for our R&D and product teams, you are welcome to contact our open-source team. You can also reach us via email (hunyuan_opensource@tencent.com).

