GitHub - alibaba/Pai-Megatron-Patch: The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.

Quick Start

Model	Megatron-LM-Dense	Megatron-Core-Dense	Megatron-Core-MoE	MegaBlocks-MoE
Qwen2-VL	N/A	ReadMe	N/A	N/A
LLaVA	N/A	ReadMe	N/A	N/A
Qwen2.5	N/A	ReadMe	N/A	N/A
LLama3.1	N/A	ReadMe	N/A	N/A
LLama3	ReadMe	ReadMe	N/A	N/A
LLama2	ReadMe	ReadMe	N/A	N/A
Mistral	ReadMe	ReadMe	ReadMe	N/A
Qwen2	N/A	ReadMe	ReadMe	N/A
Qwen1.5	ReadMe	ReadMe	ReadMe	ReadMe
DeepSeek-V2	N/A	N/A	ReadMe	N/A

Introduction


Pai-Megatron-Patch (https://github.com/alibaba/Pai-Megatron-Patch) is a deep learning training toolkit that helps developers train LLMs and VLMs, and run inference with them, using the Megatron framework. As LLMs continue to develop, model structures and scales are evolving rapidly. Although these models can be conveniently trained with Transformers or the DeepSpeed training framework, training efficiency is comparatively low, and the problem becomes more severe once a model exceeds 10 billion parameters. The primary objective of Pai-Megatron-Patch is to make effective use of GPU compute for LLM training. The tool makes it convenient to train commonly used LLMs with all the acceleration techniques provided by Megatron-LM.

What's New:

Upgrade DeepSeek-V2-MoE for facilitating a smooth transition to integrating the DeepSeek-V3-MoE. [🔥🔥 2025.01.16]
Upgrade Qwen2-VL models to support Sequence Parallel, VPP and TP-Comm-Overlap. [🔥🔥 2025.01.15]
Upgrade Qwen2-VL models to support MG2HF ckpts conversion and training with multi-turn complex multimodal samples. [🔥🔥 2024.12.27]
Support training Qwen2-VL models by using Megatron-Core. [🔥🔥 2024.11.27]
Support training LLaVA models by using Megatron-Core. [🔥🔥 2024.11.20]
Add llm auto configurator and apply per seq sft loss for qwen2/2.5 models. [🔥🔥 2024.10.30]
Upgrade deepseek-v2-moe models to support MLA via transformer engine and pipeline ckpts conversion. [🔥🔥 2024.09.26]
Support training Qwen2.5 models by using Megatron-Core. [🔥🔥 2024.09.20]
Support Sequence Packing in SFT for Qwen2 and LLaMA 3.1 models. [🔥🔥 2024.09.13]
Upgrade qwen2 dense and moe models to support Flash-Attention 3, Offloading, Comm-Overlapping features. [🔥🔥 2024.08.26]
Support training LLaMA 3.1 dense models with Flash-Attention 3 backend. [🔥🔥 2024.08.23]
Support training LLaMA 3.1 dense models by using Megatron-Core. [🔥🔥 2024.08.23]
Support auto optimizer offloading in OffloadDistributedOptimizer. [🔥🔥 2024.07.25]
Support static optimizer offloading in OffloadDistributedOptimizer. [🔥🔥 2024.07.15]
Support training qwen2 moe models by using Megatron-Core. [🔥🔥 2024.06.19]
Support training qwen2 dense models by using Megatron-Core. [🔥🔥 2024.06.12]
Support training deepseek-v2-moe models by using Megatron-Core. [🔥🔥 2024.05.30]
Support training qwen1.5-moe models by using Megatron-Core. [🔥🔥 2024.05.13]
Support training llama3 models by using Megatron-LM and Megatron-Core. [🔥🔥 2024.04.21]
Support training qwen1.5 models by using Megatron-Core. [🔥🔥 2024.03.20]
Support training qwen1.5 models by using Megatron-LM. [🔥🔥 2024.02.28]
Support training mixtral-8x7b moe model by using Megatron-Core. [🔥🔥 2024.01.26]
Support training qwen-vl multimodal models by using Megatron-LM. [🔥🔥 2023.12.15]
Support training LLaVA multimodal models by using Megatron-LM. [🔥🔥 2023.12.01]
Support training deepseek model by using Megatron-LM. [🔥🔥 2023.11.24]
Support training qwen-72B model by using Megatron-LM. [🔥🔥 2023.11.23]
Support training Mistral-7B, Yi-6B and Codellama-34B [🔥🔥 2023.11.16]
Upgrade Megatron-LM for Llama2, qwen and baichuan2 to use transformer engine and fp8. [🔥🔥 2023.10.19]
Support training qwen-14B and baichuan2-13B model by using Megatron-LM. [🔥🔥 2023.10.08]

Highlights

Pai-Megatron-Patch is developed by the Alibaba Cloud Machine Learning Platform (PAI) algorithm team. The tool aims to help developers get started quickly with Lingjun products and complete the entire LLM development pipeline, including efficient distributed training, supervised fine-tuning, and offline model inference or verification. Its main strengths are:

Support for multiple commonly used LLMs, such as llama, llama-2, codellama, deepseek, baichuan, qwen, Falcon, GLM, Starcoder, Bloom, chatglm, etc.
Support for model weight conversion: Mapping operator namespaces between Huggingface, Megatron, and Transformer Engine.
Support for FP8 training acceleration in Flash Attention 2.0 and Transformer Engine modes, ensuring training convergence.
Rich and user-friendly usage examples, offering best practices for the entire workflow of LLM pre-training, fine-tuning, evaluation, and inference, as well as reinforcement learning.
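
The weight-conversion item above amounts to renaming parameters between checkpoint formats. The sketch below shows the idea with a small bidirectional name table. Every parameter name, the `NAME_PAIRS` table, and the `rename` and `convert` helpers are hypothetical illustrations, not the toolkit's actual mapping; real names depend on the model and on the Megatron version.

```python
import re

# Hypothetical (Hugging Face style, Megatron style) name pairs.
# "{i}" stands for the layer index. These names are assumptions for
# illustration only; they are not taken from Pai-Megatron-Patch.
NAME_PAIRS = [
    ("model.embed_tokens.weight", "embedding.word_embeddings.weight"),
    ("model.layers.{i}.input_layernorm.weight", "decoder.layers.{i}.input_layernorm.weight"),
    ("model.layers.{i}.mlp.down_proj.weight", "decoder.layers.{i}.mlp.linear_fc2.weight"),
    ("lm_head.weight", "output_layer.weight"),
]


def _compile(template):
    """Turn a name template into a regex that captures the layer index."""
    pattern = re.escape(template).replace(re.escape("{i}"), r"(\d+)")
    return re.compile("^" + pattern + "$")


def rename(key, pairs, reverse=False):
    """Map one parameter name to the other namespace."""
    for hf_name, mg_name in pairs:
        src, dst = (mg_name, hf_name) if reverse else (hf_name, mg_name)
        match = _compile(src).match(key)
        if match:
            layer = match.group(1) if match.groups() else ""
            return dst.replace("{i}", layer)
    raise KeyError(f"no rule for parameter {key!r}")


def convert(state_dict, pairs, reverse=False):
    """Rename every key of a state dict; the values are left untouched."""
    return {rename(k, pairs, reverse): v for k, v in state_dict.items()}


hf_weights = {"model.embed_tokens.weight": "E", "model.layers.3.mlp.down_proj.weight": "W"}
mg_weights = convert(hf_weights, NAME_PAIRS)
print(sorted(mg_weights))
```

Keeping both directions in one table means the reverse conversion cannot drift from the forward one. A real converter would also have to reshape, fuse, or shard tensors, which this sketch deliberately leaves out.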

Framework

The design philosophy of Pai-Megatron-Patch is to avoid invasive modifications to the Megatron-LM source code. In other words, it does not add new modules directly to Megatron-LM. Instead, the functions that need to be extended or improved are delivered as patches. This decoupling lets users keep following LLM best practices without being affected by Megatron-LM upgrades.
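
One generic way to extend a library from the outside, without editing its source, is to wrap one of its functions at import time. The sketch below illustrates that idea only; it is not necessarily how Pai-Megatron-Patch implements its patches. The `upstream` namespace, `build_model`, and `apply_patch` are hypothetical stand-ins, not Megatron-LM APIs.

```python
import types

# Stand-in for an upstream module that is imported unchanged.
# (Hypothetical: a real setup would import a Megatron-LM module here.)
upstream = types.SimpleNamespace()


def _upstream_build_model(hidden_size):
    """Pretend upstream factory that returns a model description."""
    return {"hidden_size": hidden_size, "attention": "default"}


upstream.build_model = _upstream_build_model


def apply_patch(module):
    """Wrap module.build_model without touching the upstream source."""
    original = module.build_model

    def patched_build_model(hidden_size):
        model = original(hidden_size)   # reuse the upstream behaviour
        model["attention"] = "custom"   # the extension lives in the patch
        return model

    module.build_model = patched_build_model
    return original                     # returned so the patch can be undone


original = apply_patch(upstream)
print(upstream.build_model(1024))       # patched behaviour
upstream.build_model = original         # restore the upstream function
print(upstream.build_model(1024))       # upstream behaviour again
```

Because the extension sits outside the upstream code, pulling a newer upstream version only requires checking that the wrapped function still has the same signature.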

Pai-Megatron-Patch includes the key components for building LLM training: a model library, tokenizers, model converters, reinforcement learning, offline text generation, usage examples, and toolkits. The model library provides popular LLMs implemented in Megatron, such as baichuan, bloom, chatglm, falcon, galactica, glm, llama, qwen, and starcoder. More Megatron-based implementations of LLMs will be added as needed in the future. Additionally, the patch provides bidirectional conversion between Huggingface and Megatron model weights. This allows users to take Huggingface pretrained models and continue pre-training or fine-tuning them in Megatron, and to evaluate trained Megatron models with Huggingface's evaluation and inference pipelines.

In the reinforcement learning section, the patch offers PPO training workflows, enabling users to perform reinforcement learning with SFT models and RM models. Finally, the patch provides numerous usage examples to help users quickly start LLM training and offline inference. For specific usage processes within Alibaba Cloud Lingjun products, please refer to the following link: PAI-Lingjun Intelligent Computing Service LLM solution.
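
For readers unfamiliar with PPO, the sketch below computes the standard clipped surrogate objective on plain Python lists. It is a minimal illustration of the general algorithm, not the code in this repository's rlhf workflow; the function name, its arguments, and the default `clip_eps=0.2` are assumptions chosen for the example.

```python
import math


def ppo_clipped_objective(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximized) over a batch.

    new_logprobs / old_logprobs: log-probabilities of the sampled actions
    under the current policy and under the policy that generated the samples.
    advantages: advantage estimates, e.g. derived from a reward model.
    """
    terms = []
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        ratio = math.exp(new_lp - old_lp)                       # pi_new / pi_old
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the minimum removes the incentive to move the policy
        # further than the clip range allows in a single update.
        terms.append(min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)


# With identical policies the ratio is 1, so the objective is the mean advantage.
print(ppo_clipped_objective([-1.0, -2.0], [-1.0, -2.0], [0.5, 1.5]))
```

In an RLHF setting the advantages would typically come from reward-model scores on text sampled from the SFT-initialized policy.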

Technical Reports

Contact

Use DingTalk to scan the QR code below.

Note: group 1 is full, please add group 2.

License

This project is licensed under the Apache License (Version 2.0). This toolkit also contains some code modified from other repos under other open-source licenses. See the NOTICE file for more information.

Folders and files

History: 263 commits

Bigcode-Evaluation-Harness-240327 @ c6d3cd4
LM-Evaluation-Harness-240310 @ 7d9bc88
Megatron-LM-231007 @ f772743
Megatron-LM-240126 @ 3709708
Megatron-LM-240405 @ ba77325
Megatron-LM-241113 @ 64cbae5
Megatron-LM-MegaBlocks @ be4e23d
PAI-Megatron-LM-240718 @ 7765c38
examples
megatron_patch
rlhf
toolkits
.gitignore
.gitmodules
LICENSE
NOTICE
README.md
README_zh-CN.md
patch.png
qr.png
qr2.png
