
Conversation

@koutianzhengwen

patch for megatron core

@gemini-code-assist
Contributor

Summary of Changes

Hello @koutianzhengwen, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the FlagScale framework by integrating and optimizing the Qwen2.5-VL multimodal model for training on Huawei Atlas800TA3 hardware, leveraging the Megatron-LM framework. It involves a major update to backend dependencies, fine-tuning of training configurations for specific models, and the introduction of a new, extensive data loading and processing framework called 'Energon'. These changes collectively aim to improve the efficiency and capability of multimodal model training on the target hardware.

Highlights

  • Qwen2.5-VL Model Integration: The pull request updates the core configuration to explicitly target the Qwen2.5-VL multimodal model for training, replacing previous models like kimi_k2 and qwen3.
  • Megatron-LM Backend Adoption: The backend commit and version information are updated to reflect a shift towards Megatron-LM as the primary framework, moving away from vllm.
  • Huawei Atlas800TA3 Optimization: Training configurations for Aquila and Qwen2.5-VL models are specifically adjusted to leverage Huawei Atlas800TA3 hardware, including increased process and CUDA device counts, and the introduction of NPU-specific adaptors within Megatron-LM.
  • Introduction of Energon Data Framework: A comprehensive new data loading and processing framework named 'Energon' is integrated. This framework provides advanced capabilities for handling multimodal data (images, videos, text), efficient data loading, checkpointing, and distributed processing, which are crucial for large-scale multimodal model training.
  • Refined Training Parameters: Significant modifications are made to the training hyperparameters for the Aquila 7B model, such as increasing tensor model parallelism, reducing sequence length and batch sizes, and adjusting learning rate schedules, indicating a fine-tuning of training strategies.
  • Custom NPU Operations: New custom operations like npu_matmul_add and npu_rotary_position_embedding are introduced, demonstrating hardware-specific optimizations tailored for Huawei NPUs. A minimal dispatch sketch follows this list.
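
For illustration only: a hedged sketch of the dispatch pattern such custom ops imply. The NPU branch calls the fused kernel named in this PR (module path assumed from the npu_matmul_add.py.bak diff further down); the fallback reproduces the same math with stock PyTorch, mirroring the fp16 path shown in that diff.

    try:
        import torch_npu  # noqa: F401 -- availability probe only (import name assumed)
        _HAS_NPU = True
    except ImportError:
        _HAS_NPU = False


    def matmul_add_fp32(total_input, grad_output, grad):
        # Accumulate grad_output^T @ total_input into `grad` in place.
        if _HAS_NPU:
            # Fused custom kernel introduced in this PR
            # (module path assumed, not confirmed by this diff alone).
            from flagscale.train.megatron_basic_adaptor.npu_matmul_add import (
                npu_matmul_add_fp32,
            )

            npu_matmul_add_fp32(total_input, grad_output, grad)
        else:
            # Portable fallback: same math with stock PyTorch ops.
            grad.add_(grad_output.t().matmul(total_input))
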
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature              | Command             | Description
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the root of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.


@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a substantial set of patches to enable Megatron-Core support on Huawei Atlas hardware. The changes are extensive, adding numerous new files for hardware adaptation, custom operators, and monkey-patching core components. While this appears to be a significant step towards hardware compatibility, the current state of the code includes several areas that need improvement. I've noticed hardcoded paths in configuration files, commented-out code, leftover debugging statements, and build artifacts checked into the repository. More critically, there are instances of hardcoded device names (cuda, cpu) that undermine the goal of NPU support, duplicate files and function definitions, and the use of global variables for state management, which can lead to maintainability issues. Addressing these points will greatly improve the quality and robustness of this patch.
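
On the hardcoded device names specifically: a minimal, hedged sketch of a device-resolution helper that call sites could use instead of literal "cuda"/"cpu" strings. The torch.npu namespace is added by torch_npu; its availability check is treated here as an assumption.

    import torch

    def resolve_device() -> torch.device:
        # Prefer an Ascend NPU when torch_npu is installed and a device
        # is visible; fall back to CUDA, then CPU.
        try:
            import torch_npu  # noqa: F401
            if torch.npu.is_available():  # namespace added by torch_npu
                return torch.device("npu")
        except ImportError:
            pass
        return torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Example call site:
    buf = torch.zeros(8, device=resolve_device())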

Comment on lines +1 to +266
diff --git a/kernel_meta/buildPidInfo.json b/kernel_meta/buildPidInfo.json
new file mode 100644
index 00000000..1dd7a9ce
--- /dev/null
+++ b/kernel_meta/buildPidInfo.json
@@ -0,0 +1,258 @@
+[
+ [
+ 3542810,
+ "/home/FlagScale/kernel_meta/kernel_meta_1962833800725725517"
+ ],
+ [
+ 3542811,
+ "/home/FlagScale/kernel_meta/kernel_meta_5135491729837132746"
+ ],
+ [
+ 3542812,
+ "/home/FlagScale/kernel_meta/kernel_meta_201971367090549155"
+ ],
+ [
+ 3542813,
+ "/home/FlagScale/kernel_meta/kernel_meta_2680602631233265554"
+ ],
+ [
+ 3542814,
+ "/home/FlagScale/kernel_meta/kernel_meta_6022598180981049575"
+ ],
+ [
+ 3542815,
+ "/home/FlagScale/kernel_meta/kernel_meta_13861986098675162229"
+ ],
+ [
+ 3542816,
+ "/home/FlagScale/kernel_meta/kernel_meta_12585153093497637559"
+ ],
+ [
+ 3542817,
+ "/home/FlagScale/kernel_meta/kernel_meta_12621055522972314439"
+ ],
+ [
+ 3542818,
+ "/home/FlagScale/kernel_meta/kernel_meta_7901323918035740809"
+ ],
+ [
+ 3542819,
+ "/home/FlagScale/kernel_meta/kernel_meta_2502011687513636034"
+ ],
+ [
+ 3542820,
+ "/home/FlagScale/kernel_meta/kernel_meta_11127660572711503554"
+ ],
+ [
+ 3542821,
+ "/home/FlagScale/kernel_meta/kernel_meta_16899104921307257912"
+ ],
+ [
+ 3542822,
+ "/home/FlagScale/kernel_meta/kernel_meta_12168969600042534831"
+ ],
+ [
+ 3542823,
+ "/home/FlagScale/kernel_meta/kernel_meta_5947623299434597152"
+ ],
+ [
+ 3542824,
+ "/home/FlagScale/kernel_meta/kernel_meta_8669128749968193072"
+ ],
+ [
+ 3542825,
+ "/home/FlagScale/kernel_meta/kernel_meta_10523056875872612535"
+ ],
+ [
+ 3555941,
+ "/home/FlagScale/kernel_meta/kernel_meta_8521696946960939765"
+ ],
+ [
+ 3555942,
+ "/home/FlagScale/kernel_meta/kernel_meta_16892975521147338452"
+ ],
+ [
+ 3555943,
+ "/home/FlagScale/kernel_meta/kernel_meta_16944483673796156521"
+ ],
+ [
+ 3555944,
+ "/home/FlagScale/kernel_meta/kernel_meta_2146335355416727975"
+ ],
+ [
+ 3555945,
+ "/home/FlagScale/kernel_meta/kernel_meta_10692146212861378037"
+ ],
+ [
+ 3555946,
+ "/home/FlagScale/kernel_meta/kernel_meta_6599005866133998198"
+ ],
+ [
+ 3555947,
+ "/home/FlagScale/kernel_meta/kernel_meta_3524304607908504615"
+ ],
+ [
+ 3555948,
+ "/home/FlagScale/kernel_meta/kernel_meta_1301410561755749912"
+ ],
+ [
+ 3555949,
+ "/home/FlagScale/kernel_meta/kernel_meta_16611153486096822942"
+ ],
+ [
+ 3555950,
+ "/home/FlagScale/kernel_meta/kernel_meta_16897800069514034067"
+ ],
+ [
+ 3555951,
+ "/home/FlagScale/kernel_meta/kernel_meta_5585037901002416772"
+ ],
+ [
+ 3555952,
+ "/home/FlagScale/kernel_meta/kernel_meta_15103467739124890334"
+ ],
+ [
+ 3555953,
+ "/home/FlagScale/kernel_meta/kernel_meta_6258222255360935812"
+ ],
+ [
+ 3555954,
+ "/home/FlagScale/kernel_meta/kernel_meta_4842074883042507385"
+ ],
+ [
+ 3555955,
+ "/home/FlagScale/kernel_meta/kernel_meta_18443539599822282991"
+ ],
+ [
+ 3555956,
+ "/home/FlagScale/kernel_meta/kernel_meta_990735911844063815"
+ ],
+ [
+ 3569115,
+ "/home/FlagScale/kernel_meta/kernel_meta_14128363198148798978"
+ ],
+ [
+ 3569116,
+ "/home/FlagScale/kernel_meta/kernel_meta_15248158307555883079"
+ ],
+ [
+ 3569117,
+ "/home/FlagScale/kernel_meta/kernel_meta_13697321207211618499"
+ ],
+ [
+ 3569118,
+ "/home/FlagScale/kernel_meta/kernel_meta_17667290731393060821"
+ ],
+ [
+ 3569119,
+ "/home/FlagScale/kernel_meta/kernel_meta_3127781752883540278"
+ ],
+ [
+ 3569120,
+ "/home/FlagScale/kernel_meta/kernel_meta_15116490241101309573"
+ ],
+ [
+ 3569121,
+ "/home/FlagScale/kernel_meta/kernel_meta_7563172329201413877"
+ ],
+ [
+ 3569122,
+ "/home/FlagScale/kernel_meta/kernel_meta_4105781977409166076"
+ ],
+ [
+ 3569123,
+ "/home/FlagScale/kernel_meta/kernel_meta_15637440142566643625"
+ ],
+ [
+ 3569124,
+ "/home/FlagScale/kernel_meta/kernel_meta_4316115201839818591"
+ ],
+ [
+ 3569125,
+ "/home/FlagScale/kernel_meta/kernel_meta_15623411163447736831"
+ ],
+ [
+ 3569126,
+ "/home/FlagScale/kernel_meta/kernel_meta_12529375583352343312"
+ ],
+ [
+ 3569127,
+ "/home/FlagScale/kernel_meta/kernel_meta_5457735498648893664"
+ ],
+ [
+ 3569128,
+ "/home/FlagScale/kernel_meta/kernel_meta_1796251299214757029"
+ ],
+ [
+ 3569129,
+ "/home/FlagScale/kernel_meta/kernel_meta_14086212340752724863"
+ ],
+ [
+ 3569130,
+ "/home/FlagScale/kernel_meta/kernel_meta_7879416462992599489"
+ ],
+ [
+ 3582172,
+ "/home/FlagScale/kernel_meta/kernel_meta_9692422471699391823"
+ ],
+ [
+ 3582173,
+ "/home/FlagScale/kernel_meta/kernel_meta_6831436488153844847"
+ ],
+ [
+ 3582174,
+ "/home/FlagScale/kernel_meta/kernel_meta_16380851789988553950"
+ ],
+ [
+ 3582175,
+ "/home/FlagScale/kernel_meta/kernel_meta_10176872202077569091"
+ ],
+ [
+ 3582176,
+ "/home/FlagScale/kernel_meta/kernel_meta_9244506976584338889"
+ ],
+ [
+ 3582177,
+ "/home/FlagScale/kernel_meta/kernel_meta_2583028720151200939"
+ ],
+ [
+ 3582178,
+ "/home/FlagScale/kernel_meta/kernel_meta_8204106850562333788"
+ ],
+ [
+ 3582179,
+ "/home/FlagScale/kernel_meta/kernel_meta_15430286334742802733"
+ ],
+ [
+ 3582180,
+ "/home/FlagScale/kernel_meta/kernel_meta_17315954492644073688"
+ ],
+ [
+ 3582181,
+ "/home/FlagScale/kernel_meta/kernel_meta_2142962576366369287"
+ ],
+ [
+ 3582182,
+ "/home/FlagScale/kernel_meta/kernel_meta_1014042173593524031"
+ ],
+ [
+ 3582183,
+ "/home/FlagScale/kernel_meta/kernel_meta_7944419661808782839"
+ ],
+ [
+ 3582184,
+ "/home/FlagScale/kernel_meta/kernel_meta_17121113405440079511"
+ ],
+ [
+ 3582185,
+ "/home/FlagScale/kernel_meta/kernel_meta_15163068314271812870"
+ ],
+ [
+ 3582186,
+ "/home/FlagScale/kernel_meta/kernel_meta_3434065023340759447"
+ ],
+ [
+ 3582187,
+ "/home/FlagScale/kernel_meta/kernel_meta_2770615037130619961"
+ ]
+]
\ No newline at end of file


critical

This file appears to contain build artifacts (process IDs and kernel metadata paths). Such files are typically specific to a single build environment and should not be committed to the repository. Please remove this file and add the kernel_meta/ directory to your .gitignore file.
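
For reference, a .gitignore entry along these lines (exact pattern an assumption) would keep such artifacts out of future commits:

    # NPU kernel build artifacts
    kernel_meta/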

Comment on lines +1 to +45
diff --git a/flagscale/train/megatron_basic_adaptor/npu_matmul_add.py.bak b/flagscale/train/megatron_basic_adaptor/npu_matmul_add.py.bak
new file mode 100644
index 00000000..3866741b
--- /dev/null
+++ b/flagscale/train/megatron_basic_adaptor/npu_matmul_add.py.bak
@@ -0,0 +1,38 @@
+import torch
+import torch_npu
+from ..op_builder.matmul_add_builder import MatmulAddOpBuilder
+
+__all__ = ["npu_matmul_add_fp32"]
+
+
+matmul_add_op_builder = MatmulAddOpBuilder()
+
+def npu_matmul_add_fp32(total_input, grad_output, grad):
+ # Check whether any dimension of total_input's shape is 0
+ for dim in total_input.shape:
+ if dim == 0:
+ return
+
+ # Check whether any dimension of grad_output's shape is 0
+ for dim in grad_output.shape:
+ if dim == 0:
+ return
+
+ matmul_add_ops = matmul_add_op_builder.load()
+ matmul_add_ops.npu_matmul_add_fp32(grad_output, total_input, grad)
+
+
+def npu_matmul_add_fp16(total_input, grad_output, grad):
+ # Check whether any dimension of total_input's shape is 0
+ for dim in total_input.shape:
+ if dim == 0:
+ return
+
+ # Check whether any dimension of grad_output's shape is 0
+ for dim in grad_output.shape:
+ if dim == 0:
+ return
+
+ grad_weight = grad_output.t().matmul(total_input)
+ grad.add_(grad_weight)
+


critical

This file appears to be a backup file (.bak extension). Backup files and other temporary artifacts should not be included in the version control system. Please remove it.

Comment on lines +108 to +118
+def dummy_compile(*args, **kwargs):
+ if len(args) > 0 and callable(args[0]):
+ def wrapper(*fn_args, **fn_kwargs):
+ return args[0](*fn_args, **fn_kwargs)
+ return wrapper
+ else:
+ def compile_wrapper(fn):
+ def wrapper(*fn_args, **fn_kwargs):
+ return fn(*fn_args, **fn_kwargs)
+ return wrapper
+ return compile_wrapper

critical

The function dummy_compile is defined twice in this file. This is redundant and can lead to confusion. Please remove the duplicate definition.
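
For illustration, the two definitions could be consolidated into one that covers both usage forms, bare @dummy_compile and parameterized @dummy_compile(...). A minimal sketch, with functools.wraps added for metadata hygiene (not present in the original):

    import functools

    def dummy_compile(*args, **kwargs):
        # No-op stand-in for torch.compile on backends that lack it.
        if len(args) > 0 and callable(args[0]):
            # Bare form: @dummy_compile
            fn = args[0]

            @functools.wraps(fn)
            def wrapper(*fn_args, **fn_kwargs):
                return fn(*fn_args, **fn_kwargs)

            return wrapper

        # Parameterized form: @dummy_compile(backend=..., mode=...)
        def compile_wrapper(fn):
            @functools.wraps(fn)
            def wrapper(*fn_args, **fn_kwargs):
                return fn(*fn_args, **fn_kwargs)

            return wrapper

        return compile_wrapper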

Comment on lines +1 to +181
diff --git a/flagscale/train/patch_utils.py b/flagscale/train/patch_utils.py
new file mode 100644
index 00000000..a0890867
--- /dev/null
+++ b/flagscale/train/patch_utils.py
@@ -0,0 +1,174 @@
+import importlib
+import sys
+import types
+from typing import List, Dict, Union
+
+
+def get_func_name(func):
+ if isinstance(func, str):
+ return func
+ return '.'.join((func.__module__, func.__qualname__))
+
+
+def dummy_function_wrapper(func_name):
+ def dummy_function(*args, **kwargs):
+ raise RuntimeError('function {} no exist'.format(func_name))
+
+ return dummy_function
+
+
+class Patch:
+ def __init__(self, orig_func_name, new_func, create_dummy):
+ split_name = orig_func_name.rsplit('.', 1)
+ if len(split_name) == 1:
+ self.orig_module_name, self.orig_func_name = orig_func_name, None
+ else:
+ self.orig_module_name, self.orig_func_name = split_name
+ self.orig_module = None
+ self.orig_func = None
+
+ self.patch_func = None
+ self.wrappers = []
+ if new_func is None:
+ new_func = dummy_function_wrapper(orig_func_name)
+ self.set_patch_func(new_func)
+ self.is_applied = False
+ self.create_dummy = create_dummy
+
+ @property
+ def orig_func_id(self):
+ return id(self.orig_func)
+
+ @property
+ def patch_func_id(self):
+ return id(self.patch_func)
+
+ def set_patch_func(self, new_func, force_patch=False):
+ if hasattr(new_func, '__name__') and new_func.__name__.endswith(('wrapper', 'decorator')):
+ if new_func not in self.wrappers:
+ self.wrappers.append(new_func)
+ else:
+ if self.patch_func and not force_patch:
+ raise RuntimeError('the patch of {} exist !'.format(self.orig_func_name))
+ self.patch_func = new_func
+ self.is_applied = False
+
+ def remove_wrappers(self, wrapper_names: Union[str, List[str]] = None):
+ if wrapper_names is None:
+ self.wrappers.clear()
+ return
+
+ if isinstance(wrapper_names, str):
+ wrapper_names = [wrapper_names]
+ for name in wrapper_names:
+ i = 0
+ while i < len(self.wrappers):
+ if self.wrappers[i].__name__ == name:
+ self.wrappers.pop(i)
+ else:
+ i += 1
+
+ def apply_patch(self):
+ if self.is_applied:
+ return
+
+ self.orig_module, self.orig_func = Patch.parse_path(
+ self.orig_module_name, self.orig_func_name, self.create_dummy
+ )
+
+ final_patch_func = self.orig_func
+ if self.patch_func is not None:
+ final_patch_func = self.patch_func
+
+ for wrapper in self.wrappers:
+ final_patch_func = wrapper(final_patch_func)
+
+ if self.orig_func_name is not None:
+ setattr(self.orig_module, self.orig_func_name, final_patch_func)
+ for key, value in sys.modules.copy().items():
+ if (
+ self.orig_func_name is not None
+ and hasattr(value, self.orig_func_name)
+ and id(getattr(value, self.orig_func_name)) == self.orig_func_id
+ ):
+ setattr(value, self.orig_func_name, final_patch_func)
+ self.is_applied = True
+
+ @staticmethod
+ def parse_path(module_path, function_name, create_dummy):
+ from importlib.machinery import ModuleSpec
+
+ modules = module_path.split('.')
+ for i in range(1, len(modules) + 1):
+ parent = '.'.join(modules[: i - 1])
+ path = '.'.join(modules[:i])
+ try:
+ importlib.import_module(path)
+ except ModuleNotFoundError as e:
+ if not parent or not hasattr(importlib.import_module(parent), modules[i - 1]):
+ if not create_dummy:
+ raise ModuleNotFoundError(e) from e
+ sys.modules[path] = types.ModuleType(path)
+ sys.modules[path].__file__ = 'mindspeed.dummy_module.py'
+ sys.modules[path].__spec__ = ModuleSpec(path, None)
+ if parent:
+ setattr(importlib.import_module(parent), modules[i - 1], sys.modules[path])
+ else:
+ module = getattr(importlib.import_module(parent), modules[i - 1])
+ if hasattr(module, function_name):
+ return module, getattr(module, function_name)
+ elif create_dummy:
+ return module, dummy_function_wrapper(function_name)
+ else:
+ raise RuntimeError('no exist {} of {}'.format(function_name, module))
+
+ if function_name is not None and not hasattr(sys.modules[module_path], function_name):
+ setattr(sys.modules[module_path], function_name, None)
+ return sys.modules[module_path], (
+ getattr(sys.modules[module_path], function_name) if function_name is not None else None
+ )
+
+
+class MindSpeedPatchesManager:
+ patches_info: Dict[str, Patch] = {}
+
+ @staticmethod
+ def register_patch(orig_func_name, new_func=None, force_patch=False, create_dummy=False):
+ """Patch registration method. When this method is executed, the patch does not take effect in real time.
+ It takes effect only after the apply_patches method is invoked. Other details are as follows:
+
+ 1. If `orig_func_name` does not exist and create_dummy is set to True, a dummy function is created to ensure
+ that the import is normal.
+ 2. If `orig_func_name` is not None, `orig_func_name` is replaced with `new_func`.
+ 3. If the `new_func` function name ends with `wrapper` or `decorator`, then `new_func` is decorated on
+ `orig_func_name` as a decorator, and the decorator can be superimposed repeatedly.
+ 4. When force_patch=False, a function cannot be replaced repeatedly (but can be decorated repeatedly),
+ otherwise the replacement is overwritten.
+ """
+ if orig_func_name not in MindSpeedPatchesManager.patches_info:
+ MindSpeedPatchesManager.patches_info[orig_func_name] = Patch(
+ orig_func_name, new_func, create_dummy
+ )
+ else:
+ MindSpeedPatchesManager.patches_info.get(orig_func_name).set_patch_func(
+ new_func, force_patch
+ )
+
+ @staticmethod
+ def remove_wrappers(orig_func_name, wrappers_name, remove_check=True):
+ """Remove wrapper registered in orig_func_name."""
+ if orig_func_name not in MindSpeedPatchesManager.patches_info:
+ raise ValueError('The function <{}> not exist.'.format(orig_func_name))
+
+ patch = MindSpeedPatchesManager.patches_info.get(orig_func_name)
+ wrappers_len = len(patch.wrappers)
+ patch.remove_wrappers(wrappers_name)
+ if remove_check and wrappers_len == len(patch.wrappers):
+ raise RuntimeError('Remove wrappers has not remove anything.')
+
+ @staticmethod
+ def apply_patches():
+ """Apply all patches registered in MindSpeedPatchesManager."""
+ for patch in MindSpeedPatchesManager.patches_info.values():
+ patch.apply_patch()
+


critical

This file seems to be a duplicate of flagscale/train/megatron_basic_adaptor/patch_utils.py. Having duplicate files increases maintenance overhead. Please remove one of them and adjust the imports accordingly.
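
For context, typical usage of the manager defined in this file follows the register-then-apply flow its docstring describes. A hedged sketch (the target path below is hypothetical; create_dummy=True keeps the sketch self-contained):

    from flagscale.train.patch_utils import MindSpeedPatchesManager

    def my_gelu(x):
        # Hypothetical replacement implementation.
        return x

    def timing_wrapper(fn):
        # The name ends in 'wrapper', so register_patch stacks it as a
        # decorator instead of replacing the function outright.
        def wrapper(*args, **kwargs):
            return fn(*args, **kwargs)
        return wrapper

    # Nothing takes effect until apply_patches() is invoked.
    MindSpeedPatchesManager.register_patch("some.module.gelu", my_gelu, create_dummy=True)
    MindSpeedPatchesManager.register_patch("some.module.gelu", timing_wrapper)
    MindSpeedPatchesManager.apply_patches()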

Comment on lines +168 to +175
+ # TODO: @aoyulong - This is a temporary solution to support single-file-per-tensor ckpt
+ non_homogeneous_layers_env = os.getenv('FS_NON_HOMOGENEOUS_LAYERS', 'False').lower() in (
+ 'true',
+ '1',
+ 't',
+ )
+ if non_homogeneous_layers_env:
+ non_homogeneous_layers = True

high

Using an environment variable (FS_NON_HOMOGENEOUS_LAYERS) to control model architecture at runtime is not ideal as it's an implicit dependency. This should be an explicit configuration option in TransformerConfig for better clarity and control.
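
Concretely, the suggestion might look like the sketch below; the field name matches the variable in the patched code, but the subclass and its placement are assumptions, not part of this PR:

    from dataclasses import dataclass

    from megatron.core.transformer import TransformerConfig

    @dataclass
    class FSTransformerConfig(TransformerConfig):
        # Explicit switch for single-file-per-tensor checkpoints,
        # replacing the FS_NON_HOMOGENEOUS_LAYERS environment variable.
        non_homogeneous_layers: bool = False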

Comment on lines +336 to +339
+ ## NOTE(lizhiyu): Force set system Prompt: "You are a helpful assistant."
+ # converted_conversation.append(
+ # {"role": "system", "content": "You are a helpful assistant."}
+ # )

medium

This block of commented-out code should be removed to improve code clarity.

Comment on lines +275 to +279
+ f"Failed to open image: {img_path}. Error: {e} of smaple[{sample.__key__}]"
+ )
+ # raise InternalWarning(
+ # f"Failed to open image: {img_path}. Error: {e} of smaple[{sample.__key__}]"
+ # )

medium

There's a typo 'smaple' in the error message. Additionally, the commented-out InternalWarning should be removed to keep the code clean.

                        f"Failed to open image: {img_path}. Error: {e} of sample[{sample.__key__}]"
                    )

Comment on lines +11 to +18
+ data_path: /home/dataset/sample_dataset/sample_dataset/wds-1
+ vision_root: /home/dataset/sample_dataset/sample_dataset/
dataloader_type: external
split: 100,0,0
tokenizer:
tokenizer_type: Qwen2VLTokenizer
- tokenizer_path: xxxx
+ tokenizer_path: /home/dataset/Qwen2.5-VL-7B-Instruct

medium

Hardcoding paths for data_path, vision_root, and tokenizer_path reduces the portability of this configuration. Consider using environment variables or placeholders to make these paths configurable.
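
Following the ${var:default} interpolation style already used in these configs, a more portable form might be (variable names hypothetical):

      data_path: ${DATA_PATH:??}
      vision_root: ${VISION_ROOT:??}
      tokenizer_path: ${TOKENIZER_PATH:??}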

Comment on lines +9 to +12
+ # NOTE(lizhiyu): The following code is for hetro-expert training when one of the expert parallel degree is 1.
+ # But there are other codes need to be modified to make it work.
+ # if config.enable_hetero:
+ # setattr(self.weight1, 'allreduce', False)

medium

This block of commented-out code seems to be for future work or debugging. It should be removed to keep the codebase clean.


data:
- data_path: ${data_path:??}
+ data_path: /home/dataset/pile_wikipedia_demo

medium

Hardcoding the data_path makes the configuration less portable. It would be better to use an environment variable or a placeholder, similar to the original implementation (${data_path:??}), to allow for easier configuration across different environments.

  data_path: ${DATA_PATH:/home/dataset/pile_wikipedia_demo}

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


zhaoxuncheng seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.
