
Conversation

@koutianzhengwen

patch for megatron core

@gemini-code-assist
Contributor

Summary of Changes

Hello @koutianzhengwen, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the FlagScale framework by integrating and optimizing the Qwen2.5-VL multimodal model for training on Huawei Atlas800TA3 hardware, leveraging the Megatron-LM framework. It involves a major update to backend dependencies, fine-tuning of training configurations for specific models, and the introduction of a new, extensive data loading and processing framework called 'Energon'. These changes collectively aim to improve the efficiency and capability of multimodal model training on the target hardware.

Highlights

  • Qwen2.5-VL Model Integration: The pull request updates the core configuration to explicitly target the Qwen2.5-VL multimodal model for training, replacing previous models like kimi_k2 and qwen3.
  • Megatron-LM Backend Adoption: The backend commit and version information are updated to reflect a shift towards Megatron-LM as the primary framework, moving away from vllm.
  • Huawei Atlas800TA3 Optimization: Training configurations for Aquila and Qwen2.5-VL models are specifically adjusted to leverage Huawei Atlas800TA3 hardware, including increased process and CUDA device counts, and the introduction of NPU-specific adaptors within Megatron-LM.
  • Introduction of Energon Data Framework: A comprehensive new data loading and processing framework named 'Energon' is integrated. This framework provides advanced capabilities for handling multimodal data (images, videos, text), efficient data loading, checkpointing, and distributed processing, which are crucial for large-scale multimodal model training.
  • Refined Training Parameters: Significant modifications are made to the training hyperparameters for the Aquila 7B model, such as increasing tensor model parallelism, reducing sequence length and batch sizes, and adjusting learning rate schedules, indicating a fine-tuning of training strategies.
  • Custom NPU Operations: New custom operations like npu_matmul_add and npu_rotary_position_embedding are introduced, demonstrating hardware-specific optimizations tailored for Huawei NPUs. A minimal dispatch sketch follows this list.
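
For illustration only: a hedged sketch of the dispatch pattern such custom ops imply. The NPU branch calls the fused kernel named in this PR (module path assumed from the npu_matmul_add.py.bak diff further down); the fallback reproduces the same math with stock PyTorch, mirroring the fp16 path shown in that diff.

    try:
        import torch_npu  # noqa: F401 -- availability probe only (import name assumed)
        _HAS_NPU = True
    except ImportError:
        _HAS_NPU = False


    def matmul_add_fp32(total_input, grad_output, grad):
        # Accumulate grad_output^T @ total_input into `grad` in place.
        if _HAS_NPU:
            # Fused custom kernel introduced in this PR
            # (module path assumed, not confirmed by this diff alone).
            from flagscale.train.megatron_basic_adaptor.npu_matmul_add import (
                npu_matmul_add_fp32,
            )

            npu_matmul_add_fp32(total_input, grad_output, grad)
        else:
            # Portable fallback: same math with stock PyTorch ops.
            grad.add_(grad_output.t().matmul(total_input))
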
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature              | Command             | Description
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the root of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.


@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a substantial set of patches to enable Megatron-Core support on Huawei Atlas hardware. The changes are extensive, adding numerous new files for hardware adaptation, custom operators, and monkey-patching core components. While this appears to be a significant step towards hardware compatibility, the current state of the code includes several areas that need improvement. I've noticed hardcoded paths in configuration files, commented-out code, leftover debugging statements, and build artifacts checked into the repository. More critically, there are instances of hardcoded device names (cuda, cpu) that undermine the goal of NPU support, duplicate files and function definitions, and the use of global variables for state management, which can lead to maintainability issues. Addressing these points will greatly improve the quality and robustness of this patch.
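
On the hardcoded device names specifically: a minimal, hedged sketch of a device-resolution helper that call sites could use instead of literal "cuda"/"cpu" strings. The torch.npu namespace is added by torch_npu; its availability check is treated here as an assumption.

    import torch

    def resolve_device() -> torch.device:
        # Prefer an Ascend NPU when torch_npu is installed and a device
        # is visible; fall back to CUDA, then CPU.
        try:
            import torch_npu  # noqa: F401
            if torch.npu.is_available():  # namespace added by torch_npu
                return torch.device("npu")
        except ImportError:
            pass
        return torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Example call site:
    buf = torch.zeros(8, device=resolve_device())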

Comment on lines +1 to +266
diff --git a/kernel_meta/buildPidInfo.json b/kernel_meta/buildPidInfo.json
new file mode 100644
index 00000000..1dd7a9ce
--- /dev/null
+++ b/kernel_meta/buildPidInfo.json
@@ -0,0 +1,258 @@
+[
+ [
+ 3542810,
+ "/home/FlagScale/kernel_meta/kernel_meta_1962833800725725517"
+ ],
+ [
+ 3542811,
+ "/home/FlagScale/kernel_meta/kernel_meta_5135491729837132746"
+ ],
+ [
+ 3542812,
+ "/home/FlagScale/kernel_meta/kernel_meta_201971367090549155"
+ ],
+ [
+ 3542813,
+ "/home/FlagScale/kernel_meta/kernel_meta_2680602631233265554"
+ ],
+ [
+ 3542814,
+ "/home/FlagScale/kernel_meta/kernel_meta_6022598180981049575"
+ ],
+ [
+ 3542815,
+ "/home/FlagScale/kernel_meta/kernel_meta_13861986098675162229"
+ ],
+ [
+ 3542816,
+ "/home/FlagScale/kernel_meta/kernel_meta_12585153093497637559"
+ ],
+ [
+ 3542817,
+ "/home/FlagScale/kernel_meta/kernel_meta_12621055522972314439"
+ ],
+ [
+ 3542818,
+ "/home/FlagScale/kernel_meta/kernel_meta_7901323918035740809"
+ ],
+ [
+ 3542819,
+ "/home/FlagScale/kernel_meta/kernel_meta_2502011687513636034"
+ ],
+ [
+ 3542820,
+ "/home/FlagScale/kernel_meta/kernel_meta_11127660572711503554"
+ ],
+ [
+ 3542821,
+ "/home/FlagScale/kernel_meta/kernel_meta_16899104921307257912"
+ ],
+ [
+ 3542822,
+ "/home/FlagScale/kernel_meta/kernel_meta_12168969600042534831"
+ ],
+ [
+ 3542823,
+ "/home/FlagScale/kernel_meta/kernel_meta_5947623299434597152"
+ ],
+ [
+ 3542824,
+ "/home/FlagScale/kernel_meta/kernel_meta_8669128749968193072"
+ ],
+ [
+ 3542825,
+ "/home/FlagScale/kernel_meta/kernel_meta_10523056875872612535"
+ ],
+ [
+ 3555941,
+ "/home/FlagScale/kernel_meta/kernel_meta_8521696946960939765"
+ ],
+ [
+ 3555942,
+ "/home/FlagScale/kernel_meta/kernel_meta_16892975521147338452"
+ ],
+ [
+ 3555943,
+ "/home/FlagScale/kernel_meta/kernel_meta_16944483673796156521"
+ ],
+ [
+ 3555944,
+ "/home/FlagScale/kernel_meta/kernel_meta_2146335355416727975"
+ ],
+ [
+ 3555945,
+ "/home/FlagScale/kernel_meta/kernel_meta_10692146212861378037"
+ ],
+ [
+ 3555946,
+ "/home/FlagScale/kernel_meta/kernel_meta_6599005866133998198"
+ ],
+ [
+ 3555947,
+ "/home/FlagScale/kernel_meta/kernel_meta_3524304607908504615"
+ ],
+ [
+ 3555948,
+ "/home/FlagScale/kernel_meta/kernel_meta_1301410561755749912"
+ ],
+ [
+ 3555949,
+ "/home/FlagScale/kernel_meta/kernel_meta_16611153486096822942"
+ ],
+ [
+ 3555950,
+ "/home/FlagScale/kernel_meta/kernel_meta_16897800069514034067"
+ ],
+ [
+ 3555951,
+ "/home/FlagScale/kernel_meta/kernel_meta_5585037901002416772"
+ ],
+ [
+ 3555952,
+ "/home/FlagScale/kernel_meta/kernel_meta_15103467739124890334"
+ ],
+ [
+ 3555953,
+ "/home/FlagScale/kernel_meta/kernel_meta_6258222255360935812"
+ ],
+ [
+ 3555954,
+ "/home/FlagScale/kernel_meta/kernel_meta_4842074883042507385"
+ ],
+ [
+ 3555955,
+ "/home/FlagScale/kernel_meta/kernel_meta_18443539599822282991"
+ ],
+ [
+ 3555956,
+ "/home/FlagScale/kernel_meta/kernel_meta_990735911844063815"
+ ],
+ [
+ 3569115,
+ "/home/FlagScale/kernel_meta/kernel_meta_14128363198148798978"
+ ],
+ [
+ 3569116,
+ "/home/FlagScale/kernel_meta/kernel_meta_15248158307555883079"
+ ],
+ [
+ 3569117,
+ "/home/FlagScale/kernel_meta/kernel_meta_13697321207211618499"
+ ],
+ [
+ 3569118,
+ "/home/FlagScale/kernel_meta/kernel_meta_17667290731393060821"
+ ],
+ [
+ 3569119,
+ "/home/FlagScale/kernel_meta/kernel_meta_3127781752883540278"
+ ],
+ [
+ 3569120,
+ "/home/FlagScale/kernel_meta/kernel_meta_15116490241101309573"
+ ],
+ [
+ 3569121,
+ "/home/FlagScale/kernel_meta/kernel_meta_7563172329201413877"
+ ],
+ [
+ 3569122,
+ "/home/FlagScale/kernel_meta/kernel_meta_4105781977409166076"
+ ],
+ [
+ 3569123,
+ "/home/FlagScale/kernel_meta/kernel_meta_15637440142566643625"
+ ],
+ [
+ 3569124,
+ "/home/FlagScale/kernel_meta/kernel_meta_4316115201839818591"
+ ],
+ [
+ 3569125,
+ "/home/FlagScale/kernel_meta/kernel_meta_15623411163447736831"
+ ],
+ [
+ 3569126,
+ "/home/FlagScale/kernel_meta/kernel_meta_12529375583352343312"
+ ],
+ [
+ 3569127,
+ "/home/FlagScale/kernel_meta/kernel_meta_5457735498648893664"
+ ],
+ [
+ 3569128,
+ "/home/FlagScale/kernel_meta/kernel_meta_1796251299214757029"
+ ],
+ [
+ 3569129,
+ "/home/FlagScale/kernel_meta/kernel_meta_14086212340752724863"
+ ],
+ [
+ 3569130,
+ "/home/FlagScale/kernel_meta/kernel_meta_7879416462992599489"
+ ],
+ [
+ 3582172,
+ "/home/FlagScale/kernel_meta/kernel_meta_9692422471699391823"
+ ],
+ [
+ 3582173,
+ "/home/FlagScale/kernel_meta/kernel_meta_6831436488153844847"
+ ],
+ [
+ 3582174,
+ "/home/FlagScale/kernel_meta/kernel_meta_16380851789988553950"
+ ],
+ [
+ 3582175,
+ "/home/FlagScale/kernel_meta/kernel_meta_10176872202077569091"
+ ],
+ [
+ 3582176,
+ "/home/FlagScale/kernel_meta/kernel_meta_9244506976584338889"
+ ],
+ [
+ 3582177,
+ "/home/FlagScale/kernel_meta/kernel_meta_2583028720151200939"
+ ],
+ [
+ 3582178,
+ "/home/FlagScale/kernel_meta/kernel_meta_8204106850562333788"
+ ],
+ [
+ 3582179,
+ "/home/FlagScale/kernel_meta/kernel_meta_15430286334742802733"
+ ],
+ [
+ 3582180,
+ "/home/FlagScale/kernel_meta/kernel_meta_17315954492644073688"
+ ],
+ [
+ 3582181,
+ "/home/FlagScale/kernel_meta/kernel_meta_2142962576366369287"
+ ],
+ [
+ 3582182,
+ "/home/FlagScale/kernel_meta/kernel_meta_1014042173593524031"
+ ],
+ [
+ 3582183,
+ "/home/FlagScale/kernel_meta/kernel_meta_7944419661808782839"
+ ],
+ [
+ 3582184,
+ "/home/FlagScale/kernel_meta/kernel_meta_17121113405440079511"
+ ],
+ [
+ 3582185,
+ "/home/FlagScale/kernel_meta/kernel_meta_15163068314271812870"
+ ],
+ [
+ 3582186,
+ "/home/FlagScale/kernel_meta/kernel_meta_3434065023340759447"
+ ],
+ [
+ 3582187,
+ "/home/FlagScale/kernel_meta/kernel_meta_2770615037130619961"
+ ]
+]
\ No newline at end of file


critical

This file appears to contain build artifacts (process IDs and kernel metadata paths). Such files are typically specific to a single build environment and should not be committed to the repository. Please remove this file and add the kernel_meta/ directory to your .gitignore file.
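
For reference, a .gitignore entry along these lines (exact pattern an assumption) would keep such artifacts out of future commits:

    # NPU kernel build artifacts
    kernel_meta/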

Comment on lines +1 to +45
diff --git a/flagscale/train/megatron_basic_adaptor/npu_matmul_add.py.bak b/flagscale/train/megatron_basic_adaptor/npu_matmul_add.py.bak
new file mode 100644
index 00000000..3866741b
--- /dev/null
+++ b/flagscale/train/megatron_basic_adaptor/npu_matmul_add.py.bak
@@ -0,0 +1,38 @@
+import torch
+import torch_npu
+from ..op_builder.matmul_add_builder import MatmulAddOpBuilder
+
+__all__ = ["npu_matmul_add_fp32"]
+
+
+matmul_add_op_builder = MatmulAddOpBuilder()
+
+def npu_matmul_add_fp32(total_input, grad_output, grad):
+ # Check whether any dimension of total_input's shape is 0
+ for dim in total_input.shape:
+ if dim == 0:
+ return
+
+ # Check whether any dimension of grad_output's shape is 0
+ for dim in grad_output.shape:
+ if dim == 0:
+ return
+
+ matmul_add_ops = matmul_add_op_builder.load()
+ matmul_add_ops.npu_matmul_add_fp32(grad_output, total_input, grad)
+
+
+def npu_matmul_add_fp16(total_input, grad_output, grad):
+ # Check whether any dimension of total_input's shape is 0
+ for dim in total_input.shape:
+ if dim == 0:
+ return
+
+ # Check whether any dimension of grad_output's shape is 0
+ for dim in grad_output.shape:
+ if dim == 0:
+ return
+
+ grad_weight = grad_output.t().matmul(total_input)
+ grad.add_(grad_weight)
+


critical

This file appears to be a backup file (.bak extension). Backup files and other temporary artifacts should not be included in the version control system. Please remove it.

Comment on lines +108 to +118
+def dummy_compile(*args, **kwargs):
+ if len(args) > 0 and callable(args[0]):
+ def wrapper(*fn_args, **fn_kwargs):
+ return args[0](*fn_args, **fn_kwargs)
+ return wrapper
+ else:
+ def compile_wrapper(fn):
+ def wrapper(*fn_args, **fn_kwargs):
+ return fn(*fn_args, **fn_kwargs)
+ return wrapper
+ return compile_wrapper

critical

The function dummy_compile is defined twice in this file. This is redundant and can lead to confusion. Please remove the duplicate definition.
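
For illustration, the two definitions could be consolidated into one that covers both usage forms, bare @dummy_compile and parameterized @dummy_compile(...). A minimal sketch, with functools.wraps added for metadata hygiene (not present in the original):

    import functools

    def dummy_compile(*args, **kwargs):
        # No-op stand-in for torch.compile on backends that lack it.
        if len(args) > 0 and callable(args[0]):
            # Bare form: @dummy_compile
            fn = args[0]

            @functools.wraps(fn)
            def wrapper(*fn_args, **fn_kwargs):
                return fn(*fn_args, **fn_kwargs)

            return wrapper

        # Parameterized form: @dummy_compile(backend=..., mode=...)
        def compile_wrapper(fn):
            @functools.wraps(fn)
            def wrapper(*fn_args, **fn_kwargs):
                return fn(*fn_args, **fn_kwargs)

            return wrapper

        return compile_wrapper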

Comment on lines +1 to +181
diff --git a/flagscale/train/patch_utils.py b/flagscale/train/patch_utils.py
new file mode 100644
index 00000000..a0890867
--- /dev/null
+++ b/flagscale/train/patch_utils.py
@@ -0,0 +1,174 @@
+import importlib
+import sys
+import types
+from typing import List, Dict, Union
+
+
+def get_func_name(func):
+ if isinstance(func, str):
+ return func
+ return '.'.join((func.__module__, func.__qualname__))
+
+
+def dummy_function_wrapper(func_name):
+ def dummy_function(*args, **kwargs):
+ raise RuntimeError('function {} no exist'.format(func_name))
+
+ return dummy_function
+
+
+class Patch:
+ def __init__(self, orig_func_name, new_func, create_dummy):
+ split_name = orig_func_name.rsplit('.', 1)
+ if len(split_name) == 1:
+ self.orig_module_name, self.orig_func_name = orig_func_name, None
+ else:
+ self.orig_module_name, self.orig_func_name = split_name
+ self.orig_module = None
+ self.orig_func = None
+
+ self.patch_func = None
+ self.wrappers = []
+ if new_func is None:
+ new_func = dummy_function_wrapper(orig_func_name)
+ self.set_patch_func(new_func)
+ self.is_applied = False
+ self.create_dummy = create_dummy
+
+ @property
+ def orig_func_id(self):
+ return id(self.orig_func)
+
+ @property
+ def patch_func_id(self):
+ return id(self.patch_func)
+
+ def set_patch_func(self, new_func, force_patch=False):
+ if hasattr(new_func, '__name__') and new_func.__name__.endswith(('wrapper', 'decorator')):
+ if new_func not in self.wrappers:
+ self.wrappers.append(new_func)
+ else:
+ if self.patch_func and not force_patch:
+ raise RuntimeError('the patch of {} exist !'.format(self.orig_func_name))
+ self.patch_func = new_func
+ self.is_applied = False
+
+ def remove_wrappers(self, wrapper_names: Union[str, List[str]] = None):
+ if wrapper_names is None:
+ self.wrappers.clear()
+ return
+
+ if isinstance(wrapper_names, str):
+ wrapper_names = [wrapper_names]
+ for name in wrapper_names:
+ i = 0
+ while i < len(self.wrappers):
+ if self.wrappers[i].__name__ == name:
+ self.wrappers.pop(i)
+ else:
+ i += 1
+
+ def apply_patch(self):
+ if self.is_applied:
+ return
+
+ self.orig_module, self.orig_func = Patch.parse_path(
+ self.orig_module_name, self.orig_func_name, self.create_dummy
+ )
+
+ final_patch_func = self.orig_func
+ if self.patch_func is not None:
+ final_patch_func = self.patch_func
+
+ for wrapper in self.wrappers:
+ final_patch_func = wrapper(final_patch_func)
+
+ if self.orig_func_name is not None:
+ setattr(self.orig_module, self.orig_func_name, final_patch_func)
+ for key, value in sys.modules.copy().items():
+ if (
+ self.orig_func_name is not None
+ and hasattr(value, self.orig_func_name)
+ and id(getattr(value, self.orig_func_name)) == self.orig_func_id
+ ):
+ setattr(value, self.orig_func_name, final_patch_func)
+ self.is_applied = True
+
+ @staticmethod
+ def parse_path(module_path, function_name, create_dummy):
+ from importlib.machinery import ModuleSpec
+
+ modules = module_path.split('.')
+ for i in range(1, len(modules) + 1):
+ parent = '.'.join(modules[: i - 1])
+ path = '.'.join(modules[:i])
+ try:
+ importlib.import_module(path)
+ except ModuleNotFoundError as e:
+ if not parent or not hasattr(importlib.import_module(parent), modules[i - 1]):
+ if not create_dummy:
+ raise ModuleNotFoundError(e) from e
+ sys.modules[path] = types.ModuleType(path)
+ sys.modules[path].__file__ = 'mindspeed.dummy_module.py'
+ sys.modules[path].__spec__ = ModuleSpec(path, None)
+ if parent:
+ setattr(importlib.import_module(parent), modules[i - 1], sys.modules[path])
+ else:
+ module = getattr(importlib.import_module(parent), modules[i - 1])
+ if hasattr(module, function_name):
+ return module, getattr(module, function_name)
+ elif create_dummy:
+ return module, dummy_function_wrapper(function_name)
+ else:
+ raise RuntimeError('no exist {} of {}'.format(function_name, module))
+
+ if function_name is not None and not hasattr(sys.modules[module_path], function_name):
+ setattr(sys.modules[module_path], function_name, None)
+ return sys.modules[module_path], (
+ getattr(sys.modules[module_path], function_name) if function_name is not None else None
+ )
+
+
+class MindSpeedPatchesManager:
+ patches_info: Dict[str, Patch] = {}
+
+ @staticmethod
+ def register_patch(orig_func_name, new_func=None, force_patch=False, create_dummy=False):
+ """Patch registration method. When this method is executed, the patch does not take effect in real time.
+ It takes effect only after the apply_patches method is invoked. Other details are as follows:
+
+ 1. If `orig_func_name` does not exist and create_dummy is set to True, a dummy function is created to ensure
+ that the import is normal.
+ 2. If `orig_func_name` is not None, `orig_func_name` is replaced with `new_func`.
+ 3. If the `new_func` function name ends with `wrapper` or `decorator`, then `new_func` is decorated on
+ `orig_func_name` as a decorator, and the decorator can be superimposed repeatedly.
+ 4. When force_patch=False, a function cannot be replaced repeatedly (but can be decorated repeatedly),
+ otherwise the replacement is overwritten.
+ """
+ if orig_func_name not in MindSpeedPatchesManager.patches_info:
+ MindSpeedPatchesManager.patches_info[orig_func_name] = Patch(
+ orig_func_name, new_func, create_dummy
+ )
+ else:
+ MindSpeedPatchesManager.patches_info.get(orig_func_name).set_patch_func(
+ new_func, force_patch
+ )
+
+ @staticmethod
+ def remove_wrappers(orig_func_name, wrappers_name, remove_check=True):
+ """Remove wrapper registered in orig_func_name."""
+ if orig_func_name not in MindSpeedPatchesManager.patches_info:
+ raise ValueError('The function <{}> not exist.'.format(orig_func_name))
+
+ patch = MindSpeedPatchesManager.patches_info.get(orig_func_name)
+ wrappers_len = len(patch.wrappers)
+ patch.remove_wrappers(wrappers_name)
+ if remove_check and wrappers_len == len(patch.wrappers):
+ raise RuntimeError('Remove wrappers has not remove anything.')
+
+ @staticmethod
+ def apply_patches():
+ """Apply all patches registered in MindSpeedPatchesManager."""
+ for patch in MindSpeedPatchesManager.patches_info.values():
+ patch.apply_patch()
+


critical

This file seems to be a duplicate of flagscale/train/megatron_basic_adaptor/patch_utils.py. Having duplicate files increases maintenance overhead. Please remove one of them and adjust the imports accordingly.
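
For context, typical usage of the manager defined in this file follows the register-then-apply flow its docstring describes. A hedged sketch (the target path below is hypothetical; create_dummy=True keeps the sketch self-contained):

    from flagscale.train.patch_utils import MindSpeedPatchesManager

    def my_gelu(x):
        # Hypothetical replacement implementation.
        return x

    def timing_wrapper(fn):
        # The name ends in 'wrapper', so register_patch stacks it as a
        # decorator instead of replacing the function outright.
        def wrapper(*args, **kwargs):
            return fn(*args, **kwargs)
        return wrapper

    # Nothing takes effect until apply_patches() is invoked.
    MindSpeedPatchesManager.register_patch("some.module.gelu", my_gelu, create_dummy=True)
    MindSpeedPatchesManager.register_patch("some.module.gelu", timing_wrapper)
    MindSpeedPatchesManager.apply_patches()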

Comment on lines +168 to +175
+ # TODO: @aoyulong - This is a temporary solution to support single-file-per-tensor ckpt
+ non_homogeneous_layers_env = os.getenv('FS_NON_HOMOGENEOUS_LAYERS', 'False').lower() in (
+ 'true',
+ '1',
+ 't',
+ )
+ if non_homogeneous_layers_env:
+ non_homogeneous_layers = True

high

Using an environment variable (FS_NON_HOMOGENEOUS_LAYERS) to control model architecture at runtime is not ideal as it's an implicit dependency. This should be an explicit configuration option in TransformerConfig for better clarity and control.
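
Concretely, the suggestion might look like the sketch below; the field name matches the variable in the patched code, but the subclass and its placement are assumptions, not part of this PR:

    from dataclasses import dataclass

    from megatron.core.transformer import TransformerConfig

    @dataclass
    class FSTransformerConfig(TransformerConfig):
        # Explicit switch for single-file-per-tensor checkpoints,
        # replacing the FS_NON_HOMOGENEOUS_LAYERS environment variable.
        non_homogeneous_layers: bool = False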

Comment on lines +336 to +339
+ ## NOTE(lizhiyu): Force set system Prompt: "You are a helpful assistant."
+ # converted_conversation.append(
+ # {"role": "system", "content": "You are a helpful assistant."}
+ # )

medium

This block of commented-out code should be removed to improve code clarity.

Comment on lines +275 to +279
+ f"Failed to open image: {img_path}. Error: {e} of smaple[{sample.__key__}]"
+ )
+ # raise InternalWarning(
+ # f"Failed to open image: {img_path}. Error: {e} of smaple[{sample.__key__}]"
+ # )

medium

There's a typo 'smaple' in the error message. Additionally, the commented-out InternalWarning should be removed to keep the code clean.

                        f"Failed to open image: {img_path}. Error: {e} of sample[{sample.__key__}]"
                    )

Comment on lines +11 to +18
+ data_path: /home/dataset/sample_dataset/sample_dataset/wds-1
+ vision_root: /home/dataset/sample_dataset/sample_dataset/
dataloader_type: external
split: 100,0,0
tokenizer:
tokenizer_type: Qwen2VLTokenizer
- tokenizer_path: xxxx
+ tokenizer_path: /home/dataset/Qwen2.5-VL-7B-Instruct

medium

Hardcoding paths for data_path, vision_root, and tokenizer_path reduces the portability of this configuration. Consider using environment variables or placeholders to make these paths configurable.
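
Following the ${var:default} interpolation style already used in these configs, a more portable form might be (variable names hypothetical):

      data_path: ${DATA_PATH:??}
      vision_root: ${VISION_ROOT:??}
      tokenizer_path: ${TOKENIZER_PATH:??}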

Comment on lines +9 to +12
+ # NOTE(lizhiyu): The following code is for hetro-expert training when one of the expert parallel degree is 1.
+ # But there are other codes need to be modified to make it work.
+ # if config.enable_hetero:
+ # setattr(self.weight1, 'allreduce', False)

medium

This block of commented-out code seems to be for future work or debugging. It should be removed to keep the codebase clean.


data:
- data_path: ${data_path:??}
+ data_path: /home/dataset/pile_wikipedia_demo

medium

Hardcoding the data_path makes the configuration less portable. It would be better to use an environment variable or a placeholder, similar to the original implementation (${data_path:??}), to allow for easier configuration across different environments.

  data_path: ${DATA_PATH:/home/dataset/pile_wikipedia_demo}

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


zhaoxuncheng seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.
