patch for megatron core #889
base: main
Conversation
Summary of Changes
Hello @koutianzhengwen, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request enhances the FlagScale framework by integrating and optimizing the Qwen2.5-VL multimodal model for training on Huawei Atlas800TA3 hardware, building on the Megatron-LM framework. It includes a major update to backend dependencies, tuned training configurations for specific models, and a new, extensive data loading and processing framework called Energon. Together, these changes aim to improve the efficiency and capability of multimodal model training on the target hardware.
Code Review
This pull request introduces a substantial set of patches to enable Megatron-Core support on Huawei Atlas hardware. The changes are extensive, adding numerous new files for hardware adaptation, custom operators, and monkey-patching core components. While this appears to be a significant step towards hardware compatibility, the current state of the code includes several areas that need improvement. I've noticed hardcoded paths in configuration files, commented-out code, leftover debugging statements, and build artifacts checked into the repository. More critically, there are instances of hardcoded device names (cuda, cpu) that undermine the goal of NPU support, duplicate files and function definitions, and the use of global variables for state management, which can lead to maintainability issues. Addressing these points will greatly improve the quality and robustness of this patch.
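One way to address the hardcoded cuda/cpu device names mentioned above is to resolve the device at runtime instead of writing string literals at call sites. The sketch below is illustrative only; the helper name and the torch_npu availability check are assumptions, not code from this patch:

```python
import torch


def resolve_device() -> torch.device:
    # Prefer an NPU when the Ascend adapter (torch_npu) is installed and a
    # device is visible, then fall back to CUDA, then CPU.
    try:
        import torch_npu  # noqa: F401  # assumption: Ascend PyTorch adapter is available

        if torch.npu.is_available():
            return torch.device("npu")
    except ImportError:
        pass
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")


device = resolve_device()
x = torch.randn(4, 4, device=device)  # no hardcoded "cuda"/"cpu" literal
```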
diff --git a/kernel_meta/buildPidInfo.json b/kernel_meta/buildPidInfo.json
new file mode 100644
index 00000000..1dd7a9ce
--- /dev/null
+++ b/kernel_meta/buildPidInfo.json
@@ -0,0 +1,258 @@
(258 added lines, with no trailing newline: a JSON array of [build PID, kernel_meta cache path] pairs under /home/FlagScale/kernel_meta/, a local kernel-build artifact committed to the repository.)
diff --git a/flagscale/train/megatron_basic_adaptor/npu_matmul_add.py.bak b/flagscale/train/megatron_basic_adaptor/npu_matmul_add.py.bak
new file mode 100644
index 00000000..3866741b
--- /dev/null
+++ b/flagscale/train/megatron_basic_adaptor/npu_matmul_add.py.bak
@@ -0,0 +1,38 @@
+import torch
+import torch_npu
+from ..op_builder.matmul_add_builder import MatmulAddOpBuilder
+
+__all__ = ["npu_matmul_add_fp32"]
+
+
+matmul_add_op_builder = MatmulAddOpBuilder()
+
+def npu_matmul_add_fp32(total_input, grad_output, grad):
+    # Check whether any dimension of total_input's shape is 0
+    for dim in total_input.shape:
+        if dim == 0:
+            return
+
+    # Check whether any dimension of grad_output's shape is 0
+    for dim in grad_output.shape:
+        if dim == 0:
+            return
+
+    matmul_add_ops = matmul_add_op_builder.load()
+    matmul_add_ops.npu_matmul_add_fp32(grad_output, total_input, grad)
+
+
+def npu_matmul_add_fp16(total_input, grad_output, grad):
+    # Check whether any dimension of total_input's shape is 0
+    for dim in total_input.shape:
+        if dim == 0:
+            return
+
+    # Check whether any dimension of grad_output's shape is 0
+    for dim in grad_output.shape:
+        if dim == 0:
+            return
+
+    grad_weight = grad_output.t().matmul(total_input)
+    grad.add_(grad_weight)
+
+def dummy_compile(*args, **kwargs):
+    # Pass-through stand-in for a compile decorator: works both when applied
+    # directly to a function and when called with keyword arguments first.
+    if len(args) > 0 and callable(args[0]):
+        def wrapper(*fn_args, **fn_kwargs):
+            return args[0](*fn_args, **fn_kwargs)
+        return wrapper
+    else:
+        def compile_wrapper(fn):
+            def wrapper(*fn_args, **fn_kwargs):
+                return fn(*fn_args, **fn_kwargs)
+            return wrapper
+        return compile_wrapper
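Shims like this are typically swapped in for torch.compile on back ends where graph compilation is unavailable. The sketch below illustrates that monkey-patching pattern with a simplified stub; the direct assignment is an assumption for demonstration (PyTorch 2.x), not the patch's exact mechanism:

```python
import torch


def dummy_compile(*args, **kwargs):
    # Pass-through stand-in: supports both @torch.compile and
    # @torch.compile(dynamic=True, ...) decoration styles.
    if len(args) > 0 and callable(args[0]):
        return args[0]

    def decorator(fn):
        return fn

    return decorator


# Illustrative monkey-patch: decorated functions now run eagerly, uncompiled.
torch.compile = dummy_compile


@torch.compile
def add(a, b):
    return a + b


print(add(torch.ones(2), torch.ones(2)))  # tensor([2., 2.])
```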
diff --git a/flagscale/train/patch_utils.py b/flagscale/train/patch_utils.py
new file mode 100644
index 00000000..a0890867
--- /dev/null
+++ b/flagscale/train/patch_utils.py
@@ -0,0 +1,174 @@
+import importlib
+import sys
+import types
+from typing import List, Dict, Union
+
+
+def get_func_name(func):
+    if isinstance(func, str):
+        return func
+    return '.'.join((func.__module__, func.__qualname__))
+
+
+def dummy_function_wrapper(func_name):
+    def dummy_function(*args, **kwargs):
+        raise RuntimeError('function {} does not exist'.format(func_name))
+
+    return dummy_function
+
+
+class Patch:
+    def __init__(self, orig_func_name, new_func, create_dummy):
+        split_name = orig_func_name.rsplit('.', 1)
+        if len(split_name) == 1:
+            self.orig_module_name, self.orig_func_name = orig_func_name, None
+        else:
+            self.orig_module_name, self.orig_func_name = split_name
+        self.orig_module = None
+        self.orig_func = None
+
+        self.patch_func = None
+        self.wrappers = []
+        if new_func is None:
+            new_func = dummy_function_wrapper(orig_func_name)
+        self.set_patch_func(new_func)
+        self.is_applied = False
+        self.create_dummy = create_dummy
+
+    @property
+    def orig_func_id(self):
+        return id(self.orig_func)
+
+    @property
+    def patch_func_id(self):
+        return id(self.patch_func)
+
+    def set_patch_func(self, new_func, force_patch=False):
+        if hasattr(new_func, '__name__') and new_func.__name__.endswith(('wrapper', 'decorator')):
+            if new_func not in self.wrappers:
+                self.wrappers.append(new_func)
+        else:
+            if self.patch_func and not force_patch:
+                raise RuntimeError('the patch of {} already exists!'.format(self.orig_func_name))
+            self.patch_func = new_func
+        self.is_applied = False
+
+    def remove_wrappers(self, wrapper_names: Union[str, List[str]] = None):
+        if wrapper_names is None:
+            self.wrappers.clear()
+            return
+
+        if isinstance(wrapper_names, str):
+            wrapper_names = [wrapper_names]
+        for name in wrapper_names:
+            i = 0
+            while i < len(self.wrappers):
+                if self.wrappers[i].__name__ == name:
+                    self.wrappers.pop(i)
+                else:
+                    i += 1
+
+    def apply_patch(self):
+        if self.is_applied:
+            return
+
+        self.orig_module, self.orig_func = Patch.parse_path(
+            self.orig_module_name, self.orig_func_name, self.create_dummy
+        )
+
+        final_patch_func = self.orig_func
+        if self.patch_func is not None:
+            final_patch_func = self.patch_func
+
+        for wrapper in self.wrappers:
+            final_patch_func = wrapper(final_patch_func)
+
+        if self.orig_func_name is not None:
+            setattr(self.orig_module, self.orig_func_name, final_patch_func)
+        for key, value in sys.modules.copy().items():
+            if (
+                self.orig_func_name is not None
+                and hasattr(value, self.orig_func_name)
+                and id(getattr(value, self.orig_func_name)) == self.orig_func_id
+            ):
+                setattr(value, self.orig_func_name, final_patch_func)
+        self.is_applied = True
+
+    @staticmethod
+    def parse_path(module_path, function_name, create_dummy):
+        from importlib.machinery import ModuleSpec
+
+        modules = module_path.split('.')
+        for i in range(1, len(modules) + 1):
+            parent = '.'.join(modules[: i - 1])
+            path = '.'.join(modules[:i])
+            try:
+                importlib.import_module(path)
+            except ModuleNotFoundError as e:
+                if not parent or not hasattr(importlib.import_module(parent), modules[i - 1]):
+                    if not create_dummy:
+                        raise ModuleNotFoundError(e) from e
+                    sys.modules[path] = types.ModuleType(path)
+                    sys.modules[path].__file__ = 'mindspeed.dummy_module.py'
+                    sys.modules[path].__spec__ = ModuleSpec(path, None)
+                    if parent:
+                        setattr(importlib.import_module(parent), modules[i - 1], sys.modules[path])
+                else:
+                    module = getattr(importlib.import_module(parent), modules[i - 1])
+                    if hasattr(module, function_name):
+                        return module, getattr(module, function_name)
+                    elif create_dummy:
+                        return module, dummy_function_wrapper(function_name)
+                    else:
+                        raise RuntimeError('{} does not exist in {}'.format(function_name, module))
+
+        if function_name is not None and not hasattr(sys.modules[module_path], function_name):
+            setattr(sys.modules[module_path], function_name, None)
+        return sys.modules[module_path], (
+            getattr(sys.modules[module_path], function_name) if function_name is not None else None
+        )
+
+
+class MindSpeedPatchesManager:
+    patches_info: Dict[str, Patch] = {}
+
+    @staticmethod
+    def register_patch(orig_func_name, new_func=None, force_patch=False, create_dummy=False):
+        """Patch registration method. When this method is executed, the patch does not take effect immediately;
+        it takes effect only after the apply_patches method is invoked. Other details are as follows:
+
+        1. If `orig_func_name` does not exist and `create_dummy` is True, a dummy function is created so that
+           the import still succeeds.
+        2. If `new_func` is not None, `orig_func_name` is replaced with `new_func`.
+        3. If the `new_func` function name ends with `wrapper` or `decorator`, `new_func` is applied to
+           `orig_func_name` as a decorator, and such decorators can be stacked repeatedly.
+        4. When `force_patch` is False, a function cannot be replaced twice (but can be decorated repeatedly);
+           otherwise the previous replacement is overwritten.
+        """
+        if orig_func_name not in MindSpeedPatchesManager.patches_info:
+            MindSpeedPatchesManager.patches_info[orig_func_name] = Patch(
+                orig_func_name, new_func, create_dummy
+            )
+        else:
+            MindSpeedPatchesManager.patches_info.get(orig_func_name).set_patch_func(
+                new_func, force_patch
+            )
+
+    @staticmethod
+    def remove_wrappers(orig_func_name, wrappers_name, remove_check=True):
+        """Remove wrappers registered on orig_func_name."""
+        if orig_func_name not in MindSpeedPatchesManager.patches_info:
+            raise ValueError('The function <{}> does not exist.'.format(orig_func_name))
+
+        patch = MindSpeedPatchesManager.patches_info.get(orig_func_name)
+        wrappers_len = len(patch.wrappers)
+        patch.remove_wrappers(wrappers_name)
+        if remove_check and wrappers_len == len(patch.wrappers):
+            raise RuntimeError('remove_wrappers did not remove anything.')
+
+    @staticmethod
+    def apply_patches():
+        """Apply all patches registered in MindSpeedPatchesManager."""
+        for patch in MindSpeedPatchesManager.patches_info.values():
+            patch.apply_patch()
+
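To show how this registry is intended to be driven, here is a hedged usage sketch. The patched target (`math.sqrt`) and the replacement functions are placeholders chosen only for demonstration; the real patches in this PR target Megatron-Core internals instead:

```python
from flagscale.train.patch_utils import MindSpeedPatchesManager


def fake_sqrt(x):
    # Replacement function used only for this demonstration.
    return x ** 0.5


def logging_wrapper(fn):
    # Names ending in "wrapper"/"decorator" are stacked as decorators
    # on top of whatever function is finally patched in.
    def wrapper(*args, **kwargs):
        print('calling', fn.__name__)
        return fn(*args, **kwargs)
    return wrapper


# Registration is lazy: nothing changes until apply_patches() runs.
MindSpeedPatchesManager.register_patch('math.sqrt', fake_sqrt)
MindSpeedPatchesManager.register_patch('math.sqrt', logging_wrapper)
MindSpeedPatchesManager.apply_patches()

import math

print(math.sqrt(9.0))  # prints "calling fake_sqrt", then 3.0
```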
+    # TODO: @aoyulong - This is a temporary solution to support single-file-per-tensor ckpt
+    non_homogeneous_layers_env = os.getenv('FS_NON_HOMOGENEOUS_LAYERS', 'False').lower() in (
+        'true',
+        '1',
+        't',
+    )
+    if non_homogeneous_layers_env:
+        non_homogeneous_layers = True
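For reference, the truthy-string check above behaves like this standalone sketch (the `env_flag` helper is illustrative, not part of the patch):

```python
import os


def env_flag(name: str, default: str = 'False') -> bool:
    # Same semantics as the patch: "true", "1", or "t" (case-insensitive) enables the flag.
    return os.getenv(name, default).lower() in ('true', '1', 't')


os.environ['FS_NON_HOMOGENEOUS_LAYERS'] = '1'
assert env_flag('FS_NON_HOMOGENEOUS_LAYERS') is True
```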
+## NOTE(lizhiyu): Force-set the system prompt to "You are a helpful assistant."
+# converted_conversation.append(
+#     {"role": "system", "content": "You are a helpful assistant."}
+# )
| + f"Failed to open image: {img_path}. Error: {e} of smaple[{sample.__key__}]" | ||
| + ) | ||
| + # raise InternalWarning( | ||
| + # f"Failed to open image: {img_path}. Error: {e} of smaple[{sample.__key__}]" | ||
| + # ) |
+    data_path: /home/dataset/sample_dataset/sample_dataset/wds-1
+    vision_root: /home/dataset/sample_dataset/sample_dataset/
     dataloader_type: external
     split: 100,0,0
     tokenizer:
       tokenizer_type: Qwen2VLTokenizer
-      tokenizer_path: xxxx
+      tokenizer_path: /home/dataset/Qwen2.5-VL-7B-Instruct
+        # NOTE(lizhiyu): The following code is for hetero-expert training when one of the expert parallel degrees is 1,
+        # but other code also needs to be modified to make it work.
+        # if config.enable_hetero:
+        #     setattr(self.weight1, 'allreduce', False)
   data:
-    data_path: ${data_path:??}
+    data_path: /home/dataset/pile_wikipedia_demo
Hardcoding the data_path makes the configuration less portable. It would be better to use an environment variable or a placeholder, similar to the original implementation (${data_path:??}), to allow for easier configuration across different environments.
data_path: ${DATA_PATH:/home/dataset/pile_wikipedia_demo}
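If these configs are resolved with OmegaConf/Hydra (an assumption here, not verified from this diff), the suggested environment-variable fallback can be expressed with OmegaConf's `oc.env` resolver:

```python
import os

from omegaconf import OmegaConf

# DATA_PATH overrides the default when set; otherwise the default path is used.
cfg = OmegaConf.create(
    {"data_path": "${oc.env:DATA_PATH,'/home/dataset/pile_wikipedia_demo'}"}
)

os.environ.pop("DATA_PATH", None)
print(OmegaConf.to_container(cfg, resolve=True))  # {'data_path': '/home/dataset/pile_wikipedia_demo'}

os.environ["DATA_PATH"] = "/data/my_corpus"  # hypothetical override path
print(OmegaConf.to_container(cfg, resolve=True))  # {'data_path': '/data/my_corpus'}
```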
zhaoxuncheng does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. You have already signed the CLA but the status is still pending? Let us recheck it.