-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
[Bugfix]: fix metadata file copy in test_sharded_state_loader #21830
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bugfix]: fix metadata file copy in test_sharded_state_loader #21830
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request correctly fixes a bug where filename patterns were being checked with endswith
instead of fnmatch
. The change also correctly handles subdirectories by copying them. However, the new logic doesn't apply the filter patterns to directories, which could lead to incorrect behavior. I've provided a suggestion to apply patterns to both files and directories to make the implementation more robust and consistent with the behavior of similar utilities in huggingface_hub
.
tests/test_sharded_state_loader.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current logic correctly uses fnmatch
for files, but it unconditionally copies all directories without checking them against weights_patterns
. This could lead to incorrect behavior if a weight pattern is intended to match a directory, which deviates from the behavior of ignore_patterns
in huggingface_hub
that was referenced in the pull request description.
For example, if weights_patterns
contained a pattern like "checkpoints"
, a directory with that name would be copied instead of being ignored.
To make the logic more robust and align with the intended behavior, I suggest checking items against weights_patterns
first, regardless of whether they are files or directories. This ensures that directories matching a weight pattern are correctly skipped.
if os.path.isdir(os.path.join(input_dir, file)): | |
continue | |
if not any(file.endswith(ext) for ext in weights_patterns): | |
shutil.copy(f"{input_dir}/{file}", output_dir) | |
shutil.copytree( | |
os.path.join(input_dir, file), os.path.join(output_dir, file) | |
) | |
elif not any(fnmatch.fnmatch(file, ext) for ext in weights_patterns): | |
shutil.copy(os.path.join(input_dir, file), output_dir) | |
if any(fnmatch.fnmatch(file, ext) for ext in weights_patterns): | |
continue | |
src_path = os.path.join(input_dir, file) | |
if os.path.isdir(src_path): | |
shutil.copytree(src_path, os.path.join(output_dir, file)) | |
else: | |
shutil.copy(src_path, output_dir) |
f938118
to
8f9c472
Compare
Signed-off-by: Andy Xie <andy.xning@gmail.com>
8f9c472
to
51bf142
Compare
continue | ||
if not any(file.endswith(ext) for ext in weights_patterns): | ||
shutil.copy(f"{input_dir}/{file}", output_dir) | ||
shutil.copytree(os.path.join(input_dir, file), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to copy directories now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be consistent with
vllm/examples/offline_inference/save_sharded_state.py
Lines 87 to 95 in 4cd7fe6
# Copy metadata files to output directory | |
for file in os.listdir(model_path): | |
if os.path.splitext(file)[1] not in (".bin", ".pt", ".safetensors"): | |
if os.path.isdir(os.path.join(model_path, file)): | |
shutil.copytree( | |
os.path.join(model_path, file), os.path.join(args.output, file) | |
) | |
else: | |
shutil.copy(os.path.join(model_path, file), args.output) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, thanks for fixing this. cc: @DarkLight1337 @WoosukKwon
…roject#21830) Signed-off-by: Andy Xie <andy.xning@gmail.com>
…roject#21830) Signed-off-by: Andy Xie <andy.xning@gmail.com>
…roject#21830) Signed-off-by: Andy Xie <andy.xning@gmail.com> Signed-off-by: x22x22 <wadeking@qq.com>
…roject#21830) Signed-off-by: Andy Xie <andy.xning@gmail.com>
…roject#21830) Signed-off-by: Andy Xie <andy.xning@gmail.com> Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
…roject#21830) Signed-off-by: Andy Xie <andy.xning@gmail.com> Signed-off-by: Noam Gat <noamgat@gmail.com>
…roject#21830) Signed-off-by: Andy Xie <andy.xning@gmail.com> Signed-off-by: Paul Pak <paulpak58@gmail.com>
…roject#21830) Signed-off-by: Andy Xie <andy.xning@gmail.com> Signed-off-by: Diego-Castan <diego.castan@ibm.com>
…roject#21830) Signed-off-by: Andy Xie <andy.xning@gmail.com>
…roject#21830) Signed-off-by: Andy Xie <andy.xning@gmail.com>
Essential Elements of an Effective PR Description Checklist
supported_models.md
andexamples
for a new model.Purpose
currently,
weights_patterns
is["*.safetensors"]
and iffile
ismodel.safetensors
, then ext is*.safetensors
andfile.endswith(ext)
will be false. As*.safetensors
is a pattern, then we should usefnmatch
to check whether a file matches the pattern like howignore_patterns
is implemented in huggingface_cli.Test Plan
NA
Test Result
NA
(Optional) Documentation Update
NA