⚡️ Speed up function get_processor by 21%
#319
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
📄 21% (0.21x) speedup for
get_processorinpython/sglang/srt/utils/hf_transformers_utils.py⏱️ Runtime :
2.45 seconds→2.03 seconds(best of5runs)📝 Explanation and details
The optimized code achieves a 20% speedup through several key micro-optimizations that reduce Python's runtime overhead:
Key Optimizations:
Argument Dictionary Consolidation: The original code passed the same arguments (
trust_remote_code,revision,**kwargs) to multiplefrom_pretrained()calls. The optimized version buildspretrained_argsonce and reuses it, eliminating redundant dictionary creation and argument unpacking operations.Reduced Branch Complexity: The nested condition for Qwen2-VL/Sarashina2Vision models was simplified from separate
ifchecks to a single combined condition (if config.model_type in {...} and "size" not in kwargs:), reducing branch prediction overhead.Exception Handling Optimization: When catching
ValueErrorand settinguse_fast=True, the optimized version modifies the pre-builtpretrained_argsdictionary instead of rebuilding arguments, avoiding duplicate keyword argument processing.Performance Impact:
The test results show consistent improvements across different scenarios:
Why This Works:
Python's dictionary operations and keyword argument unpacking (
**kwargs) have significant overhead. By pre-building the arguments dictionary once and reusing it, the optimization reduces:This optimization is particularly effective for functions called frequently in model loading pipelines, where even small per-call improvements compound significantly.
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes
git checkout codeflash/optimize-get_processor-mhorpkmcand push.