Optimize configuration access with LRU cache in custom ops #22204
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default; only a small subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Code Review
This pull request introduces a cached function get_cached_compilation_config using lru_cache to optimize repeated access to the compilation configuration. While the intent to improve performance is good, the current implementation has a critical flaw. The caching mechanism doesn't account for changes in the global configuration context, which can lead to stale cache data and incorrect behavior, especially in testing environments. I've provided a detailed comment on how to address this to ensure correctness.
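To make the failure mode concrete, here is a minimal, self-contained sketch (hypothetical names, not vLLM code) of why lru_cache over a mutable global goes stale:

from functools import lru_cache

_current_config = "config-A"  # stand-in for a mutable module-level config

@lru_cache(maxsize=1)
def get_cached_config():
    # The cache key is the (empty) argument tuple, so reassigning the
    # global below never invalidates this entry.
    return _current_config

print(get_cached_config())    # config-A  (computed, then cached)
_current_config = "config-B"  # simulate a context manager swapping configs
print(get_cached_config())    # config-A  (stale cache hit!)
get_cached_config.cache_clear()
print(get_cached_config())    # config-B  (fresh after cache_clear())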
vllm/model_executor/custom_op.py (outdated)
Using lru_cache on this function is problematic because it depends on a mutable global variable (_current_vllm_config in vllm.config). The set_current_vllm_config context manager can change this global, but the cache won't be invalidated, leading to stale data. This will break tests that iterate over different configurations, like tests/model_executor/test_enabled_custom_ops.py.
A robust solution is to clear the cache when the context changes. This can be done by moving get_cached_compilation_config to vllm/config.py and calling get_cached_compilation_config.cache_clear() in the finally block of the set_current_vllm_config context manager.
This change would require modifying vllm/config.py and importing this function from there.
Example change in vllm/config.py:
@contextmanager
def set_current_vllm_config(vllm_config: VllmConfig, ...):
    ...
    try:
        ...
        yield
    finally:
        get_cached_compilation_config.cache_clear()
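For context, a fuller, self-contained sketch of that suggestion (names taken from the review comment and commit messages; the real vllm/config.py differs in details such as type annotations and extra bookkeeping):

from contextlib import contextmanager
from functools import lru_cache

_current_vllm_config = None  # module-level global the context manager swaps

def get_current_vllm_config():
    return _current_vllm_config

@lru_cache(maxsize=1)
def get_cached_compilation_config():
    # Cached view of the current compilation config; only valid until
    # the surrounding config context changes.
    return get_current_vllm_config().compilation_config

@contextmanager
def set_current_vllm_config(vllm_config):
    global _current_vllm_config
    old_vllm_config = _current_vllm_config
    try:
        _current_vllm_config = vllm_config
        yield
    finally:
        # Invalidate on every context exit so tests that iterate over
        # different configurations never observe stale cached data.
        get_cached_compilation_config.cache_clear()
        _current_vllm_config = old_vllm_config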
Force-pushed from fa5997e to d680d2b.
Add cached compilation config function to reduce repeated calls to get_current_vllm_config().compilation_config in the CustomOp class. This improves performance by avoiding redundant configuration lookups during model execution.

Changes:
- Add get_cached_compilation_config() with @lru_cache decorator
- Replace direct config calls in dispatch_forward(), enabled(), and default_on() methods
- Import functools.lru_cache for caching functionality

Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>
Shortened docstring from 87 to 73 characters to comply with the line length limit.

Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>

Move get_cached_compilation_config from custom_op.py to config.py and add a cache_clear() call in the set_current_vllm_config finally block to ensure the cache is invalidated when the configuration context changes. This prevents stale cached data in tests that iterate over different configurations, particularly test_enabled_custom_ops.py.

Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>
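A hypothetical before/after of the call-site change those commits describe (method name from the commit message; the body is illustrative, not the actual vLLM source):

from vllm.config import get_cached_compilation_config  # new home per the commit above

class CustomOp:
    def dispatch_forward(self):
        # Before: a fresh lookup on every dispatch.
        #   compilation_config = get_current_vllm_config().compilation_config
        # After: served from the one-entry LRU cache until the config
        # context changes and cache_clear() is called.
        compilation_config = get_cached_compilation_config()
        ...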
Force-pushed from 87a919a to ca92f9c.
Diff context in vllm/config.py:

get_cached_compilation_config.cache_clear()
...
@lru_cache(maxsize=1)
Nit: you could use functools.cache instead if you aren't going to use the LRU part.
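For reference, a small sketch of the nit (stand-in function, not vLLM code): functools.cache is an unbounded cache, equivalent to lru_cache(maxsize=None), and for a zero-argument function either variant holds at most one entry, so behavior is identical here; both also expose cache_clear():

from functools import cache, lru_cache

def load_config():
    return {"custom_ops": ["all"]}  # stand-in for the real config lookup

@cache                  # the suggested spelling; unbounded, but only one key exists
def cached_via_cache():
    return load_config()

@lru_cache(maxsize=1)   # what the PR uses; same behavior for 0-arg functions
def cached_via_lru():
    return load_config()

assert cached_via_cache() == cached_via_lru()
cached_via_cache.cache_clear()  # both variants support invalidation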
LGTM
Changes
- Add get_cached_compilation_config() with @lru_cache(maxsize=1) to cache compilation config access
- Clear the cache in the set_current_vllm_config context manager to ensure cache freshness

Benefits
- Avoids redundant get_current_vllm_config().compilation_config lookups in CustomOp's dispatch_forward(), enabled(), and default_on() methods

Files Modified
- vllm/model_executor/custom_op.py
- vllm/config.py