Optimize configuration access with LRU cache in custom ops #22204
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default; only a small subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Code Review
This pull request introduces a cached function get_cached_compilation_config using lru_cache to optimize repeated access to the compilation configuration. While the intent to improve performance is good, the current implementation has a critical flaw. The caching mechanism doesn't account for changes in the global configuration context, which can lead to stale cache data and incorrect behavior, especially in testing environments. I've provided a detailed comment on how to address this to ensure correctness.
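To make the failure mode concrete, here is a minimal, self-contained sketch (hypothetical names, not vLLM code) of why lru_cache over a mutable global goes stale:

from functools import lru_cache

_current_config = "config-A"  # stand-in for a mutable module-level config

@lru_cache(maxsize=1)
def get_cached_config():
    # The cache key is the (empty) argument tuple, so reassigning the
    # global below never invalidates this entry.
    return _current_config

print(get_cached_config())    # config-A  (computed, then cached)
_current_config = "config-B"  # simulate a context manager swapping configs
print(get_cached_config())    # config-A  (stale cache hit!)
get_cached_config.cache_clear()
print(get_cached_config())    # config-B  (fresh after cache_clear())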
vllm/model_executor/custom_op.py (outdated)
Using lru_cache on this function is problematic because it depends on a mutable global variable (_current_vllm_config in vllm.config). The set_current_vllm_config context manager can change this global, but the cache won't be invalidated, leading to stale data. This will break tests that iterate over different configurations, like tests/model_executor/test_enabled_custom_ops.py.
A robust solution is to clear the cache when the context changes. This can be done by moving get_cached_compilation_config to vllm/config.py and calling get_cached_compilation_config.cache_clear() in the finally block of the set_current_vllm_config context manager.
This change would require modifying vllm/config.py and importing this function from there.
Example change in vllm/config.py:
@contextmanager
def set_current_vllm_config(vllm_config: VllmConfig, ...):
    ...
    try:
        ...
        yield
    finally:
        get_cached_compilation_config.cache_clear()
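For context, a fuller, self-contained sketch of that suggestion (names taken from the review comment and commit messages; the real vllm/config.py differs in details such as type annotations and extra bookkeeping):

from contextlib import contextmanager
from functools import lru_cache

_current_vllm_config = None  # module-level global the context manager swaps

def get_current_vllm_config():
    return _current_vllm_config

@lru_cache(maxsize=1)
def get_cached_compilation_config():
    # Cached view of the current compilation config; only valid until
    # the surrounding config context changes.
    return get_current_vllm_config().compilation_config

@contextmanager
def set_current_vllm_config(vllm_config):
    global _current_vllm_config
    old_vllm_config = _current_vllm_config
    try:
        _current_vllm_config = vllm_config
        yield
    finally:
        # Invalidate on every context exit so tests that iterate over
        # different configurations never observe stale cached data.
        get_cached_compilation_config.cache_clear()
        _current_vllm_config = old_vllm_config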
Force-pushed from fa5997e to d680d2b.
Add cached compilation config function to reduce repeated calls to get_current_vllm_config().compilation_config in the CustomOp class. This improves performance by avoiding redundant configuration lookups during model execution.

Changes:
- Add get_cached_compilation_config() with @lru_cache decorator
- Replace direct config calls in dispatch_forward(), enabled(), and default_on() methods
- Import functools.lru_cache for caching functionality

Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>
Shortened docstring from 87 to 73 characters to comply with the line length limit.

Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>

Move get_cached_compilation_config from custom_op.py to config.py and add a cache_clear() call in the set_current_vllm_config finally block to ensure the cache is invalidated when the configuration context changes. This prevents stale cached data in tests that iterate over different configurations, particularly test_enabled_custom_ops.py.

Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>
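A hypothetical before/after of the call-site change those commits describe (method name from the commit message; the body is illustrative, not the actual vLLM source):

from vllm.config import get_cached_compilation_config  # new home per the commit above

class CustomOp:
    def dispatch_forward(self):
        # Before: a fresh lookup on every dispatch.
        #   compilation_config = get_current_vllm_config().compilation_config
        # After: served from the one-entry LRU cache until the config
        # context changes and cache_clear() is called.
        compilation_config = get_cached_compilation_config()
        ...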
Force-pushed from 87a919a to ca92f9c.
Diff context in vllm/config.py:

get_cached_compilation_config.cache_clear()
...
@lru_cache(maxsize=1)
Nit: you could use functools.cache instead if you aren't going to use the LRU part.
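For reference, a small sketch of the nit (stand-in function, not vLLM code): functools.cache is an unbounded cache, equivalent to lru_cache(maxsize=None), and for a zero-argument function either variant holds at most one entry, so behavior is identical here; both also expose cache_clear():

from functools import cache, lru_cache

def load_config():
    return {"custom_ops": ["all"]}  # stand-in for the real config lookup

@cache                  # the suggested spelling; unbounded, but only one key exists
def cached_via_cache():
    return load_config()

@lru_cache(maxsize=1)   # what the PR uses; same behavior for 0-arg functions
def cached_via_lru():
    return load_config()

assert cached_via_cache() == cached_via_lru()
cached_via_cache.cache_clear()  # both variants support invalidation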
LGTM
Changes
- Add get_cached_compilation_config() with @lru_cache(maxsize=1) to cache compilation config access
- Clear the cache in the set_current_vllm_config context manager to ensure cache freshness

Benefits
- Avoids redundant get_current_vllm_config().compilation_config lookups in CustomOp's dispatch_forward(), enabled(), and default_on() methods

Files Modified
- vllm/model_executor/custom_op.py
- vllm/config.py