Conversation

@skyloevil
Contributor

@skyloevil skyloevil commented Aug 4, 2025

Changes

  • Added get_cached_compilation_config() function with
    @lru_cache(maxsize=1) to cache compilation config access
  • Added cache invalidation in set_current_vllm_config() context
    manager to ensure cache freshness
  • Fixed code style (line length) issues

Benefits

  • Reduces repeated configuration lookups in custom operations
  • Improves performance for frequently accessed compilation config
  • Maintains correctness with proper cache invalidation

Files Modified

  • vllm/config.py: Added caching function and invalidation logic (see the sketch below)
  • vllm/model_executor/custom_op.py: Updated to use cached config access
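
For reference, a minimal sketch of the two pieces described above, assuming the existing get_current_vllm_config() helper in vllm/config.py (context-manager internals are abbreviated, not the actual merged implementation):

from contextlib import contextmanager
from functools import lru_cache


@lru_cache(maxsize=1)
def get_cached_compilation_config():
    """Cache config to avoid repeated calls to get_current_vllm_config()."""
    return get_current_vllm_config().compilation_config


@contextmanager
def set_current_vllm_config(vllm_config):
    # Install vllm_config as the current global config (details omitted).
    ...
    try:
        yield
    finally:
        # Invalidate the cache so later lookups see the restored config.
        get_cached_compilation_config.cache_clear()
        ...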

@github-actions

github-actions bot commented Aug 4, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a cached function get_cached_compilation_config using lru_cache to optimize repeated access to the compilation configuration. While the intent to improve performance is good, the current implementation has a critical flaw. The caching mechanism doesn't account for changes in the global configuration context, which can lead to stale cache data and incorrect behavior, especially in testing environments. I've provided a detailed comment on how to address this to ensure correctness.

Comment on lines 16 to 19
Contributor

critical

Using lru_cache on this function is problematic because it depends on a mutable global variable (_current_vllm_config in vllm.config). The set_current_vllm_config context manager can change this global, but the cache won't be invalidated, leading to stale data. This will break tests that iterate over different configurations, like tests/model_executor/test_enabled_custom_ops.py.

A robust solution is to clear the cache when the context changes. This can be done by moving get_cached_compilation_config to vllm/config.py and calling get_cached_compilation_config.cache_clear() in the finally block of the set_current_vllm_config context manager.

This change would require modifying vllm/config.py and importing this function from there.

Example change in vllm/config.py:

@contextmanager
def set_current_vllm_config(vllm_config: VllmConfig, ...):
    ...
    try:
        ...
        yield
    finally:
        # Clear the cache on every context exit so subsequent lookups
        # reflect the restored global configuration.
        get_cached_compilation_config.cache_clear()
        ...

Add cached compilation config function to reduce repeated calls to
get_current_vllm_config().compilation_config in the CustomOp class.
This improves performance by avoiding redundant configuration lookups
during model execution.

Changes:
- Add get_cached_compilation_config() with @lru_cache decorator
- Replace direct config calls in dispatch_forward(), enabled(), and default_on() methods
- Import functools.lru_cache for caching functionality

Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>
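
Illustratively, each call site in custom_op.py changes along these lines (a sketch; the surrounding method bodies are omitted):

# Before: every call re-resolves the global config.
compilation_config = get_current_vllm_config().compilation_config

# After: the accessor returns the memoized result.
compilation_config = get_cached_compilation_config()
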
Shortened the docstring from 87 to 73 characters to comply with the line-length limit.

Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>
Move get_cached_compilation_config from custom_op.py to config.py and
add cache_clear() call in set_current_vllm_config finally block to
ensure cache is invalidated when configuration context changes.

This prevents stale cached data in tests that iterate over different
configurations, particularly in test_enabled_custom_ops.py.

Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>
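
A hypothetical illustration of the failure mode this commit fixes, with config_a and config_b standing in for two distinct VllmConfig instances:

with set_current_vllm_config(config_a):
    assert get_cached_compilation_config() is config_a.compilation_config

with set_current_vllm_config(config_b):
    # Without cache_clear() in the context manager's finally block,
    # the lru_cache would still return config_a's compilation_config here.
    assert get_cached_compilation_config() is config_b.compilation_config
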
@skyloevil skyloevil force-pushed the config-cache-optimization branch from 87a919a to ca92f9c on August 4, 2025 at 17:03
get_cached_compilation_config.cache_clear()


@lru_cache(maxsize=1)
Member

Nit: you could use cache instead if you aren't going to use the LRU part
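
For illustration, functools.cache (Python 3.9+) is equivalent to lru_cache(maxsize=None) and still exposes cache_clear(), so the invalidation path would be unaffected:

from functools import cache


@cache  # no LRU eviction bookkeeping; cache_clear() still works
def get_cached_compilation_config():
    return get_current_vllm_config().compilation_config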

Member

@mgoin mgoin left a comment

LGTM

@mgoin mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 4, 2025
@vllm-bot vllm-bot merged commit 4b3e447 into vllm-project:main Aug 5, 2025
49 of 51 checks passed
npanpaliya pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Aug 6, 2025
…ect#22204)

Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>
myselvess pushed a commit to myselvess/vllm that referenced this pull request Aug 7, 2025
…ect#22204)

Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>
jinzhen-lin pushed a commit to jinzhen-lin/vllm that referenced this pull request Aug 9, 2025
…ect#22204)

Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
noamgat pushed a commit to noamgat/vllm that referenced this pull request Aug 9, 2025
…ect#22204)

Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>
Signed-off-by: Noam Gat <noamgat@gmail.com>
paulpak58 pushed a commit to paulpak58/vllm that referenced this pull request Aug 13, 2025
…ect#22204)

Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>
Signed-off-by: Paul Pak <paulpak58@gmail.com>
diegocastanibm pushed a commit to diegocastanibm/vllm that referenced this pull request Aug 15, 2025
…ect#22204)

Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>
Signed-off-by: Diego-Castan <diego.castan@ibm.com>
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025
…ect#22204)

Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>
xiao-llm pushed a commit to xiao-llm/vllm that referenced this pull request Aug 28, 2025
…ect#22204)

Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>
Signed-off-by: Xiao Yu <xiao.yu@amd.com>
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025
…ect#22204)

Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>