
Conversation


noooop (Collaborator) commented on Oct 23, 2025

Purpose

  • Add num_cached_tokens for PoolingRequestOutput (a hedged usage sketch follows this list)
  • Fix CI failures in main
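
A rough usage sketch of the new field (illustrative only, not from this PR: the model name and flags are assumptions, and pooling-model setup varies across vLLM versions):

```python
from vllm import LLM

# Hypothetical setup: any pooling/embedding model works here; prefix
# caching must be enabled for num_cached_tokens to ever be non-zero.
llm = LLM(model="BAAI/bge-base-en-v1.5", enable_prefix_caching=True)

outs = llm.encode(["vLLM is a fast inference engine."])
# num_cached_tokens is the field this PR adds to PoolingRequestOutput:
# 0 on a cold cache, positive once a prompt prefix is served from cache.
print(outs[0].num_cached_tokens)
```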

Test Plan

Test Result



noooop force-pushed the pooling_num_cached_tokens branch from 0ee0fd4 to 11f60fc on October 23, 2025 at 01:27
noooop marked this pull request as ready for review on October 23, 2025 at 03:21
noooop added the ready label on Oct 23, 2025
noooop (Collaborator, Author) commented on Oct 23, 2025

Starting a CI run to check which CI failures in main still need to be fixed.

```python
vllm_outputs = vllm_model.classify(example_prompts)

# First Run
vllm_model.classify(example_prompts)
```
A Member commented on the hunk above:

Should we check that initially the number of cached tokens is zero?
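
A hedged sketch of that check (assuming, per the hunk above, that vllm_model.classify returns outputs carrying the new field; whether the second run reports cached tokens depends on prefix caching being enabled and the prompts covering at least one full KV-cache block):

```python
# First run on a cold cache: nothing should be reported as cached.
first = vllm_model.classify(example_prompts)
assert all(out.num_cached_tokens == 0 for out in first)

# Second run over the same prompts: the shared prefix can now hit cache.
second = vllm_model.classify(example_prompts)
assert any(out.num_cached_tokens > 0 for out in second)
```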

```python
PoolingRequestOutput[Any](
    request_id="",
    outputs=processed_outputs,
    num_cached_tokens=getattr(
```
A Member asked on the hunk above:

Why do we need getattr here? In what case is that not available?

noooop (Collaborator, Author) replied:

The result of io_processor might not have this attribute.
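
A minimal, self-contained illustration of the defensive pattern (LegacyOutput is hypothetical, standing in for an io_processor result that predates the field):

```python
class LegacyOutput:
    """Stands in for an io_processor result without num_cached_tokens."""

# getattr with a default tolerates objects that lack the attribute,
# where plain attribute access would raise AttributeError.
num_cached = getattr(LegacyOutput(), "num_cached_tokens", 0)
assert num_cached == 0
```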

noooop (Collaborator, Author) commented on Oct 23, 2025:

Please unblock Language Models Test (Extended Pooling) and Language Models Test (MTEB) to check for CI failures in the main branch that still need to be fixed.

A Member commented:

Hmm... I think we should make this a property of PoolingRequestOutput itself?

DarkLight1337 (Member) commented on Oct 23, 2025:

Something like:

```python
@property
def num_cached_tokens(self) -> int:
    return getattr(self.processed_outputs, "num_cached_tokens", 0)
```
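
A quick self-contained check of that suggestion (stand-in classes only, not vLLM's actual definitions):

```python
from dataclasses import dataclass
from typing import Any


class WithField:
    num_cached_tokens = 7


class WithoutField:
    pass


@dataclass
class Output:  # hypothetical stand-in for PoolingRequestOutput
    processed_outputs: Any

    @property
    def num_cached_tokens(self) -> int:
        return getattr(self.processed_outputs, "num_cached_tokens", 0)


# The property centralizes the fallback, so call sites need no getattr.
assert Output(WithField()).num_cached_tokens == 7
assert Output(WithoutField()).num_cached_tokens == 0
```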

noooop (Collaborator, Author) replied on Oct 23, 2025:

Can we first merge this and discuss the issue in #26973? This PR is actually intended to fix CI failures in the main branch for #27329.

The Member replied:

ok

DarkLight1337 merged commit 3729ed0 into vllm-project:main on Oct 23, 2025
50 of 51 checks passed
noooop deleted the pooling_num_cached_tokens branch on October 23, 2025 at 07:37
usberkeley pushed a commit to usberkeley/vllm that referenced this pull request Oct 23, 2025
albertoperdomo2 pushed a commit to albertoperdomo2/vllm that referenced this pull request Oct 23, 2025
[Model] Add num_cached_tokens for PoolingRequestOutput (vllm-project#27378)

Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
845473182 pushed a commit to raindaywhu/vllm that referenced this pull request Oct 24, 2025
…o step_forward

* 'step_forward' of https://github.com/raindaywhu/vllm: (148 commits)
  [Model] Add MoE support for NemotronH (vllm-project#25863)
  [Metrics] [KVConnector] Add connector prefix cache hit rate stats (vllm-project#26245)
  [CI] Reorganize entrypoints tests (vllm-project#27403)
  add SLA information into comparison graph for vLLM Benchmark Suite (vllm-project#25525)
  [CI/Build] Fix AMD CI: test_cpu_gpu.py (vllm-project#27388)
  [Bugfix] Fix args settings for guided decoding args (vllm-project#27375)
  [CI/Build] Fix Prithvi plugin test (vllm-project#27393)
  [Chore] Remove duplicate `has_` functions in vllm.utils (vllm-project#27372)
  [Model] Add num_cached_tokens for PoolingRequestOutput (vllm-project#27378)
  [V1][spec decode] return logprobs for spec decoding (vllm-project#26060)
  [CORE] Support Prefix Caching with Prompt Embeds (vllm-project#27219)
  [Bugfix][Core] running queue index leakage exception (vllm-project#26754)
  [Bugfix] Fix incorrect kv cache metrics in grafana.json (vllm-project#27133)
  [Bugfix] Fix SLA tuner initialization (vllm-project#27355)
  [Bugfix] Fix deepseek-ocr multi-image inference and add `merge_by_field_config=True` with tensor schema support (vllm-project#27361)
  [MLA] Bump FlashMLA (vllm-project#27354)
  [Chore] Separate out system utilities from vllm.utils (vllm-project#27201)
  [BugFix] bugfix for Flash Attention MLA with full cuda graph IMA following pr-25490 (vllm-project#27128)
  [Feature] publisher default set zmq in kv_event config (vllm-project#26915)
  [Prefix Cache] Use LoRA name for consistent KV-cache block hashing (vllm-project#27211)
  ...
kingsmad pushed a commit to kingsmad/vllm that referenced this pull request Oct 25, 2025
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
[Model] Add num_cached_tokens for PoolingRequestOutput (vllm-project#27378)

Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>

Labels

frontend, ready, v1
