-
-
Notifications
You must be signed in to change notification settings - Fork 11.1k
[gpt-oss]Support lazy init mcp session #24388
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors the tool session initialization to be lazy by making ConversationContext an async context manager. This is a good change that centralizes resource management. The implementation uses AsyncExitStack within HarmonyContext to correctly manage the lifecycle of tool sessions. However, I've identified a critical issue in call_tool where a coroutine is passed as an argument instead of its awaited result, which would lead to a runtime error. My review includes a fix for this issue.
|
This pull request has merge conflicts that must be resolved before it can be |
|
Also CC @yeqcharlotte |
|
thanks for the change and the test plan @wuhang2014. could you rebase it after #23386? could you turn more test plan into unit tests? also maybe double check if the e2e aime25 with tool eval score still makes sense. cc: @alecsolder @lacora |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please rebase and add the unit test coverage
vllm/entrypoints/context.py
Outdated
| self._reference_count += 1 | ||
| if self._async_exit_stack is None: | ||
| assert (self._reference_count == 1 | ||
| ), "Reference count of exit stack should be " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can you make the assertion message more meaningful?
|
I like a lot of what is going on here, and the implementation looks a lot cleaner than before, but a couple comments: First, I think we should absolutely only init connections for tools that are enabled in the request, not for all built in tools registered to the tool server. This PR definitely solves that. However I think this lazy init should be enabled as an option, and not by default. I think there are a few different reasons why we could want the option for the connections to happen before any generation steps have taken place:
TLDR and my actual suggestions:
|
|
@alecsolder Thanks! What about adding a config to control whether to enable lazy initialization in this PR and enable lazy initialization by default? We can discuss the better default choice without blocking this PR.
I don't think we need it. We can first initialize the tool server, and if lazy initialization is disabled, create connection with MCP tool server with the same API of lazy initialization.
I agree
I agree Discussion about which option is better:
With lazy initialization, we can still serve simple requests that don't need tool call. To my understanding, simple requests will still enable all tools because there isn't a layer to detect it is a simple request and disable the tools. And if you do want fast failing, maybe we can check the healthy of MCP server every several minutes, without affecting the TTFT of all requests.
I think we should return the
The result of list_tools need to be cached as it doesn't change very frequently. We don't need to check it for every request.
|
Sounds good!
Works for me
I think this is a good strategy, sounds more than good enough for now.
Yeah agreed, I think that part would just need to be implemented as part of this PR
Yep understood, I've seen it to be a bit annoying for development purposes because now if I want to iterate on my MCP server prompts, I have to restart vLLM every single time. But this is definitely more of an issue for the "remote MCP" feature for sure compared to the build in tools that are currently supported. OpenAI gives some information on how the list_tools request is cached here. It is actually emitted as an event, and is cached on a request by request basis, which makes sense for an enterprise but is definitely a bit much here. Just mentioning it so we can consider it for the future!
There is good info on auth headers here, basically if you define headers on the tool object, they are passed through to the MCP server. It is like if I was using stripe, I'd want to define my stripe_api_key as part of my responses API request. That is how stripe hosts one MCP service everyone can use.
I think this would be ideal! |
|
@wuhang2014
|
|
@alecsolder can you help to create issues to track the TODOs that are important for you? |
I will rebase it and add unit tests in a few days |
Yes, I can do this in a few days. |
3d31bf0 to
2fe095c
Compare
a795883 to
173f243
Compare
|
This pull request has merge conflicts that must be resolved before it can be |
Signed-off-by: wuhang <wuhang6@huawei.com>
Signed-off-by: wuhang <wuhang6@huawei.com>
173f243 to
b610210
Compare
Signed-off-by: wuhang <wuhang6@huawei.com>
Signed-off-by: wuhang <wuhang6@huawei.com>
|
This pull request has merge conflicts that must be resolved before it can be |
Purpose
Currently, we initialize all MCP sessions that specified by user request before generation process no matter whether tools are actually being used.
This PR suggests implementing lazy initialization for MCP sessions. The major change is to initialize sessions with tool server only before the tool is called.
Test Plan
With a python MCP tool server, test 2 requests with tool of
code_interpretertype with background mode using responses API. One request do not use tool while another request use python tool.We expect that the first request will not create any MCP session with tool server, while the second request will create session with tool server a period time after create background task for request handling.
Test Result
mcp server log
client log
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.