feat: Some kind of risk level returned by servers #114
Comments
Thanks for filing this. Your proposal makes a lot of sense. I wonder if perhaps we support an open-ended set of risk tags, on a normalized scale [0, 1] (similar to model preferences in sampling), with some predefined "well-known" tags codified in the spec.
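As a rough sketch of what open-ended, normalized risk tags could look like on the Tool type (all field and tag names here are hypothetical, not part of the spec):

```typescript
// Hypothetical extension of the MCP Tool type; field names are illustrative
// and not part of the spec.
interface RiskTag {
  // Normalized risk score in [0, 1], similar to model preferences in sampling.
  score: number;
}

// Open-ended map of tag name -> tag; some tags (e.g. "destructive",
// "financial") could be "well-known" and codified in the spec.
interface ToolRiskAnnotations {
  [tag: string]: RiskTag;
}

const exampleRiskTags: ToolRiskAnnotations = {
  destructive: { score: 0.9 },
  financial: { score: 0.1 },
};

// A client could then apply a simple threshold over whichever tags it
// recognizes, ignoring unknown ones.
function maxRisk(tags: ToolRiskAnnotations): number {
  return Math.max(0, ...Object.values(tags).map((t) => t.score));
}
```

The open-ended map keeps the spec small while letting clients and servers converge on well-known tag names over time.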
FTR we had a similar discussion here: https://github.com/orgs/modelcontextprotocol/discussions/69. I will compare it to the above and add any specific feedback.
Obviously this is a repo about the spec alone; that said, here is how I see the bigger picture and what would be useful to consider. Simple would be best: if these changes are overly complicated, users won't bother.

**Client config for prompts.** Users should be able to configure the prompting behavior of each tool. For example, for a fetch server that simply downloads web pages, I would want that to never prompt me.

**Tool risk.** A tool can instruct users on risk/approval. This could apply as the default when the user doesn't configure an override, with the same set of choices.

**Models can assess risk per tool use.** When the user asks it to, the model can assess the risk of each tool use. For example, in my mcp-server-commands, Claude already knows well enough to flag risky commands. Best part is, users can customize instructions in a system prompt to specify how they expect the model to interpret risk, either by default across all of their projects or with a project-specific prompt.

**Model trust.** This would be the last feature I might consider adding, if the above is insufficient. For the most part I trust the models I am using and wouldn't use an untrusted model to do anything risky. That said, people could use a separate model/server solely to score the risk of each tool-use request, so some sort of mechanism to round-trip a risk score would be needed.
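A minimal sketch of what the per-tool client configuration and tool-declared default could look like, assuming hypothetical policy names and a `resolvePolicy` helper (none of this is in the spec):

```typescript
// Hypothetical per-tool prompting configuration; names are illustrative,
// not part of the MCP spec.
type ApprovalPolicy = "always_ask" | "never_ask" | "ask_once";

interface ToolApprovalConfig {
  [toolName: string]: ApprovalPolicy;
}

const userConfig: ToolApprovalConfig = {
  // A fetch server that simply downloads web pages: never prompt.
  fetch: "never_ask",
  // A shell-command tool: always confirm.
  run_command: "always_ask",
};

// The tool-declared risk/approval default applies only when the user
// has not configured an override.
function resolvePolicy(
  config: ToolApprovalConfig,
  toolName: string,
  toolDefault: ApprovalPolicy
): ApprovalPolicy {
  return config[toolName] ?? toolDefault;
}
```

The key design point is precedence: explicit user config wins, and the tool's suggested default only fills the gap.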
I totally agree with the problem statement here, namely that frequent approval prompts lead to alarm fatigue.
One concern I have with the proposed solution's direction is whether servers can have the right context to define risk. I think it's hard for a server to properly define the "risk" associated with a tool call in a general way that serves any client. For example, if I build a DuckDuckGo-like "private chat client" that promises to be privacy-forward, then even a very simple tool call could be considered high-risk in that context, while most other clients would treat it as harmless. It's certainly possible to work around this by well-defining categories and a taxonomy for risk that everyone agrees on, as @domdomegg started to get into, but I worry it's a lot of complexity to introduce into the spec. An alternative: what if we leave this up to client applications to manage? The spec is currently pretty strongly worded on this topic.
An example tweak that might resolve this issue, as far as the spec is concerned:
If we designate risk to be a concern of the client, I think the client already has everything it needs to manage the risk:
For a client like Claude Desktop, I could imagine the implementation exposing per-tool approval settings in a JSON config. If it so chose, Claude Desktop could smooth over the "JSON config" part of the UX by running inference on app startup ("based on what I know about my user's risk tolerance, let me configure the never-approve/always-approve/etc. JSON settings for every visible tool accordingly") and then re-running that step whenever it receives a listChanged notification. Maybe down the line, when there are more examples of popular clients, it'll be worth better defining best practices for client implementations in the spec.
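A sketch of what those client-side settings and the startup step could look like, with a trivial placeholder heuristic standing in for model inference (all names are illustrative, not a real Claude Desktop config format):

```typescript
// Hypothetical client-side approval settings; not a real Claude Desktop
// config format.
type Approval = "always_approve" | "never_approve" | "ask";

interface ClientToolSettings {
  [toolName: string]: Approval;
}

// On startup (and again on each listChanged notification), the client could
// ask the model to propose settings from the user's stated risk tolerance.
// A naive heuristic stands in for model inference here.
function proposeSettings(
  visibleTools: string[],
  riskTolerance: "low" | "high"
): ClientToolSettings {
  const settings: ClientToolSettings = {};
  for (const tool of visibleTools) {
    settings[tool] = riskTolerance === "high" ? "always_approve" : "ask";
  }
  return settings;
}
```

In a real client, the heuristic would be replaced by an inference call, and the proposed settings would still be user-editable.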
Yes, the server doesn't get to have a say in defining the risk level. The point of the confirmation is just as much about telling the user that you're sending their private data from a conversation to a remote tool as it is about the actual riskiness of the action a tool might perform. I agree with the framing that this is a client decision; the client just needs enough info from the spec to do this effectively, and the rest is a client implementation detail.
Is your feature request related to a problem? Please describe.
Most MCP client applications (such as the Claude Desktop app) ask users to approve many minor actions via a confirmation dialog.
This can be frustrating for users when there are many tools they're trying to use, and having to confirm repeatedly will likely result in users approving by default (alarm fatigue). Some other MCP client apps might choose not to ask users for permission at all, which seems dangerous.
Ideally, we want some way to reduce unnecessary confirmations while still alerting users to genuinely risky actions.
Describe the solution you'd like
Currently, the protocol does not provide a way for servers to indicate how 'risky' an action is (apart from, perhaps, unstructured text in the description). There's also no straightforward way for the server to provide context about how risky a particular action would be.
One idea might be to add properties to the Tool data type, something like:

- a `risk_level` of `low`, `moderate` or `high`
- structured risk context, e.g. `financial_risk: £25.99`

Clients could then have more flexibility in how they want to warn users of actions.
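A sketch of what those additional Tool properties might look like in TypeScript (field names and the example tool are illustrative, not part of the current spec):

```typescript
// Hypothetical additions to the MCP Tool data type; field names are
// illustrative and not part of the current spec.
type RiskLevel = "low" | "moderate" | "high";

interface ToolRiskInfo {
  risk_level: RiskLevel;
  // Optional structured context, e.g. the monetary amount at stake.
  financial_risk?: string;
}

interface Tool {
  name: string;
  description?: string;
  riskInfo?: ToolRiskInfo;
}

const payTool: Tool = {
  name: "make_payment",
  description: "Charge the user's saved card",
  riskInfo: { risk_level: "high", financial_risk: "£25.99" },
};
```

Keeping `riskInfo` optional would preserve backwards compatibility: clients would fall back to their default prompting behavior for tools that omit it.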
(In the future, AI systems might themselves be able to make these judgements based on a risk profile set by the user, e.g. evaluating the request against a user's risk-appetite statement. Returning the risk information would then help such a system evaluate more complex policies, such as 'Auto-approve edits to database table X, but only allow read access to table Y' or 'Auto-approve creating email drafts, but ask me before sending them.')
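The kind of argument-aware auto-approval policy described in that parenthetical could be sketched as follows (tool names and the rule format are hypothetical):

```typescript
// Hypothetical argument-aware auto-approval rules; illustrative only.
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

type Rule = (call: ToolCall) => "auto_approve" | "ask" | null;

const rules: Rule[] = [
  // "Auto-approve creating email drafts, but ask me before sending them."
  (c) => (c.name === "create_draft" ? "auto_approve" : null),
  (c) => (c.name === "send_email" ? "ask" : null),
  // "Auto-approve edits to database table X, but only read access to table Y."
  (c) =>
    c.name === "sql_write" && c.arguments.table === "X" ? "auto_approve" : null,
  (c) => (c.name === "sql_write" && c.arguments.table === "Y" ? "ask" : null),
];

// First matching rule wins; anything unmatched falls back to asking the user.
function decide(call: ToolCall): "auto_approve" | "ask" {
  for (const rule of rules) {
    const verdict = rule(call);
    if (verdict) return verdict;
  }
  return "ask"; // safe default
}
```

Note that these rules inspect the call's arguments, not just the tool name, which is what the structured risk context in the proposal would enable.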
Describe alternatives you've considered
I'm open to other ways of solving the problem (improving the safety of MCPs by avoiding alarm fatigue).