MCPSecBench

This repository contains the MCPSecBench benchmark and the data used in our experiments.

A technical report is available and can be cited as follows:

@article{yang2025mcpsecbench,
  title={MCPSecBench: A Systematic Security Benchmark and Playground for Testing Model Context Protocols},
  author={Yang, Yixuan and Wu, Daoyuan and Chen, Yufan},
  journal={arXiv preprint arXiv:2508.13220},
  year={2025}
}

Overview of MCPSecBench

  • main.py: an automated testing script covering part of the attacks.
  • addserver.py: a benign server for computation.
  • maliciousadd.py: a malicious server.
  • download.py: a benign server for checking signatures.
  • squatting.py: a malicious server for server name squatting.
  • client.py: the client that connects the MCP host and servers. At present it supports OpenAI and Claude; it can be extended to DeepSeek, Llama, and Qwen.
  • mitm.py: a script that implements a man-in-the-middle attack.
  • index.js: a script for the DNS rebinding attack.
  • cve-2025-6541.py: a malicious server that triggers CVE-2025-6541.
  • claude_desktop_config.json: the configuration for Claude Desktop.
  • prompts: example prompts for testing.
  • results: experiment results (OpenAI only at present).

Set up MCPSecBench

Requires a Python version higher than 3.10.

  • Add the dependencies: uv add starlette pydantic pydantic_settings mcp[cli] anthropic aiohttp openai pyautogui pyperclip

    You may need to install some extra system packages via apt to get pyautogui working (see the sketch after this list).

  • Change the basepath in malicious_add.py to your real path.

  • For tool name squatting and server name squatting in Claude, check the order of the servers: Claude will choose the last server with a given name and call the first tool with a given name.
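
A minimal setup sketch, assuming a Debian/Ubuntu host; the apt package names below are common pyautogui prerequisites on Linux, not something this repository pins down:

# Add the Python dependencies listed above (mcp[cli] quoted to avoid shell globbing).
uv add starlette pydantic pydantic_settings "mcp[cli]" anthropic aiohttp openai pyautogui pyperclip

# Assumed extras for pyautogui on Debian/Ubuntu (screenshot and GUI support).
sudo apt install scrot python3-tk python3-dev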

How to use MCPSecBench

Test Script

The automated check supports OpenAI and Cursor at present. To run it against Claude Desktop, change the wait_for_image parameters in main.py (e.g., img/cursor_init.png) to screenshots of Claude Desktop.

  • Set the API key: export OPENAI_API_KEY=xxxx or export ANTHROPIC_API_KEY=xxx

  • Run uv run main.py mode, where mode is 0 for Claude in CLI mode, 1 for OpenAI, or 2 for Cursor, e.g., uv run main.py 1

Delete /tmp/state.json before the first run.

When you test Cursor, make sure Cursor is already open and can be brought to the foreground with a single Alt+Tab, and that a new conversation is open, as in mcpbench/img/cursor_window.png. A complete example run is sketched below.
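
Putting these steps together, a typical automated run against OpenAI looks like this (the key value is a placeholder):

export OPENAI_API_KEY=sk-xxxx   # or ANTHROPIC_API_KEY for Claude
rm -f /tmp/state.json           # clear state from any previous run
uv run main.py 1                # 0 = Claude (CLI), 1 = OpenAI, 2 = Cursor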

Testing LLM models and MCP servers with your own MCP client

  • First, launch all remote servers, for example: uv run download.py
  • Set the API key: export OPENAI_API_KEY=xxxx or export ANTHROPIC_API_KEY=xxx
  • Then launch the client: uv run client.py mode, where mode is 0 for Claude or 1 for OpenAI.
  • Finally, interact with the LLM model.
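
For example, a Claude session against the signature-checking server could look like this (running the server in the background with & is just one way to keep it up):

uv run download.py &            # launch the remote server(s)
export ANTHROPIC_API_KEY=xxx    # or OPENAI_API_KEY for mode 1
uv run client.py 0              # 0 = Claude, 1 = OpenAI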

Testing Claude Desktop

  • First, copy the content of claude_desktop_config.json into your own claude_desktop_config.json and change the directories to your paths.
  • Launch all remote servers, for example: uv run download.py
  • Test via Claude Desktop.
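
For reference, entries in claude_desktop_config.json follow Claude Desktop's standard mcpServers schema; the server name and path below are illustrative, not the repository's actual configuration:

{
  "mcpServers": {
    "addserver": {
      "command": "uv",
      "args": ["run", "/your/path/to/MCPSecBench/addserver.py"]
    }
  }
}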

Testing Cursor

  • Copy the content of cursor_config.json into your Cursor configuration and change the directories to your paths.
  • Launch all remote servers, for example: uv run download.py
  • Test via Cursor, either manually or through main.py.
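
With the configuration in place, the automated run against Cursor is then (mode 2 selects Cursor, as described under Test Script):

uv run download.py &    # launch the remote servers first
uv run main.py 2        # mode 2 drives Cursor automatically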

Experiment Results

Experiment results are provided in the data folder.

License

Released under the MIT License.
