- Node.js (v18+)
- npm (v9+)
- Clone the repository:
  git clone https://github.com/jitsmaster/web-crawler-mcp.git
  cd web-crawler-mcp
- Install dependencies:
  npm install
- Build the project:
  npm run build
Create a .env file with the following environment variables:
CRAWL_LINKS=false
MAX_DEPTH=3
REQUEST_DELAY=1000
TIMEOUT=5000
MAX_CONCURRENT=5
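
As a rough illustration of how these variables might be read at startup, here is a minimal sketch of a typed loader that falls back to the documented defaults. The names `CrawlerConfig`, `intFromEnv`, and `loadConfig` are hypothetical and not taken from the project; the actual server may parse or validate its environment differently.

```typescript
// Hypothetical config shape; field names are illustrative, not the project's own.
interface CrawlerConfig {
  crawlLinks: boolean;
  maxDepth: number;
  requestDelayMs: number;
  timeoutMs: number;
  maxConcurrent: number;
}

type Env = Record<string, string | undefined>;

// Parse a non-negative integer, falling back to the default when the
// variable is unset, empty, or not a valid integer.
function intFromEnv(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return Number.isInteger(parsed) && parsed >= 0 ? parsed : fallback;
}

// Build a config object using the defaults listed in this README.
function loadConfig(env: Env): CrawlerConfig {
  return {
    // Assumption: only the literal string "true" (any case) enables link following.
    crawlLinks: (env["CRAWL_LINKS"] ?? "false").trim().toLowerCase() === "true",
    maxDepth: intFromEnv(env, "MAX_DEPTH", 3),
    requestDelayMs: intFromEnv(env, "REQUEST_DELAY", 1000),
    timeoutMs: intFromEnv(env, "TIMEOUT", 5000),
    maxConcurrent: intFromEnv(env, "MAX_CONCURRENT", 5),
  };
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)`. Falling back silently on invalid values is one possible design choice; a stricter loader could throw instead.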
Start the MCP server:
npm start
Add the following to your MCP settings file:
{
  "mcpServers": {
    "web-crawler": {
      "command": "node",
      "args": ["/path/to/web-crawler/build/index.js"],
      "env": {
        "CRAWL_LINKS": "false",
        "MAX_DEPTH": "3",
        "REQUEST_DELAY": "1000",
        "TIMEOUT": "5000",
        "MAX_CONCURRENT": "5"
      }
    }
  }
}
The server provides a crawl tool that can be accessed through MCP. Example usage:
{
  "url": "https://example.com",
  "depth": 1
}
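
For context, MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request, and the arguments above go in its `params.arguments` field. The sketch below builds such a message for the crawl tool; the helper `buildCrawlRequest` and the interface names are hypothetical, and the shape of the server's response is not documented here, so it is left out.

```typescript
// Arguments accepted by the crawl tool, as shown in the example above.
interface CrawlArguments {
  url: string;
  depth: number;
}

// Envelope of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: CrawlArguments };
}

// Hypothetical helper: wrap crawl arguments in a tools/call request.
function buildCrawlRequest(id: number, args: CrawlArguments): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "crawl", arguments: args },
  };
}

const request = buildCrawlRequest(1, { url: "https://example.com", depth: 1 });
console.log(JSON.stringify(request, null, 2));
```

Most MCP clients construct this message for you; it is shown only to make clear where `url` and `depth` end up on the wire.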
| Environment Variable | Default | Description |
|---|---|---|
| CRAWL_LINKS | false | Whether to follow links |
| MAX_DEPTH | 3 | Maximum crawl depth |
| REQUEST_DELAY | 1000 | Delay between requests (ms) |
| TIMEOUT | 5000 | Request timeout (ms) |
| MAX_CONCURRENT | 5 | Maximum concurrent requests |
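
To illustrate how CRAWL_LINKS and MAX_DEPTH could interact, the following sketch plans a breadth-first crawl over an in-memory link graph. It is not the project's implementation: the function `planCrawl`, the `LinkGraph` type, and the interpretation that the start page sits at depth 0 and that CRAWL_LINKS=false fetches only the start page are all assumptions made for the example.

```typescript
// Hypothetical in-memory link graph: page URL -> URLs it links to.
type LinkGraph = Record<string, string[]>;

// Return the pages a crawl would visit, in breadth-first order.
// Assumptions: the start page is depth 0; crawlLinks=false visits only the start page.
function planCrawl(
  graph: LinkGraph,
  start: string,
  crawlLinks: boolean,
  maxDepth: number,
): string[] {
  const visited = new Set<string>([start]);
  const order: string[] = [start];
  if (!crawlLinks) return order;

  let frontier = [start];
  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const page of frontier) {
      for (const link of graph[page] ?? []) {
        if (!visited.has(link)) {
          // Record each page once, the first time it is discovered.
          visited.add(link);
          order.push(link);
          next.push(link);
        }
      }
    }
    frontier = next;
  }
  return order;
}
```

Under these assumptions, a graph where `a` links to `b` and `c`, and `b` links to `d`, yields `a`, `b`, `c` at a maximum depth of 1 and additionally `d` at a maximum depth of 2.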