A Model Context Protocol (MCP) server that opens websites and captures all HTTP requests made by those sites. This tool is perfect for analyzing web traffic patterns, tracking third-party integrations, and understanding the network behavior of websites.
- Complete Request Capture: Captures all HTTP requests made by a website including XHR, fetch, and resource requests
- Detailed Analysis: Provides comprehensive data including:
- Request URLs, methods, headers
- Response status codes and headers
- Resource types (document, script, stylesheet, image, etc.)
- Timestamps for timing analysis
- Domain analysis showing all external services
- Configurable Options:
- Adjustable wait times for dynamic content
- Option to include/exclude image and media requests
- Custom viewport and user agent settings
- Built with Patchright: Uses Patchright (enhanced Playwright) for reliable browser automation
- Bun Runtime
- Bun runtime
- Google chrome installed
- git
-
Clone and Install Dependencies
git clone <repository-url> cd mcp_scraper_analytics bun install
-
Test the Server
# Run the server bun src/index.ts
The MCP exposes the following tools. Each entry includes the parameter schema, a concrete input example, and an example output the tool will return.
- analyze_website_requests
Description: Open a website in a real browser, capture all HTTP requests, and store a site analysis. The ListTools schema documents default client-side hints: waitTime default 3000 ms, includeImages default false, quickMode default false. See the implementation in
src/server.ts
.
Parameters:
- url (string, required): The website URL to analyze (must include http:// or https://)
- waitTime (number, optional): Additional wait time in milliseconds for dynamic content (default: 3000, max: 10000)
- includeImages (boolean, optional): Whether to include image and media requests (default: false)
- quickMode (boolean, optional): Use quick loading mode with minimal waiting (default: false)
Example input:
{
"url": "https://example.com",
"waitTime": 3000,
"includeImages": false,
"quickMode": false
}
Example output (stored analysis summary):
{
"websiteInfo": {
"url": "https://example.com",
"title": "Example Domain",
"analysisTimestamp": "2025-08-15T18:00:00.000Z",
"renderMethod": "client"
},
"requestSummary": {
"totalRequests": 12,
"uniqueDomains": 4,
"requestsByType": {
"document": 1,
"script": 6,
"stylesheet": 2,
"xhr": 2,
"font": 1
}
},
"domains": [
"example.com",
"cdn.example.com",
"analytics.google.com"
],
"antiBotDetection": {
"detected": false
},
"browserStorage": {
"cookies": [
{
"name": "sessionid",
"value": "abc123",
"domain": "example.com",
"path": "/",
"expires": 1692136800,
"httpOnly": true,
"secure": true,
"sameSite": "Lax"
}
],
"localStorage": {
"theme": "dark",
"pref": "1"
},
"sessionStorage": {
"lastVisited": "/home"
}
}
}
- get_requests_by_domain
Description: Retrieve all captured requests for a specific domain from a previously stored analysis. This reads the stored analysis results; run
analyze_website_requests
first. See handler schema insrc/handlers/schemas.ts
.
Parameters:
- url (string, required): The original URL that was analyzed
- domain (string, required): The domain to filter requests for (e.g., "api.example.com")
Example input:
{
"url": "https://example.com",
"domain": "api.example.com"
}
Example output:
{
"url": "https://example.com",
"domain": "api.example.com",
"totalRequests": 3,
"requests": [
{
"id": "req-1",
"url": "https://api.example.com/v1/data",
"method": "GET",
"resourceType": "xhr",
"status": 200,
"timestamp": "2025-08-15T18:00:00.200Z"
},
{
"id": "req-2",
"url": "https://api.example.com/v1/auth",
"method": "POST",
"resourceType": "xhr",
"status": 201,
"timestamp": "2025-08-15T18:00:00.450Z"
}
]
}
- get_request_details
Description: Return full details for a single captured request (headers, body if captured, response, timings). Requires both the analyzed URL and the requestId present in the stored analysis. See the tool registration in
src/server.ts
.
Parameters:
- url (string, required): The original URL that was analyzed
- requestId (string, required): The unique ID of the request to retrieve
Example input:
{
"url": "https://example.com",
"requestId": "req-2"
}
Example output:
{
"id": "req-2",
"url": "https://api.example.com/v1/auth",
"method": "POST",
"requestHeaders": {
"content-type": "application/json"
},
"requestBody": "{\"username\":\"user\",\"password\":\"***\"}",
"status": 201,
"responseHeaders": {
"content-type": "application/json"
},
"responseBody": "{\"token\":\"abc123\"}",
"resourceType": "xhr",
"timestamp": "2025-08-15T18:00:00.450Z",
"timings": {
"start": 1692136800.45,
"end": 1692136800.47,
"durationMs": 20
}
}
- get_request_summary
Description: Return the stored analysis summary for a previously analyzed URL. The handler returns the same summary object produced by
analyze_website_requests
. Seesrc/handlers/analysis.ts
.
Parameters:
- url (string, required): The website URL that was previously analyzed
Example input:
{
"url": "https://example.com"
}
Example output:
{
"websiteInfo": {
"url": "https://example.com",
"title": "Example Domain",
"analysisTimestamp": "2025-08-15T18:00:00.000Z",
"renderMethod": "client"
},
"requestSummary": {
"totalRequests": 12,
"uniqueDomains": 4,
"requestsByType": {
"document": 1,
"script": 6,
"stylesheet": 2,
"xhr": 2,
"font": 1
}
},
"domains": [
"example.com",
"cdn.example.com",
"analytics.google.com"
],
"antiBotDetection": {
"detected": false
},
"browserStorage": {
"cookies": [
{
"name": "sessionid",
"value": "abc123",
"domain": "example.com",
"path": "/",
"expires": 1692136800,
"httpOnly": true,
"secure": true,
"sameSite": "Lax"
}
],
"localStorage": {
"theme": "dark",
"pref": "1"
},
"sessionStorage": {
"lastVisited": "/home"
}
}
}
- extract_html_elements
Description: Extracts important HTML elements (text, images, links, scripts) with their CSS selectors and basic metadata. Uses page-level extraction logic in
src/services/page_analyzer.ts
.
Parameters:
- url (string, required): The page URL to analyze
- filterType (string, required): One of:
text
,image
,link
,script
Example input:
{
"url": "https://example.com",
"filterType": "text"
}
Example output:
[
{
"content": "Hello world",
"selector": "#intro",
"type": "text",
"tag": "p",
"attributes": {
"id": "intro",
"data-test": "x"
}
},
{
"content": "Title here",
"selector": ".title",
"type": "text",
"tag": "h1",
"attributes": {}
}
]
- fetch
Description: Perform a direct HTTP request (no browser rendering) and return status, headers and body. The schema is defined in
src/handlers/schemas.ts
.
Parameters:
- url (string, required): The URL to fetch
- method (string, optional): HTTP method, one of GET, POST, PUT, DELETE, PATCH, HEAD, OPTIONS (default: GET)
- headers (object, optional): Key-value map of request headers
- body (string, optional): Request body for POST/PUT/PATCH
Example input:
{
"url": "https://api.example.com/data",
"method": "POST",
"headers": {
"Content-Type": "application/json"
},
"body": "{\"key\":\"value\"}"
}
Example output:
{
"status": 200,
"statusText": "OK",
"headers": {
"content-type": "application/json"
},
"body": "{\"result\":\"ok\"}",
"durationMs": 123
}
Add this to your Claude Desktop MCP settings:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
{
"mcpServers": {
"web-scraper-analytics": {
"command": "bun",
"args": ["/path/to/mcp_scraper_analytics/src/index.ts"],
"disabled": false
}
}
}
Edit ~/Library/Application Support/Code/User/globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json
:
{
"mcpServers": {
"web-scraper-analytics": {
"command": "bun",
"args": ["/path/to/mcp_scraper_analytics/src/index.ts"],
"disabled": false
"autoApprove": []
}
}
}
- Identify all third-party services a website connects to
- Discover potential data leakage through external requests
- Track analytics and advertising integrations
- Analyze request patterns and timing
- Identify heavy resource loading
- Find opportunities for caching and optimization
- Understand what services competitors use
- Discover third-party integrations and tools
- Analyze traffic patterns and dependencies
- Track external data sharing
- Identify GDPR/privacy compliance issues
- Document all network communications
- Debug API calls and network issues
- Analyze timing and performance problems
- Understand complex application architectures
"Analyze the HTTP requests made by https://techcrunch.com and show me all the advertising and analytics services they use"
"Check what third-party services https://shopify.com loads and organize them by category"
"Analyze https://github.com and tell me about their performance characteristics based on the network requests"
"Extract all text and image elements from https://example.com and return their CSS selectors and brief content summaries"
- Browser Engine: Chromium via Patchright
- Request Capture: Real browser automation with full JavaScript execution
- Network Monitoring: Captures both requests and responses
- Resource Types: Documents, scripts, stylesheets, images, fonts, XHR, fetch, and more
- Timeout Handling: Configurable timeouts for different scenarios
- Error Handling: Comprehensive error handling and logging
# Build the project
bun run build
# Run in development mode (with auto-rebuild and restart)
bun run dev
# Test with MCP Inspector
bun run mcp-test
# Run unit tests
bun test
# Run tests with coverage
bun run test:coverage
# Run tests in watch mode (for development)
bun run test:watch
# Start production server
bun run start
# Clean build artifacts
bun run clean
The project includes comprehensive Jest unit tests for the core analyzer functionality.
- 66.19% Statement Coverage for analyzer.ts
- 41.17% Branch Coverage for analyzer.ts
- 46.15% Function Coverage for analyzer.ts
- ✅ URL Validation: Tests for proper URL format validation
- ✅ Error Handling: Comprehensive error handling tests
- ✅ Configuration Options: Tests for all analysis options (waitTime, quickMode, includeImages)
- ✅ Request Monitoring: Tests for HTTP request capture and filtering
- ✅ Browser Integration: Mocked browser interactions
- ✅ Network Timeouts: Graceful handling of network timeouts
- ✅ Response Processing: Tests for response body capture and truncation
- ✅ Domain Analysis: Tests for domain extraction and analysis
# Run all tests
bun test
# Run tests with coverage report
bun run test:coverage
# Run tests in watch mode (for development)
bun run test:watch
src/
├── index.ts # Main MCP server implementation
package.json # Dependencies and scripts
tsconfig.json # TypeScript configuration
README.md # This file
- Fork the repository
- Create a feature branch
- Make your changes
- Test with the MCP inspector
- Submit a pull request
This project is licensed under the MIT License.
-
Browser Launch Fails
- Ensure you have proper permissions for browser automation
- On macOS, you might need to grant accessibility permissions
-
Requests Not Captured
- Some requests might be made before the event listeners are attached
- Try increasing the
waitTime
parameter - Check if the site uses complex async loading
-
Memory Issues
- Large sites with many requests might consume significant memory
- Consider filtering out images and media if not needed
- Close browser contexts properly
Enable verbose logging by setting the environment variable:
DEBUG=1 bun run dev
- Add request/response body capture (for POST requests)
- Implement result caching for repeated analyses
- Add support for custom browser configurations
- Implement request filtering by domain/type
- Add export formats (CSV, JSON, XML)
- Implement screenshot capture alongside request analysis