The server provides the following enterprise-ready tools:
### Core Scraping Tools
- `markdownify(website_url: str)`: Transform any webpage into clean, structured markdown format
- `smartscraper(user_prompt: str, website_url: str, number_of_scrolls: int = None, markdown_only: bool = None)`: Leverage AI to extract structured data from any webpage, with support for infinite scrolling
- `searchscraper(user_prompt: str, num_results: int = None, number_of_scrolls: int = None)`: Execute AI-powered web searches with structured, actionable results
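Tool arguments map one-to-one onto the parameter lists above, and parameters with a `= None` default can simply be omitted. A minimal sketch of how a client might assemble a `tools/call` request for `smartscraper` (the helper function is hypothetical; only the tool name and parameter names come from the list above):

```python
def build_smartscraper_args(user_prompt, website_url,
                            number_of_scrolls=None, markdown_only=None):
    # Required parameters are always present; optional ones are included
    # only when explicitly set, mirroring their `= None` defaults above.
    args = {"user_prompt": user_prompt, "website_url": website_url}
    if number_of_scrolls is not None:
        args["number_of_scrolls"] = number_of_scrolls
    if markdown_only is not None:
        args["markdown_only"] = markdown_only
    return args

# An MCP client wraps the arguments in a standard "tools/call" request:
request = {
    "method": "tools/call",
    "params": {
        "name": "smartscraper",
        "arguments": build_smartscraper_args(
            "Extract all product names and prices",
            "https://example.com/products",
            number_of_scrolls=5,
        ),
    },
}
```

In practice your MCP client library handles the request envelope; the point is that unset optional parameters never need to appear in the payload.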
### Advanced Scraping Tools
- `scrape(website_url: str, render_heavy_js: bool = None)`: Basic scraping endpoint to fetch page content with optional heavy JavaScript rendering
- `sitemap(website_url: str)`: Extract sitemap URLs and structure for any website
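`sitemap` pairs naturally with the page-level tools: discover a site's URLs first, then process each page. A rough sketch of that flow, where `call_tool` stands in for however your MCP client invokes a tool, and the `urls` key in the sitemap response is an assumption about the payload shape:

```python
def markdownify_site(call_tool, website_url, limit=5):
    """Fetch a site's sitemap, then convert up to `limit` pages to markdown."""
    sitemap = call_tool("sitemap", {"website_url": website_url})
    pages = {}
    for url in sitemap["urls"][:limit]:  # assumed response field
        pages[url] = call_tool("markdownify", {"website_url": url})
    return pages

# Stub client used purely to illustrate the call flow:
def fake_call_tool(name, arguments):
    if name == "sitemap":
        return {"urls": ["https://example.com/", "https://example.com/docs"]}
    return "# markdown for " + arguments["website_url"]

docs = markdownify_site(fake_call_tool, "https://example.com")
```

For whole-site jobs at scale, the dedicated crawling tools below are the better fit.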
### Multi-Page Crawling
- `smartcrawler_initiate(url: str, prompt: str = None, extraction_mode: str = "ai", depth: int = None, max_pages: int = None, same_domain_only: bool = None)`: Initiate intelligent multi-page web crawling with two modes:
  - **AI Extraction Mode** (10 credits per page): Extracts structured data based on your prompt
  - **Markdown Conversion Mode** (2 credits per page): Converts pages to clean markdown
- `smartcrawler_fetch_results(request_id: str)`: Retrieve results from asynchronous crawling operations
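Crawling is asynchronous: `smartcrawler_initiate` returns a request id, and `smartcrawler_fetch_results` is polled until the crawl finishes. The sketch below illustrates that pattern (the `request_id` and `status` fields are assumptions about the response shape, and `call_tool` again stands in for your client); the per-page costs in `estimate_credits` come from the two modes above:

```python
import time

def estimate_credits(max_pages, extraction_mode="ai"):
    # Costs stated above: 10 credits/page for AI extraction,
    # 2 credits/page for markdown conversion.
    return max_pages * (10 if extraction_mode == "ai" else 2)

def crawl_and_wait(call_tool, url, prompt, max_pages=10,
                   poll_seconds=5, max_polls=60):
    started = call_tool("smartcrawler_initiate", {
        "url": url,
        "prompt": prompt,
        "extraction_mode": "ai",
        "max_pages": max_pages,
    })
    for _ in range(max_polls):
        result = call_tool("smartcrawler_fetch_results",
                           {"request_id": started["request_id"]})
        if result.get("status") != "processing":  # assumed status field
            return result
        time.sleep(poll_seconds)
    raise TimeoutError("crawl did not complete within the polling budget")
```

Budgeting before you start is cheap: a 100-page markdown crawl would cost roughly `estimate_credits(100, "markdown")` = 200 credits, versus 1000 for AI extraction.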
### Intelligent Agent-Based Scraping
- `agentic_scrapper(url: str, user_prompt: str = None, output_schema: dict = None, steps: list = None, ai_extraction: bool = None, persistent_session: bool = None, timeout_seconds: float = None)`: Run advanced agentic scraping workflows with customizable steps and structured output schemas
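The agentic workflow is driven by plain data: a list of navigation steps and, optionally, a schema dict describing the output you want. The shapes below are illustrative assumptions (the tool's documented parameter names, not a documented schema format), showing how the pieces fit together:

```python
# Hypothetical argument payload for agentic_scrapper; the step wording and
# the JSON-Schema-style output_schema are illustrative, not a documented format.
agentic_args = {
    "url": "https://example.com/login",
    "user_prompt": "Log in and extract the account's plan name and renewal date",
    "steps": [
        "Fill the login form with the provided credentials",
        "Navigate to the account settings page",
        "Extract the subscription details",
    ],
    "output_schema": {
        "type": "object",
        "properties": {
            "plan_name": {"type": "string"},
            "renewal_date": {"type": "string"},
        },
    },
    "ai_extraction": True,
    "persistent_session": True,  # keep one browser session across the steps
    "timeout_seconds": 120.0,
}
```

With `persistent_session` enabled, state such as the logged-in cookie survives from one step to the next, which is what makes multi-step flows like this possible.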
## Setup Instructions
Add the ScrapeGraphAI MCP server in the settings:
The server enables sophisticated queries such as:
### Single Page Scraping
- "Analyze and extract the main features of the ScrapeGraph API"
- "Generate a structured markdown version of the ScrapeGraph homepage"
- "Extract and analyze pricing information from the ScrapeGraph website with infinite scroll support"
- "Scrape this JavaScript-heavy page with full rendering"
### Search and Research
- "Research and summarize recent developments in AI-powered web scraping"
- "Search for the top 5 articles about machine learning frameworks and extract key points"
### Multi-Page Crawling
- "Crawl the entire documentation site and convert all pages to markdown"
- "Extract all product information from an e-commerce site up to 3 levels deep"
- "Crawl a blog and extract all article titles, authors, and summaries"
### Advanced Agentic Scraping
- "Navigate through a multi-step form and extract the final results"
- "Follow pagination links and compile a complete dataset"
- "Execute a complex workflow with custom extraction schema"