FastMCP server for Unstructured.io document parsing.
This MCP server provides tools for parsing documents (PDF, DOCX, TXT, logs, etc.) using the Unstructured.io API. It connects Claude to document parsing capabilities via the Model Context Protocol.
- Python >= 3.10
- Unstructured API container running on localhost:9104
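Once the container is running, a quick reachability check can confirm the second prerequisite. The sketch below only tests whether a TCP connection to port 9104 succeeds. The helper name `is_port_open` is hypothetical and not part of this package, and an open port does not by itself prove that the Unstructured API is healthy.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 9104, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("Unstructured API port reachable:", is_port_open())
```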
docker run -d --name unstructured-api -p 9104:8000 \
  quay.io/unstructured-io/unstructured-api:latest

Or using docker-compose (recommended):
cd brainery-containers
docker-compose up -d unstructured-api

npx -y @smithery/cli install @tapiocapioca/unstructured-mcp-server --client claude

This will:
- Install the package
- Configure Claude Desktop automatically
- Set up the MCP server
pip install unstructured-mcp-server

Or with uv:

uv pip install unstructured-mcp-server

Then add to your MCP settings:
{
  "mcpServers": {
    "unstructured": {
      "command": "unstructured-mcp",
      "env": {
        "UNSTRUCTURED_API_URL": "http://localhost:9104"
      }
    }
  }
}

unstructured-mcp

Set environment variables (optional):
export UNSTRUCTURED_API_URL=http://localhost:9104

| Tool | Description |
|---|---|
| parse_document | Parse a single document (PDF, DOCX, TXT, logs) and extract text |
| parse_batch | Parse multiple documents in batch |
parse_document(file_path: str, strategy: str = "auto")

Parameters:
- file_path: Absolute path to file on host machine
- strategy: Parsing strategy - auto, fast, hi_res, or ocr_only (default: auto)
Returns:
- text: Extracted text content
- elements_count: Number of document elements
- file_type: Detected file type
- processing_time_ms: Processing time in milliseconds
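The signature and return fields above can be sketched as typed Python. This is an illustration of the documented shape, not the server's actual code: `ParseResult`, `check_strategy`, and `summarize` are hypothetical names, the numeric type of `processing_time_ms` is an assumption, and the sample values are made up.

```python
from typing import TypedDict

# The four strategies listed for parse_document
VALID_STRATEGIES = ("auto", "fast", "hi_res", "ocr_only")


class ParseResult(TypedDict):
    """Documented return fields of parse_document."""
    text: str
    elements_count: int
    file_type: str
    processing_time_ms: float  # assumption: the README does not state int vs float


def check_strategy(strategy: str = "auto") -> str:
    """Return the strategy unchanged, or raise if it is not one of the documented values."""
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"strategy must be one of {VALID_STRATEGIES}, got {strategy!r}")
    return strategy


def summarize(result: ParseResult) -> str:
    """Build a one-line summary from a parse result."""
    return (
        f"{result['file_type']}: {result['elements_count']} elements, "
        f"{len(result['text'])} chars, {result['processing_time_ms']} ms"
    )


if __name__ == "__main__":
    # Illustrative values only, not real server output
    example: ParseResult = {
        "text": "Hello world",
        "elements_count": 2,
        "file_type": "pdf",
        "processing_time_ms": 153,
    }
    print(check_strategy("hi_res"))
    print(summarize(example))
```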
parse_batch(file_paths: list[str])

Parameters:
- file_paths: Array of absolute paths to files
Returns:
- List of results, one per file (success or error)
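The "one result per file (success or error)" behavior can be sketched as a loop that catches failures per file, so that one bad document does not abort the batch. This is a minimal sketch under assumptions: `parse_batch_sketch` is a hypothetical name, the single-file parser is passed in as a callable, and the `file_path`, `success`, and `error` keys are illustrative rather than the server's documented output.

```python
from typing import Callable


def parse_batch_sketch(
    file_paths: list[str],
    parse_one: Callable[[str], dict],
) -> list[dict]:
    """Parse each file independently and return one entry per input path."""
    results: list[dict] = []
    for path in file_paths:
        try:
            parsed = parse_one(path)
            results.append({"file_path": path, "success": True, **parsed})
        except Exception as exc:  # record the failure and continue with the next file
            results.append({"file_path": path, "success": False, "error": str(exc)})
    return results


if __name__ == "__main__":
    def fake_parse(path: str) -> dict:
        # Stand-in for a real single-file parser, used only for this demo
        if path.endswith(".missing"):
            raise FileNotFoundError(f"File not found: {path}")
        return {"text": f"contents of {path}"}

    for entry in parse_batch_sketch(["/tmp/a.txt", "/tmp/b.missing"], fake_parse):
        print(entry)
```

Because errors are captured per entry, the output list always has the same length and order as `file_paths`.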
Documents (via Unstructured API):
- PDF, DOCX, PPTX, XLSX, ODT, RTF, EPUB
Text Files (direct read):
- .txt, .log, .md, .csv, .json, .yaml, .yml
- .py, .js, .ts, .java, .c, .cpp, .sh, .bash
- .conf, .cfg, .ini, .properties, .env, .sql
- .xml, .html, .css, .gitignore, .htaccess
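The split between the two groups can be sketched as a lookup on the file extension. This is an illustration, not the server's actual routing code: `route` and the returned labels are hypothetical, and treating any other extension as "unsupported" is an assumption. Dotfiles such as .gitignore have no suffix according to `pathlib`, so the sketch falls back to the full file name for them.

```python
from pathlib import Path

# Formats listed under "Documents (via Unstructured API)"
API_EXTENSIONS = {".pdf", ".docx", ".pptx", ".xlsx", ".odt", ".rtf", ".epub"}

# Formats listed under "Text Files (direct read)"
TEXT_EXTENSIONS = {
    ".txt", ".log", ".md", ".csv", ".json", ".yaml", ".yml",
    ".py", ".js", ".ts", ".java", ".c", ".cpp", ".sh", ".bash",
    ".conf", ".cfg", ".ini", ".properties", ".env", ".sql",
    ".xml", ".html", ".css", ".gitignore", ".htaccess",
}


def route(file_path: str) -> str:
    """Return 'direct_read', 'unstructured_api', or 'unsupported' for a path."""
    path = Path(file_path)
    # Path('.gitignore').suffix is '', so use the whole name for dotfiles
    key = path.suffix.lower() if path.suffix else path.name.lower()
    if key in TEXT_EXTENSIONS:
        return "direct_read"
    if key in API_EXTENSIONS:
        return "unstructured_api"
    return "unsupported"  # assumption: the README does not say what happens here


if __name__ == "__main__":
    for sample in ["/data/report.PDF", "/var/log/app.log", "/repo/.gitignore", "/tmp/image.png"]:
        print(sample, "->", route(sample))
```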
MIT