AgenticOps is a scalable multi-agent system for large-scale PDF analysis. It uses a hierarchical mapper–reducer architecture powered by Ray, MongoDB, and LLM-driven agents to deliver fast, reliable, and consistent document understanding.
Distributed Multi-Agent Architecture
- Multi-tier agent architecture with Master, SubMasters, Workers, and Residual Agent
- Parallel page-level processing with coordinated global context
- Automatic retries, lineage-based recovery, and checkpointing
- Dynamic resource allocation based on document complexity
- Real-time monitoring using an event-driven system
- Final JSON and PDF report generation with structured insights
The system uses a distributed pipeline where:
- The Master Agent analyzes metadata and generates the execution plan
- SubMasters handle page-range processing and worker pools
- Workers perform LLM-based extraction
- The Residual Agent maintains global context and validates quality
- The Merger Supervisor synthesizes the final outputs
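The division of labour above can be sketched as a small hierarchical map-reduce. This is only an illustration, not the project's code: it uses the standard library's thread pool as a stand-in for Ray, a word count as a stand-in for LLM extraction, and all function names (`worker_extract`, `submaster_run`, `merge_sections`, `run_pipeline`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def worker_extract(page_text: str) -> dict:
    """Worker: stand-in for an LLM call; here it only counts words."""
    return {"words": len(page_text.split())}


def submaster_run(pages: list[str]) -> dict:
    """SubMaster: map pages to workers in parallel, then reduce to one section."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(worker_extract, pages))
    return {"pages": len(results), "words": sum(r["words"] for r in results)}


def merge_sections(sections: list[dict]) -> dict:
    """Merger: final reduce over the section-level results."""
    return {
        "pages": sum(s["pages"] for s in sections),
        "words": sum(s["words"] for s in sections),
    }


def run_pipeline(pages: list[str], pages_per_submaster: int) -> dict:
    """Master: split the document into page ranges, one per SubMaster."""
    chunks = [
        pages[i:i + pages_per_submaster]
        for i in range(0, len(pages), pages_per_submaster)
    ]
    return merge_sections([submaster_run(chunk) for chunk in chunks])


if __name__ == "__main__":
    print(run_pipeline(["a b", "c d e", "f"], pages_per_submaster=2))
```

The Residual Agent is left out of this sketch; in the described design it would supply shared context to `worker_extract` and check the results before the final merge.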
Master Agent
- Extracts PDF metadata
- Generates execution plan using Mistral
- Allocates SubMasters and manages approval workflow
- Persists planning data to MongoDB
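As a rough illustration of what an execution plan might contain, the sketch below splits a document into contiguous page ranges, one per SubMaster. It is a deterministic stand-in: in AgenticOps the plan is generated with Mistral, and the `SubMasterAssignment` type, the `build_plan` function, and the `max_pages_per_submaster` parameter are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class SubMasterAssignment:
    submaster_id: int
    first_page: int  # 1-based, inclusive
    last_page: int   # 1-based, inclusive


def build_plan(total_pages: int, max_pages_per_submaster: int) -> list[SubMasterAssignment]:
    """Split pages 1..total_pages into contiguous ranges of bounded size."""
    if total_pages < 0 or max_pages_per_submaster < 1:
        raise ValueError("total_pages must be >= 0 and range size must be >= 1")
    plan = []
    starts = range(1, total_pages + 1, max_pages_per_submaster)
    for submaster_id, first in enumerate(starts):
        last = min(first + max_pages_per_submaster - 1, total_pages)
        plan.append(SubMasterAssignment(submaster_id, first, last))
    return plan


if __name__ == "__main__":
    for assignment in build_plan(total_pages=10, max_pages_per_submaster=4):
        print(assignment)
```

A plan in this shape is plain data, so it could be serialized and stored as a document before any SubMaster starts.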
SubMasters
- Spawn worker pools
- Handle page extraction and distribution
- Aggregate worker outputs at section level
- Report progress to orchestrator
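The SubMaster responsibilities could look roughly like the following. This is a sketch under assumptions: a thread pool stands in for a Ray worker pool, and `run_section`, `extract`, and `on_progress` are hypothetical names rather than the project's API.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def run_section(
    pages: dict[int, str],
    extract: Callable[[str], dict],
    on_progress: Callable[[int, int], None],
    max_workers: int = 4,
) -> dict:
    """Process one page range and return a section-level aggregate."""
    if not pages:
        raise ValueError("a section needs at least one page")
    results: dict[int, dict] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Remember which page each future belongs to.
        futures = {pool.submit(extract, text): no for no, text in pages.items()}
        for done, future in enumerate(as_completed(futures), start=1):
            results[futures[future]] = future.result()
            on_progress(done, len(pages))  # report (completed, total)
    # Workers finish in any order; restore page order for the aggregate.
    ordered = [results[no] for no in sorted(results)]
    return {"page_range": (min(pages), max(pages)), "page_results": ordered}


if __name__ == "__main__":
    section = run_section(
        {3: "x y", 4: "z"},
        extract=lambda text: {"words": len(text.split())},
        on_progress=lambda done, total: print(f"{done}/{total} pages"),
    )
    print(section)
```

Passing the progress callback in as a parameter keeps the SubMaster independent of whatever the orchestrator uses to receive events.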
Workers
- Process pages with LLM prompts using global context
- Extract entities, keywords, and summaries
- Use rate limiting and retry logic for stability
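One common way to rate-limit LLM calls is a token bucket. The source does not say which algorithm AgenticOps uses, so the sketch below is an assumption, and the `TokenBucket` class and its parameters are hypothetical.

```python
import threading
import time


class TokenBucket:
    """Allow bursts of up to `capacity` calls, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, capacity: int, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock  # injectable so the limiter can be tested
        self.last = clock()
        self.lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Take one token if available; never blocks."""
        with self.lock:
            now = self.clock()
            # Refill in proportion to elapsed time, capped at capacity.
            refill = (now - self.last) * self.rate
            self.tokens = min(self.capacity, self.tokens + refill)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False


if __name__ == "__main__":
    bucket = TokenBucket(rate_per_sec=2.0, capacity=2)
    print([bucket.try_acquire() for _ in range(3)])
```

A worker that gets `False` would wait and try again rather than call the LLM, which is where retry logic comes in.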
Residual Agent
- Generates and distributes global context
- Validates output quality
- Handles anomaly detection and targeted retries
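Quality validation with targeted retries can be pictured as a filter over worker outputs that returns only the pages worth reprocessing. The field names and the length threshold below are assumptions for illustration; the source does not specify the actual validation rules.

```python
# Assumed output fields, based on "entities, keywords, and summaries".
REQUIRED_FIELDS = ("entities", "keywords", "summary")


def find_pages_to_retry(outputs: dict[int, dict], min_summary_chars: int = 20) -> list[int]:
    """Return page numbers whose output is incomplete or suspiciously short."""
    retry = []
    for page_no, output in sorted(outputs.items()):
        missing = [field for field in REQUIRED_FIELDS if field not in output]
        too_short = len(output.get("summary", "")) < min_summary_chars
        if missing or too_short:
            retry.append(page_no)
    return retry


if __name__ == "__main__":
    outputs = {
        1: {"entities": [], "keywords": [], "summary": "A sufficiently long summary of page one."},
        2: {"entities": [], "summary": "This page is missing its keywords field."},
        3: {"entities": [], "keywords": [], "summary": "Too short"},
    }
    print(find_pages_to_retry(outputs))
```

Only the flagged pages need to be sent back to workers, so a few bad pages do not force a full rerun.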
Merger Supervisor
- Consolidates section-level outputs
- Resolves conflicts and unifies narrative
- Prepares final synthesis for reporting
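A minimal sketch of section consolidation follows. The conflict rule used here (case-insensitive deduplication that keeps the first spelling seen) and the `consolidate_sections` name are assumptions; the real Merger Supervisor may use an LLM to unify the narrative instead of simple concatenation.

```python
from collections import Counter


def consolidate_sections(sections: list[dict]) -> dict:
    """Merge section outputs into one document-level result."""
    entities: dict[str, str] = {}  # lowercase key -> first-seen spelling
    keywords: Counter = Counter()
    summaries = []
    for section in sections:
        for entity in section.get("entities", []):
            entities.setdefault(entity.strip().lower(), entity.strip())
        keywords.update(k.lower() for k in section.get("keywords", []))
        if section.get("summary"):
            summaries.append(section["summary"])
    return {
        "entities": list(entities.values()),
        # Most frequent keywords first.
        "keywords": [k for k, _ in keywords.most_common()],
        "summary": " ".join(summaries),
    }


if __name__ == "__main__":
    print(consolidate_sections([
        {"entities": ["Acme Corp", "Ray"], "keywords": ["revenue", "risk"], "summary": "Part one."},
        {"entities": ["acme corp"], "keywords": ["Risk"], "summary": "Part two."},
    ]))
```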
AgenticOps follows an eight-stage workflow:
- Metadata extraction
- Master planning
- SubMaster initialization
- Worker pool creation
- Parallel mapper execution
- Quality validation
- Hierarchical reduction
- Report generation (JSON and PDF)
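The eight stages run in a fixed order, which can be modelled as an ordered enumeration plus a small runner. The `Stage` enum mirrors the list above; the `run_stages` function, its `handlers` mapping, and the `resume_after` parameter are hypothetical and only illustrate how a run might resume from a checkpoint.

```python
from enum import IntEnum


class Stage(IntEnum):
    METADATA_EXTRACTION = 1
    MASTER_PLANNING = 2
    SUBMASTER_INITIALIZATION = 3
    WORKER_POOL_CREATION = 4
    PARALLEL_MAPPER_EXECUTION = 5
    QUALITY_VALIDATION = 6
    HIERARCHICAL_REDUCTION = 7
    REPORT_GENERATION = 8


def run_stages(handlers, state, resume_after=None):
    """Run each stage handler in order, threading `state` through them.

    Stages up to and including `resume_after` are skipped, which is how a
    restarted run could continue from its last completed stage.
    """
    completed = []
    for stage in Stage:  # iterates in definition order
        if resume_after is not None and stage <= resume_after:
            continue
        state = handlers[stage](state)
        completed.append(stage)
    return state, completed


if __name__ == "__main__":
    # Each placeholder handler just records that its stage ran.
    handlers = {s: (lambda st, s=s: st + [s.name]) for s in Stage}
    final_state, done = run_stages(handlers, [])
    print(final_state)
```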
Fault Tolerance
- Worker retries with exponential backoff
- SubMaster checkpoint recovery from MongoDB
- Backup Master failover
- Global rate limiter preventing overload
- Automatic replays using Ray lineage
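Retrying with exponential backoff, the first item above, is sketched below in plain Python. The function name, default delays, and the 10% jitter are assumptions chosen for the example, not values taken from the project.

```python
import random
import time


def retry_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay, then 2x, 4x, ... before retrying."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Small random jitter so many workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))


if __name__ == "__main__":
    attempts = {"n": 0}

    def flaky():
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise RuntimeError("transient failure")
        return "ok"

    print(retry_with_backoff(flaky, base_delay=0.01))
```

In practice the `except` clause would be narrowed to errors that are actually transient, such as timeouts or rate-limit responses.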
Technology Stack
- Ray Distributed Framework
- Mistral LLM
- MongoDB
- pdfplumber and PyPDF2
- ReportLab
Prerequisites
- Python 3.10+
- Ray installed and running
- MongoDB instance
- Mistral API key
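A quick preflight check for these requirements might look like the sketch below. The environment variable names `MISTRAL_API_KEY` and `MONGODB_URI` are assumptions, since the source does not say how the key or the MongoDB connection is configured. The check only confirms that Ray is importable, not that a cluster is running.

```python
import importlib.util
import os
import sys


def _module_available(name: str) -> bool:
    return importlib.util.find_spec(name) is not None


def preflight(env=os.environ, version=sys.version_info, has_module=_module_available) -> list[str]:
    """Return a list of human-readable problems; empty means ready to run."""
    problems = []
    if tuple(version[:2]) < (3, 10):
        problems.append("Python 3.10+ is required")
    if not has_module("ray"):
        problems.append("Ray is not installed")
    if not env.get("MISTRAL_API_KEY"):  # assumed variable name
        problems.append("MISTRAL_API_KEY is not set")
    if not env.get("MONGODB_URI"):  # assumed variable name
        problems.append("MONGODB_URI is not set")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("missing:", problem)
```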
npm run dev
python run_api.py
Outputs are stored as:
/output/report.json
/output/report.pdf
- Adhiraj Singh
- Dev Rishi Verma
- Nishant Raj