Implement Core Agent Manager Service #536

AtlantisPleb · 2025-01-11T20:37:21Z

Overview

Implement the core agent manager service as part of the long-running agents infrastructure (from #517). This focuses on Phase 1 core infrastructure, specifically the agent manager service and basic state persistence.

Implementation Plan

1. Database Schema

Add new tables for agent management:

-- Agent table for storing agent instances
CREATE TABLE agents (
    id UUID PRIMARY KEY,
    pubkey TEXT NOT NULL,
    name TEXT NOT NULL,
    status TEXT NOT NULL, -- running, stopped, paused
    config JSONB NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMP NOT NULL DEFAULT NOW(),
    CONSTRAINT valid_pubkey CHECK (length(pubkey) = 64)
);

-- Agent state table for persistent state
CREATE TABLE agent_states (
    agent_id UUID REFERENCES agents(id),
    state_key TEXT NOT NULL,
    state_value JSONB NOT NULL,
    updated_at TIMESTAMP NOT NULL DEFAULT NOW(),
    PRIMARY KEY (agent_id, state_key)
);

CREATE INDEX idx_agents_pubkey ON agents(pubkey);
CREATE INDEX idx_agents_status ON agents(status);
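
The `valid_pubkey` constraint above can also be checked in Rust before a row ever reaches the database. The following is a minimal sketch, not part of the planned code: the function name `is_valid_pubkey` is hypothetical, and the hex-digit check is an extra assumption (the schema only checks the length; Nostr public keys are usually written as 64 hex characters).

```rust
/// Mirrors the `valid_pubkey` CHECK constraint: exactly 64 characters.
/// The hex-digit requirement is an added assumption, not part of the schema.
pub fn is_valid_pubkey(pubkey: &str) -> bool {
    pubkey.len() == 64 && pubkey.bytes().all(|b| b.is_ascii_hexdigit())
}
```

Rejecting bad keys early would turn a database constraint violation into an ordinary validation error at the API boundary.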

2. Agent Manager Service Implementation

Create new files:

src/agent/mod.rs: Core agent module
src/agent/manager.rs: Agent lifecycle management
src/agent/state.rs: State persistence
src/agent/types.rs: Type definitions

Key components:

Agent Types

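// src/agent/types.rs
use uuid::Uuid;

/// A managed agent; mirrors one row of the `agents` table.
#[derive(Debug, Clone)]
pub struct Agent {
    pub id: Uuid,
    pub pubkey: String, // 64 characters, see the `valid_pubkey` constraint
    pub name: String,
    pub status: AgentStatus,
    pub config: serde_json::Value,
}

/// Lifecycle status, stored in `agents.status` as text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AgentStatus {
    Running,
    Stopped,
    Paused,
}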
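
Because `agents.status` is a TEXT column while the Rust side uses an enum, some conversion between the two is needed. One possible sketch follows, assuming the lowercase strings from the schema comment (`running`, `stopped`, `paused`); the `as_str` helper and the `String` error type are illustrative choices, not something the issue specifies.

```rust
use std::fmt;
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AgentStatus {
    Running,
    Stopped,
    Paused,
}

impl AgentStatus {
    /// Text form assumed for the `agents.status` column.
    pub fn as_str(self) -> &'static str {
        match self {
            AgentStatus::Running => "running",
            AgentStatus::Stopped => "stopped",
            AgentStatus::Paused => "paused",
        }
    }
}

impl fmt::Display for AgentStatus {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}

impl FromStr for AgentStatus {
    type Err = String;

    /// Parses the column value back into the enum, rejecting unknown text.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "running" => Ok(AgentStatus::Running),
            "stopped" => Ok(AgentStatus::Stopped),
            "paused" => Ok(AgentStatus::Paused),
            other => Err(format!("unknown agent status: {other}")),
        }
    }
}
```

Keeping both directions in one place means an unexpected value in the column surfaces as an error instead of being silently mapped to a default.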

Agent Manager

// src/agent/manager.rs
pub struct AgentManager {
    db: Database,
}

impl AgentManager {
    pub async fn create_agent(&self, pubkey: &str, name: &str, config: serde_json::Value) -> Result<Agent>;
    pub async fn start_agent(&self, id: Uuid) -> Result<()>;
    pub async fn stop_agent(&self, id: Uuid) -> Result<()>;
    pub async fn get_agent_status(&self, id: Uuid) -> Result<AgentStatus>;
}
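
To make the intended behavior of these four methods concrete, here is a small synchronous, in-memory sketch. It is only an illustration under several assumptions: `InMemoryAgentManager`, `AgentRecord`, and `ManagerError` are hypothetical names, `u64` ids stand in for `Uuid`, the config argument is omitted, and new agents are assumed to start in the `Stopped` state.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AgentStatus {
    Running,
    Stopped,
    Paused,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ManagerError {
    InvalidPubkey,
    NotFound(u64),
}

/// One stored agent (stand-in for a row in `agents`).
struct AgentRecord {
    pubkey: String,
    name: String,
    status: AgentStatus,
}

/// In-memory stand-in for `AgentManager`; a HashMap replaces the database.
#[derive(Default)]
pub struct InMemoryAgentManager {
    next_id: u64,
    agents: HashMap<u64, AgentRecord>,
}

impl InMemoryAgentManager {
    /// Validates the pubkey length (as the schema does) and stores the agent.
    pub fn create_agent(&mut self, pubkey: &str, name: &str) -> Result<u64, ManagerError> {
        if pubkey.len() != 64 {
            return Err(ManagerError::InvalidPubkey);
        }
        let id = self.next_id;
        self.next_id += 1;
        let record = AgentRecord {
            pubkey: pubkey.to_string(),
            name: name.to_string(),
            status: AgentStatus::Stopped, // assumed initial status
        };
        self.agents.insert(id, record);
        Ok(id)
    }

    fn set_status(&mut self, id: u64, status: AgentStatus) -> Result<(), ManagerError> {
        let record = self.agents.get_mut(&id).ok_or(ManagerError::NotFound(id))?;
        record.status = status;
        Ok(())
    }

    pub fn start_agent(&mut self, id: u64) -> Result<(), ManagerError> {
        self.set_status(id, AgentStatus::Running)
    }

    pub fn stop_agent(&mut self, id: u64) -> Result<(), ManagerError> {
        self.set_status(id, AgentStatus::Stopped)
    }

    pub fn get_agent_status(&self, id: u64) -> Result<AgentStatus, ManagerError> {
        self.agents
            .get(&id)
            .map(|record| record.status)
            .ok_or(ManagerError::NotFound(id))
    }

    /// Returns the stored name and pubkey, mainly so callers can inspect a record.
    pub fn describe(&self, id: u64) -> Option<(&str, &str)> {
        self.agents
            .get(&id)
            .map(|record| (record.name.as_str(), record.pubkey.as_str()))
    }
}
```

The real service would be async and backed by the tables above; the sketch only pins down the observable behavior (unknown ids fail, start and stop flip the status).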

State Management

// src/agent/state.rs
pub struct AgentState {
    db: Database,
}

impl AgentState {
    pub async fn set_state(&self, agent_id: Uuid, key: &str, value: serde_json::Value) -> Result<()>;
    pub async fn get_state(&self, agent_id: Uuid, key: &str) -> Result<Option<serde_json::Value>>;
}
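
The composite primary key `(agent_id, state_key)` implies that writing the same key twice should replace the earlier value rather than add a second row. A rough in-memory sketch of that behavior is shown below; `InMemoryAgentState` is a hypothetical name, and `u64` and `String` stand in for `Uuid` and `serde_json::Value`.

```rust
use std::collections::HashMap;

/// In-memory stand-in for the `agent_states` table.
/// The map key mirrors the composite primary key `(agent_id, state_key)`.
#[derive(Default)]
pub struct InMemoryAgentState {
    rows: HashMap<(u64, String), String>,
}

impl InMemoryAgentState {
    /// Upsert: a second write to the same (agent_id, key) replaces the first.
    pub fn set_state(&mut self, agent_id: u64, key: &str, value: &str) {
        self.rows
            .insert((agent_id, key.to_string()), value.to_string());
    }

    /// Returns `None` when nothing has been stored under this key.
    pub fn get_state(&self, agent_id: u64, key: &str) -> Option<&str> {
        self.rows
            .get(&(agent_id, key.to_string()))
            .map(String::as_str)
    }
}
```

Against PostgreSQL, the same effect is typically achieved with `INSERT ... ON CONFLICT (agent_id, state_key) DO UPDATE`, which is one way the real `set_state` could be written.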

3. API Endpoints

Add new routes in src/server/routes.rs:

pub fn agent_routes() -> Router {
    Router::new()
        .route("/agents", post(create_agent))
        .route("/agents/:id/start", post(start_agent))
        .route("/agents/:id/stop", post(stop_agent))
        .route("/agents/:id/status", get(get_agent_status))
}
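
The route table above maps a method and path onto one of four handlers. As a framework-free illustration of that mapping (this is not how the router itself works, and `AgentRoute` and `match_route` are hypothetical names), the same table can be expressed with slice patterns:

```rust
/// The four operations exposed by the planned routes.
#[derive(Debug, PartialEq, Eq)]
pub enum AgentRoute<'a> {
    Create,
    Start(&'a str),
    Stop(&'a str),
    Status(&'a str),
}

/// Matches an HTTP method and path against the planned route table.
/// The `:id` segment is returned as-is; no UUID validation is done here.
pub fn match_route<'a>(method: &str, path: &'a str) -> Option<AgentRoute<'a>> {
    let segments: Vec<&'a str> = path.trim_matches('/').split('/').collect();
    match (method, segments.as_slice()) {
        ("POST", ["agents"]) => Some(AgentRoute::Create),
        ("POST", ["agents", id, "start"]) => Some(AgentRoute::Start(*id)),
        ("POST", ["agents", id, "stop"]) => Some(AgentRoute::Stop(*id)),
        ("GET", ["agents", id, "status"]) => Some(AgentRoute::Status(*id)),
        _ => None,
    }
}
```

Note that the method is part of the match: a `GET` on `/agents/:id/start` does not resolve to anything.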

4. Nostr Event Integration

Use existing events table to store agent-related events:

Agent Status Updates (Kind: 30001)
Agent Control Messages (Kind: 20001)
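
The two kind numbers fall into different ranges under NIP-01: 20000-29999 is ephemeral (relays are not expected to store these events) and 30000-39999 is addressable (only the latest event per pubkey, kind, and `d` tag is kept). The sketch below classifies a kind by those ranges; `KindClass` and `classify_kind` are hypothetical names, and anything outside the listed ranges is simply treated as regular here.

```rust
/// Storage classes for Nostr event kinds, following the NIP-01 ranges.
#[derive(Debug, PartialEq, Eq)]
pub enum KindClass {
    Regular,
    Replaceable,
    Ephemeral,
    Addressable,
}

/// Classifies an event kind by its numeric range.
pub fn classify_kind(kind: u32) -> KindClass {
    match kind {
        0 | 3 | 10_000..=19_999 => KindClass::Replaceable,
        20_000..=29_999 => KindClass::Ephemeral,
        30_000..=39_999 => KindClass::Addressable,
        // Simplification: every other kind is treated as a regular event.
        _ => KindClass::Regular,
    }
}
```

This may be worth keeping in mind for the acceptance criteria: a relay would keep only the most recent kind 30001 status per agent key and `d` tag, so a full status history would depend on how the local events table stores them.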

Testing Plan

Unit Tests

Agent creation/deletion
State persistence
Status transitions
Configuration validation

Integration Tests

API endpoint testing
Database operations
Event persistence
State management

Acceptance Criteria

Agent manager can create and manage agent instances
Agents maintain persistent state between restarts
Basic API endpoints work correctly
Agent status updates are stored as Nostr events
All tests pass

Next Steps After Completion

Implement task queue system
Add more sophisticated state management
Implement resource monitoring
Add error recovery mechanisms

Related: #517


AtlantisPleb · 2025-01-12T01:51:18Z

Additional Agent Type Properties to Consider

Looking at the current Agent struct, we might want to add:

Version Information

// Version of this agent's implementation
pub version: String,
// Minimum platform version required to run this agent
pub min_platform_version: String,

Resource Constraints

// Maximum memory allocation allowed (in MB)
pub memory_limit: u32,
// Maximum CPU time allowed (in ms)
pub cpu_limit: u32,
// Maximum concurrent instances allowed
pub max_instances: u32,

Security & Permissions

// List of capabilities this agent requires
pub capabilities: Vec<String>,
// Access control list for who can create instances
pub allowed_users: Vec<String>,

Dependencies & Integration

// Other agent types this agent depends on
pub dependencies: Vec<Uuid>,
// External service integrations required
pub required_integrations: Vec<String>,

Billing & Usage

// Cost per minute of runtime
pub cost_per_minute: f64,
// Whether this is a system agent or user-created
pub is_system: bool,

These additions would help with:

Resource management and billing
Security and access control
Version compatibility
System stability
Integration management

Would be good to discuss which of these we want to include in the initial implementation.
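
If the resource constraint fields are adopted, they could be validated together when an agent is created. The following is one possible sketch; `ResourceLimits`, `LimitError`, and the rule that every limit must be non-zero are assumptions made for illustration, not decisions from this thread.

```rust
/// The proposed resource constraint fields, grouped for validation.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ResourceLimits {
    pub memory_limit: u32,  // MB
    pub cpu_limit: u32,     // ms
    pub max_instances: u32, // concurrent instances
}

#[derive(Debug, PartialEq, Eq)]
pub enum LimitError {
    ZeroMemory,
    ZeroCpu,
    ZeroInstances,
    TooManyInstances { max: u32 },
}

impl ResourceLimits {
    /// Assumed rule: a limit of zero is a configuration mistake.
    pub fn validate(&self) -> Result<(), LimitError> {
        if self.memory_limit == 0 {
            return Err(LimitError::ZeroMemory);
        }
        if self.cpu_limit == 0 {
            return Err(LimitError::ZeroCpu);
        }
        if self.max_instances == 0 {
            return Err(LimitError::ZeroInstances);
        }
        Ok(())
    }

    /// Checks whether one more instance may start, given the running count.
    pub fn check_can_start(&self, running: u32) -> Result<(), LimitError> {
        if running >= self.max_instances {
            Err(LimitError::TooManyInstances { max: self.max_instances })
        } else {
            Ok(())
        }
    }
}
```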

(Comment from OpenAgents)

AtlantisPleb · 2025-01-12T01:51:55Z

Current Type Structure Summary

We currently have these core types defined:

Agent (Template/Definition)

pub struct Agent {
    pub id: Uuid,
    pub name: String,
    pub description: String,
    pub pubkey: String,
    pub enabled: bool,
    pub config: serde_json::Value,
    pub created_at: i64,
}

Represents the base definition/template of an agent type.

AgentInstance (Running Instance)

pub struct AgentInstance {
    pub id: Uuid,
    pub agent_id: Uuid,
    pub status: InstanceStatus,
    pub created_at: i64,
    pub ended_at: Option<i64>,
}

Represents a specific running instance of an agent template.

Plan (High-level Objective)

pub struct Plan {
    pub id: Uuid,
    pub agent_id: Uuid,
    pub name: String,
    pub description: String,
    pub status: PlanStatus,
    pub task_ids: Vec<Uuid>,
    pub created_at: i64,
    pub ended_at: Option<i64>,
    pub metadata: serde_json::Value,
}

Represents a high-level objective with ordered tasks.

Task (Individual Action)

pub struct Task {
    pub id: Uuid,
    pub plan_id: Uuid,
    pub instance_id: Uuid,
    pub task_type: String,
    pub status: TaskStatus,
    pub priority: u8,
    pub input: serde_json::Value,
    pub output: Option<serde_json::Value>,
    pub created_at: i64,
    pub started_at: Option<i64>,
    pub ended_at: Option<i64>,
    pub error: Option<String>,
}

Represents a specific action/task to be performed.

Status Enums

pub enum InstanceStatus {
    Starting, Running, Paused, Stopping, Stopped, Error
}

pub enum PlanStatus {
    Created, InProgress, Completed, Failed, Cancelled
}

pub enum TaskStatus {
    Pending, Scheduled, Running, Completed, Failed, Cancelled
}

Key Relationships:

Agent (1) -> AgentInstance (Many)
AgentInstance (1) -> Plan (Many)
Plan (1) -> Task (Many)

This structure allows for:

Multiple instances of the same agent type
Each instance managing multiple plans
Each plan containing multiple ordered tasks
Detailed status tracking at each level
Flexible JSON configs/data at multiple levels
Complete audit trail with timestamps
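
Status tracking is easier to test when the allowed transitions are written down explicitly. The thread does not define them, so the table below is purely an assumed example of how `InstanceStatus` transitions might be encoded; `can_transition_to` is a hypothetical method.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InstanceStatus {
    Starting,
    Running,
    Paused,
    Stopping,
    Stopped,
    Error,
}

impl InstanceStatus {
    /// Returns true if moving from `self` to `next` is allowed.
    /// The transition table is an assumption, not taken from the issue.
    pub fn can_transition_to(self, next: InstanceStatus) -> bool {
        matches!(
            (self, next),
            (Self::Starting, Self::Running)
                | (Self::Starting, Self::Error)
                | (Self::Running, Self::Paused)
                | (Self::Running, Self::Stopping)
                | (Self::Running, Self::Error)
                | (Self::Paused, Self::Running)
                | (Self::Paused, Self::Stopping)
                | (Self::Stopping, Self::Stopped)
                | (Self::Stopping, Self::Error)
                | (Self::Stopped, Self::Starting)
                | (Self::Error, Self::Starting)
        )
    }
}
```

A similar table could be written for `PlanStatus` and `TaskStatus` once their lifecycles are agreed on.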

(Comment from OpenAgents)

AtlantisPleb · 2025-01-12T03:49:20Z

I've started implementing the testing infrastructure for the agent manager service. Specifically:

Organized all agent-related tests into a modular structure under tests/agent/:
- core.rs: Tests for basic Agent/AgentInstance/Plan/Task types and their behaviors
- manager.rs: Tests for agent lifecycle management including state transitions
- nostr.rs: Tests for Nostr event integration (status updates, control messages)
Set up proper test organization for nostr integration:
- Added tests/nostr/mod.rs to properly expose test modules
- Organized database, event and subscription tests

The test suite now covers key areas outlined in the testing plan:

Agent creation/deletion
State persistence
Status transitions
Configuration validation
Event persistence
Basic API endpoint testing

All 20 tests are passing, providing good coverage of the core agent functionality. This modular test structure will make
it easier to add more tests as we implement additional features like task queues and resource monitoring.

Next steps:

Add integration tests for the database schema once implemented
Add more API endpoint tests
Expand state management test coverage

The changes are in commits 0134941 and d5a1285.

AtlantisPleb · 2025-01-12T03:51:08Z

Looking at the test structure and types, I think the AgentManager should be implemented as a central service that:

Lifecycle Management

Creates agent instances from templates
Handles start/stop/pause operations
Manages state transitions
Monitors health and handles crashes

State Management

Persists agent state to database
Handles state recovery on restart
Manages configuration updates
Provides state query capabilities

Resource Management

Enforces memory/CPU limits
Tracks resource usage
Handles resource allocation
Implements rate limiting

Event Integration

Emits Nostr events for status changes
Handles control messages
Maintains event history
Provides event query interface

Proposed implementation:

pub struct AgentManager {
    // Core dependencies
    db: Database,
    event_bus: EventBus,

    // Internal state
    instances: HashMap<Uuid, AgentInstance>,
    state_cache: LruCache<(Uuid, String), serde_json::Value>,

    // Resource tracking
    resource_monitor: ResourceMonitor,
    rate_limiter: RateLimiter,
}

impl AgentManager {
    // Lifecycle methods
    async fn create_instance(&mut self, template: Agent) -> Result<Uuid>;
    async fn start_instance(&mut self, id: Uuid) -> Result<()>;
    async fn stop_instance(&mut self, id: Uuid) -> Result<()>;

    // State methods
    async fn get_state(&self, id: Uuid, key: &str) -> Result<Option<serde_json::Value>>;
    async fn set_state(&mut self, id: Uuid, key: &str, value: serde_json::Value) -> Result<()>;

    // Resource methods
    async fn check_resources(&self, id: Uuid) -> Result<ResourceMetrics>;
    async fn enforce_limits(&mut self) -> Result<()>;

    // Event methods
    async fn emit_status(&self, id: Uuid, status: InstanceStatus) -> Result<()>;
    async fn handle_control(&mut self, event: ControlEvent) -> Result<()>;
}
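
The `state_cache` field suggests a cache sitting in front of the database for state reads. One way this could behave is write-through on `set_state` and read-through on `get_state`, sketched below with plain maps. This is an assumption-heavy illustration: `CachedState` is a hypothetical type, a `HashMap` stands in for both the database and the `LruCache` (so there is no eviction), and `u64` and `String` replace `Uuid` and `serde_json::Value`.

```rust
use std::collections::HashMap;

type StateKey = (u64, String);

/// Write-through state cache in front of a backing store.
#[derive(Default)]
pub struct CachedState {
    store: HashMap<StateKey, String>, // stand-in for the database
    cache: HashMap<StateKey, String>, // stand-in for `state_cache`
    pub store_reads: usize,           // counts reads that missed the cache
}

impl CachedState {
    /// Writes to the store first, then updates the cache so both agree.
    pub fn set_state(&mut self, id: u64, key: &str, value: &str) {
        let k = (id, key.to_string());
        self.store.insert(k.clone(), value.to_string());
        self.cache.insert(k, value.to_string());
    }

    /// Serves from the cache when possible, otherwise loads from the store
    /// and remembers the value for next time.
    pub fn get_state(&mut self, id: u64, key: &str) -> Option<String> {
        let k = (id, key.to_string());
        if let Some(value) = self.cache.get(&k) {
            return Some(value.clone());
        }
        self.store_reads += 1;
        let value = self.store.get(&k).cloned()?;
        self.cache.insert(k, value.clone());
        Some(value)
    }

    /// Simulates a process restart: the cache is lost, the store survives.
    pub fn restart(&mut self) {
        self.cache.clear();
    }
}
```

With this shape, state recovery after a restart falls out of the read path: the first read misses the cache and reloads from the store.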

AtlantisPleb · 2025-01-12T04:44:35Z

We've made significant progress on the testing infrastructure for the Agent Manager service:

Implemented and tested core functionality in MockAgentManager:

Agent lifecycle management (creation, status updates)
Instance limits and validation
State persistence and recovery
Task status transitions
Error handling and recovery flows

Test coverage now includes:

Basic agent lifecycle
Resource limit enforcement
Error recovery mechanisms
State management with validation
Task lifecycle with proper timestamps

All tests are passing (5 comprehensive test cases):

test_agent_lifecycle
test_instance_limits
test_error_recovery
test_instance_state_management
test_task_status_transitions

Next implementation steps:

Create database migration for agent/state tables
Implement actual AgentManager service using MockAgentManager as spec
Add API endpoints
Integrate Nostr event system

The mock implementation and tests provide a solid foundation for the actual AgentManager service, validating the core
concepts from the issue description. The test structure ensures we maintain the key requirements around:

Instance management
State persistence
Status transitions
Resource limits
Error recovery

Related commits:

34b473a: test: Add comprehensive tests for MockAgentManager lifecycle and state management
697911d: fix: Prevent updating state for non-existent agent instances

Next PR will focus on the database schema and actual AgentManager implementation.

AtlantisPleb · 2025-01-12T04:49:43Z

Test infrastructure is now complete in PR #541. Next steps to check off remaining items:

Implement actual AgentManager service:
- Move from MockAgentManager to real implementation
- Implement database operations using schema from this issue
- Add proper error handling and validation
- Implement resource monitoring and limits
Add state persistence:
- Implement database operations for agent state
- Add state recovery on agent restart
- Implement state validation and migration
- Add state backup/restore functionality
Create API endpoints:
- Implement routes defined in issue description
- Add request validation
- Add error handling
- Add response serialization
- Add API tests
Integrate Nostr events:
- Implement event emission for status changes
- Add event persistence
- Set up subscription handling
- Add event validation

The test infrastructure provides a clear specification for how each of these components should behave. We can implement them one at a time, using the tests as a guide.

(Comment from OpenAgents)

AtlantisPleb · 2025-01-12T05:24:05Z

Test Infrastructure Implementation Summary

We've implemented comprehensive test coverage for the Agent Manager service across multiple test files:

Core Agent Tests (tests/agent/manager_impl.rs)

test_agent_creation_and_validation: Validates agent creation with proper config
test_instance_lifecycle: Tests instance creation, status updates, state management
test_instance_limits: Verifies enforcement of max instance limits
test_resource_monitoring: Checks memory/CPU limit enforcement
test_state_management: Validates state persistence and retrieval

Comprehensive Tests (tests/agent/manager_comprehensive.rs)

test_agent_validation_errors: Validates config constraints (memory, CPU limits)
test_instance_state_persistence: Tests state persistence across restarts
test_resource_monitoring_and_limits: Comprehensive resource limit testing
test_cache_consistency: Validates state/metrics cache behavior
test_edge_cases: Handles disabled agents, invalid states, etc.

The test suite validates:

Agent lifecycle (creation, status transitions, deletion)
Resource management (memory/CPU limits, instance counts)
State persistence (save/load state, validation)
Error handling (invalid configs, resource limits)
Cache consistency (state/metrics synchronization)

All tests are passing and provide a solid specification for implementing the actual AgentManager service. Next steps are
implementing the real service components against these test specifications.

Related PR: #541

* Create file migrations/20250112002000_create_agent_tables.sql
* Create file src/agents/manager.rs
* Add `manager` module and export `AgentManager` in `mod.rs` to enhance agent management capabilities.
* Create file tests/agent/manager_impl.rs
* Create file tests/agent/mod.rs
* No changes detected in Cargo.toml, commit unnecessary.
* Refactor `manager.rs`: Replace `db` with `pool` in `AgentManager`, simplify imports, and update instance management using `PgPool`.
* Refactor `setup_test_db` to return `PgPool` directly and update test setup for `AgentManager` in `manager_impl.rs`.
* cargolock
* Add `time::OffsetDateTime` import for enhanced time handling in `manager.rs`.
* Refactor `manager.rs` to remove unused imports and clarify `AgentManager` structure for better maintainability.
* Refactor `manager.rs` to remove unused imports and streamline code readability.
* Remove MAX_RETRIES constant from manager.rs for simplification.
* Create file tests/agent/manager_comprehensive.rs
* Added `manager_comprehensive` module to enhance manager functionalities in `mod.rs`.
* No changes detected in mod.rs file content, commit unnecessary.
* Add `AgentManager` struct to manage agent instances and their states.
* Made all modules in `mod.rs` public to enable external access.
* Added manager_impl and manager_comprehensive modules to enhance manager functionality in agent.rs.
* refactor: Update test files to use public APIs and remove unused imports
* fix: Update test methods to use correct AgentManager methods and handle pool ownership
* fix: Clone pool to resolve ownership issue in agent manager tests
* fix: Clone pool to resolve ownership issue in agent manager tests
* fix: Clone pool to resolve ownership issue in agent manager test
* fix: Clone pool to resolve ownership issue in agent manager test
* fix: Clone pool to resolve ownership issue in agent manager test
* fix: Clone database pool to resolve ownership issue in test
* fix: Clone pool to resolve ownership issue in test
* fix: Clone pool in test files to resolve borrow of moved value
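
Several of the fixes above clone the pool to get past a "borrow of moved value" error. The pattern can be shown with a small stand-in: the `Pool` type below is hypothetical and only imitates a cheap, reference-counted handle (which is how `PgPool` is documented to behave), and the by-value `AgentManager::new` signature is assumed rather than copied from the PR.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for `PgPool`: cloning copies a handle, not the state.
#[derive(Clone)]
pub struct Pool {
    inner: Arc<String>,
}

impl Pool {
    pub fn connect(url: &str) -> Self {
        Pool { inner: Arc::new(url.to_string()) }
    }

    /// Number of live handles sharing the same underlying state.
    pub fn handle_count(&self) -> usize {
        Arc::strong_count(&self.inner)
    }
}

pub struct AgentManager {
    pool: Pool,
}

impl AgentManager {
    /// Assumed signature: the manager takes ownership of its pool handle.
    pub fn new(pool: Pool) -> Self {
        AgentManager { pool }
    }

    pub fn pool(&self) -> &Pool {
        &self.pool
    }
}

/// Builds two managers from one pool.
pub fn build_two_managers(pool: Pool) -> (AgentManager, AgentManager) {
    // Passing `pool` to both calls would use it after it was moved;
    // cloning the handle for the first call avoids that.
    let first = AgentManager::new(pool.clone());
    let second = AgentManager::new(pool);
    (first, second)
}
```

Both managers end up sharing the same underlying state, which is why cloning is the usual fix in tests that need the pool more than once.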

AtlantisPleb · 2025-01-12T05:29:09Z

I've implemented Phase 1 of the AgentManager service in PR #542, focusing on the core functionality and state management.

Implemented Features

✅ Core Agent Management:

Full lifecycle management (create, start, stop)
State persistence and recovery
Resource monitoring and limits
Error handling and validation
Cache consistency

✅ Database Schema:

Agent tables for storing agent definitions and instances
State persistence tables
Resource monitoring tables
Proper indexing and constraints

✅ Test Coverage:

Core functionality tests
Comprehensive edge case testing
Resource monitoring tests
State persistence tests
Cache consistency tests

Acceptance Criteria Status

Current status of the acceptance criteria:

✅ Agent manager can create and manage agent instances
✅ Agents maintain persistent state between restarts
❌ Basic API endpoints work correctly (coming in next PR)
❌ Agent status updates are stored as Nostr events (coming in next PR)
✅ All tests pass (for implemented features)

Next Steps

The remaining work will be split into separate PRs:

API Integration:
- Implement routes defined in issue description
- Add request validation
- Add error handling
- Add response serialization
- Add API tests
Nostr Integration:
- Implement event emission for status changes
- Add event persistence
- Set up subscription handling
- Add event validation

PR #542 provides the foundation for these next steps by implementing the core functionality and data model.

(Comment from OpenAgents)

AtlantisPleb mentioned this issue Jan 12, 2025

Add agent manager tests and mock implementation #541

Merged

AtlantisPleb mentioned this issue Jan 12, 2025

Implement Core Agent Manager Service - Phase 1 (#536) #542

Merged

AtlantisPleb mentioned this issue Jan 12, 2025

Implement Agent Creation Frontend with Nostr Integration #543

Open

22 tasks
