Build testing and validation framework for SDK workflows #10160

@Mossaka

Description

Create comprehensive testing infrastructure for SDK workflows, including unit tests, integration tests, and validation tools.

Part of Epic

#10154 - Copilot SDK Integration for Advanced Agentic Workflows

Testing Requirements

1. Unit Tests

Test SDK engine compilation logic in isolation:

Test Areas:

  • Frontmatter parsing for SDK configuration
  • Session config generation
  • Inline tool compilation
  • Event handler generation
  • Multi-agent configuration
  • Backward compatibility with CLI mode

Test Files:

pkg/workflow/copilot_sdk_engine_test.go
pkg/workflow/copilot_sdk_session_test.go
pkg/workflow/copilot_sdk_tools_test.go
pkg/workflow/copilot_sdk_events_test.go

2. Integration Tests

Test generated workflows in actual GitHub Actions environment:

Test Workflows:

pkg/cli/workflows/
├── test-sdk-single-turn.md
├── test-sdk-multi-turn.md
├── test-sdk-custom-tools.md
├── test-sdk-event-handlers.md
├── test-sdk-multi-agent.md
└── test-sdk-migration.md
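
Before these workflows ever reach a real GitHub Actions run, a small local harness can compile each `test-sdk-*.md` file and fail fast on the first error. A minimal sketch, where `compileWorkflow` is a hypothetical stand-in for the real compiler entry point in pkg/workflow:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// compileWorkflow is a hypothetical stub for the real compiler; here it
// only checks that the path looks like a markdown workflow file.
func compileWorkflow(path string) error {
	if filepath.Ext(path) != ".md" {
		return fmt.Errorf("%s: not a markdown workflow", path)
	}
	return nil
}

func main() {
	// Glob the SDK test workflows listed above and stop on the first
	// compilation failure.
	files, _ := filepath.Glob("pkg/cli/workflows/test-sdk-*.md")
	for _, f := range files {
		if err := compileWorkflow(f); err != nil {
			panic(err)
		}
		fmt.Println("compiled:", f)
	}
}
```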

3. Comparison Tests

Compare SDK vs CLI for equivalent workflows:

Comparison Dimensions:

  • Functional correctness
  • Performance (latency, token usage)
  • Cost (tokens, API calls)
  • Reliability (success rate, error handling)
  • Observability (logs, metrics)

4. Validation Tools

SDK Compatibility Checker:

gh aw check-sdk-compatibility workflow.md

# Output:
✅ Workflow is compatible with SDK mode
⚠️  Custom bash tools detected; consider converting them to inline tools
💡 Consider enabling session persistence for multi-turn

SDK Migration Tool:

gh aw migrate-to-sdk workflow.md

# Output:
✅ Migrated workflow.md to SDK mode
📝 Created workflow.sdk.md
📊 Compatibility report: workflow-migration-report.md

SDK Validator:

gh aw validate-sdk workflow.md

# Validates:
- Session configuration
- Inline tool syntax
- Event handler registration
- Multi-agent setup
- Resource limits
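
The session checks above can be sketched as a plain function over the parsed config. `SessionConfig` mirrors the frontmatter fields used elsewhere in this issue; the specific limits and error messages are assumptions, not the validator's actual rules:

```go
package main

import (
	"errors"
	"fmt"
)

// SessionConfig is an illustrative shape for the session frontmatter;
// the real model lives with the SDK engine in pkg/workflow.
type SessionConfig struct {
	Persistent bool
	Storage    string
	MaxTurns   int
}

// validateSession applies the configuration and resource-limit checks
// listed above to a single session block.
func validateSession(cfg *SessionConfig) error {
	if cfg == nil {
		return errors.New("sdk mode requires a session block")
	}
	if cfg.MaxTurns <= 0 {
		return fmt.Errorf("max-turns must be positive, got %d", cfg.MaxTurns)
	}
	if cfg.Persistent && cfg.Storage == "" {
		return errors.New("persistent sessions need a storage backend (e.g. artifacts)")
	}
	return nil
}

func main() {
	cfg := &SessionConfig{Persistent: true, Storage: "artifacts", MaxTurns: 10}
	fmt.Println(validateSession(cfg)) // <nil>
}
```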

Test Implementation

Unit Test Example

func TestCopilotSDKEngine_CompileSession(t *testing.T) {
    tests := []struct {
        name   string
        config SessionConfig
        want   string
        err    bool
    }{
        {
            name: "basic session config",
            config: SessionConfig{
                Persistent: true,
                Storage:    "artifacts",
                MaxTurns:   10,
            },
            want: "session-config.json",
            err:  false,
        },
        // More test cases...
    }
    
    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            engine := NewCopilotSDKEngine()
            got, err := engine.CompileSessionConfig(&tt.config)
            
            if (err != nil) != tt.err {
                t.Errorf("CompileSessionConfig() error = %v, wantErr %v", err, tt.err)
            }
            if !tt.err && got != tt.want {
                t.Errorf("CompileSessionConfig() = %q, want %q", got, tt.want)
            }
        })
    }
}

Integration Test Example

---
# test-sdk-multi-turn.md
engine:
  id: copilot
  mode: sdk
  session:
    persistent: true
    max-turns: 3
tools:
  github:
    allowed: [issue_read]
---

# Multi-Turn Test

Turn 1: What is the issue title?
Turn 2: Summarize the issue description.
Turn 3: Suggest next steps.

Test Validation:

func TestSDKMultiTurn(t *testing.T) {
    // Compile workflow
    workflow := compileWorkflow("test-sdk-multi-turn.md")
    
    // Run in test environment
    result := runWorkflow(workflow)
    
    // Validate
    assert.Equal(t, 3, result.Metrics.Turns)
    assert.True(t, result.SessionPersisted)
    assert.NotEmpty(t, result.Outputs)
}

Comparison Test Example

func TestSDKvsCLI_Performance(t *testing.T) {
    prompt := "Analyze code quality"
    
    // Run with CLI
    cliStart := time.Now()
    cliResult := runCLI(prompt)
    cliDuration := time.Since(cliStart)
    
    // Run with SDK
    sdkStart := time.Now()
    sdkResult := runSDK(prompt)
    sdkDuration := time.Since(sdkStart)
    
    // Compare results
    assert.Equal(t, cliResult.Output, sdkResult.Output)
    
    // Log performance metrics
    t.Logf("CLI: %v, SDK: %v", cliDuration, sdkDuration)
    t.Logf("CLI tokens: %d, SDK tokens: %d", cliResult.Tokens, sdkResult.Tokens)
}

Validation Tools Implementation

Compatibility Checker

type CompatibilityChecker struct {
    workflow *Workflow
}

func (c *CompatibilityChecker) Check() (*CompatibilityReport, error) {
    report := &CompatibilityReport{
        Compatible: true,
        Warnings:   []string{},
        Suggestions: []string{},
    }
    
    // Check for CLI-specific features
    if c.workflow.HasBashTools() {
        report.Warnings = append(report.Warnings,
            "Custom bash tools could be converted to inline tools")
        report.Suggestions = append(report.Suggestions,
            "Consider using SDK inline tools for better integration")
    }
    
    // Check for multi-turn patterns
    if c.workflow.HasMultiTurnPattern() {
        report.Suggestions = append(report.Suggestions,
            "Enable session persistence for multi-turn conversations")
    }
    
    return report, nil
}

Migration Tool

func MigrateToSDK(workflowPath string) error {
    // Parse existing workflow
    workflow, err := parseWorkflow(workflowPath)
    if err != nil {
        return err
    }
    
    // Convert frontmatter
    workflow.Engine.Mode = "sdk"
    
    // Add session config if multi-turn detected
    if detectMultiTurn(workflow) {
        workflow.Engine.Session = &SessionConfig{
            Persistent: true,
            MaxTurns:   10,
        }
    }
    
    // Convert bash tools to inline tools
    for _, bashTool := range workflow.Tools.Bash {
        inlineTool := convertToInlineTool(bashTool)
        workflow.Tools.Inline = append(workflow.Tools.Inline, inlineTool)
    }
    
    // Write migrated workflow
    return writeWorkflow(workflow, workflowPath+".sdk.md")
}

Implementation Tasks

  • Create unit test suite for SDK engine
  • Add integration tests for SDK workflows
  • Implement comparison test framework
  • Build compatibility checker tool
  • Build migration tool
  • Create SDK validator
  • Add performance benchmarks
  • Create test data generators
  • Add CI/CD integration for SDK tests
  • Document testing best practices

Success Criteria

  • >80% code coverage for SDK engine
  • All integration tests passing
  • Comparison tests show SDK functional parity
  • Validation tools working and documented
  • Migration tool successfully converts example workflows
  • Performance benchmarks established
  • CI/CD running SDK tests automatically

Performance Benchmarks

Track key metrics:

  • Compilation time: CLI vs SDK
  • Execution latency: single-turn vs multi-turn
  • Token efficiency: context retention benefits
  • Memory usage: session state overhead
  • Cost: per-workflow execution costs
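
The compilation-time metric can be tracked with Go's standard benchmark machinery; `compileSDKWorkflow` below is a hypothetical stub standing in for the real compilation path:

```go
package main

import (
	"fmt"
	"testing"
)

// compileSDKWorkflow is a placeholder for the real SDK compilation path.
func compileSDKWorkflow(src string) string {
	return "generated: " + src
}

func main() {
	// testing.Benchmark runs a benchmark outside `go test`, which is
	// handy for ad-hoc CLI-vs-SDK comparisons.
	result := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			_ = compileSDKWorkflow("test-sdk-single-turn.md")
		}
	})
	fmt.Println("ns/op recorded:", result.N > 0)
}
```

The same stub can be swapped for the real compiler entry point once #10159 lands, and the results compared against an equivalent CLI-mode benchmark.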

CI/CD Integration

# .github/workflows/test-sdk.yml
name: Test SDK Engine

on: [push, pull_request]

jobs:
  test-sdk:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run SDK unit tests
        run: go test -v -run 'CopilotSDK' ./pkg/workflow/
      - name: Run SDK integration tests
        run: make test-sdk-integration
      - name: Run comparison tests
        run: make test-sdk-vs-cli
      - name: Performance benchmarks
        run: make benchmark-sdk

Priority: High (quality gate for the SDK engine rollout)
Estimated Effort: 7-10 days
Dependencies: #10159 (SDK engine implementation)
Skills Required: Go testing, GitHub Actions, test automation
