Skip to content

[refactor] Semantic Function Clustering Analysis - Code Organization and Duplication #11776

@github-actions

Description

@github-actions

Comprehensive semantic analysis of the Go codebase in pkg/ identifying function clustering patterns, duplicate code, outlier functions, and refactoring opportunities to improve code organization and maintainability.

Executive Summary

Analysis of 464 non-test Go files across 18 packages revealed significant refactoring opportunities:

  • 76+ duplicate or near-duplicate code patterns identified across validation files
  • Naming inconsistencies in 21 MCP-related files (3 different naming conventions)
  • 240-file workflow package lacks structural organization despite clear semantic clusters
  • 32 validation files scattered without cohesive grouping
  • High-impact consolidation opportunities in validation helpers, MCP configuration, and compiler integration

Key Finding: The codebase demonstrates good semantic organization at the function level but suffers from structural fragmentation that creates maintenance burden and cognitive overhead.


Analysis Scope

Files Analyzed: 464 Go source files (excluding tests)
Packages Analyzed: 18 top-level packages in pkg/
Focus Areas:

  • Function clustering by semantic purpose
  • Duplicate code detection
  • File organization patterns
  • Naming convention consistency
  • Refactoring opportunities

Primary Analysis Areas:

  1. Validation files (32 files, 27 matching *_validation.go pattern)
  2. MCP configuration files (21 files with 3 different naming patterns)
  3. Compiler integration files (21 files in workflow package)
  4. CLI orchestration files (152 files including compile_* and codemod_* patterns)

1. Package Structure Analysis

Package Distribution by Size

Package Type Count Files Purpose
Utility packages 11 1-2 each Single-responsibility helpers (constants, envutil, gitutil, mathutil, etc.)
Medium packages 4 11-29 Focused modules (campaign, console, parser, stringutil)
Large packages 3 72-772 Complex business logic (cli: 152, parser: 29, workflow: 240)

Critical Complexity: workflow Package

Statistics:

  • 240 source files + 532 test files (772 total)
  • Flat namespace with no subpackages despite multiple distinct domains
  • Identified clusters: Compiler (21 files), Safe Outputs (16 files), MCP (21 files), Engine (13 files), Validation (32 files)

Issue: The workflow package is a "monster package" where semantic clusters exist in naming but not in structure, making navigation difficult and boundaries unclear.


2. Validation Files: 76+ Duplicate Patterns

Overview

Files: 27 files matching *_validation.go pattern + 1 outlier (permissions_validator.go)
Total Functions: 63 validation functions
Duplicate Patterns: 76+ instances across 7 pattern categories

High-Impact Duplicates

Pattern A: Empty String/Nil Checking (40+ instances)

Severity: HIGH
Files Affected: 25+ validation files
Pattern:

if value == "" { return nil }
if config == nil { return nil }

Examples:

  • agent_validation.go:61 - if workflowData.AgentFile == "" { return nil }
  • features_validation.go:37 - if data == nil || data.Features == nil { return nil }
  • sandbox_validation.go:84 - if workflowData.SandboxConfig == nil { return nil }
  • runtime_validation.go:98 - if workflowData.Tools == nil { return nil }
  • Similar patterns in 21+ additional files

Recommendation: Extract to shared helper

// pkg/workflow/validation_helpers.go
func requiresValidation(value string) bool {
    return value != ""
}

func requiresValidationPtr[T any](ptr *T) bool {
    return ptr != nil
}

Impact: Eliminate 40+ lines of repetitive code, establish consistent validation pattern

Pattern B: Map Type Assertion (30+ instances)

Severity: HIGH
Files Affected: 15+ validation files
Pattern:

if config, ok := toolConfig.(map[string]any); ok {
    // Process config
}

if eventMap, ok := eventVal.(map[string]any); !ok {
    return nil
}

Examples:

  • mcp_config_validation.go:86, 127, 132, 200 (4 instances)
  • runtime_validation.go:105
  • dispatch_workflow_validation.go:89, 102, 117, 157, 167, 177 (6 instances)
  • compiler_filters_validation.go:52, 80
  • safe_outputs_domains_validation.go (multiple)
  • agent_validation.go:127

Recommendation: Create assertion helpers

// pkg/workflow/validation_helpers.go
func assertMapOrNil(value any) (map[string]any, bool) {
    mapVal, ok := value.(map[string]any)
    return mapVal, ok
}

func requireMap(value any, fieldName string) (map[string]any, error) {
    mapVal, ok := value.(map[string]any)
    if !ok {
        return nil, fmt.Errorf("%s must be a map, got %T", fieldName, value)
    }
    return mapVal, nil
}

Impact: Reduce 30+ repetitive type assertions, improve error messages

Pattern C: GitHub Expression Regex (4 similar patterns)

Severity: MEDIUM-HIGH
Files Affected: 4 files with overlapping regex patterns

Patterns:

// secrets_validation.go:15
var secretsExpressionPattern = regexp.MustCompile(
    `^\$\{\{\s*secrets\.[A-Za-z_][A-Za-z0-9_]*(\s*\|\|\s*secrets\.[A-Za-z_][A-Za-z0-9_]*)*\s*\}\}$`
)

// expression_validation.go:65, 70
var expressionRegex = regexp.MustCompile(`(?s)\$\{\{(.*?)\}\}`)
var envRegex = regexp.MustCompile(`^env\.[a-zA-Z0-9_-]+$`)

// template_injection_validation.go:62-67
var inlineExpressionRegex = regexp.MustCompile(`\$\{\{[^}]+\}\}`)
var unsafeContextRegex = regexp.MustCompile(
    `\$\{\{\s*(github\.event\.|steps\.[^}]+\.outputs\.|inputs\.)[^}]+\}\}`
)

// safe_outputs_target_validation.go
func isGitHubExpression(s string) bool {
    // Manual check without regex
}

Recommendation: Centralize expression patterns

// pkg/workflow/expression_patterns.go
package workflow

import "regexp"

const (
    GitHubExpressionPattern   = `\$\{\{[^}]+\}\}`
    SecretsExpressionPattern  = `^\$\{\{\s*secrets\.[A-Za-z_][A-Za-z0-9_]*(\s*\|\|\s*secrets\.[A-Za-z_][A-Za-z0-9_]*)*\s*\}\}$`
    UnsafeContextPattern      = `\$\{\{\s*(github\.event\.|steps\.[^}]+\.outputs\.|inputs\.)[^}]+\}\}`
    EnvVarPattern             = `^env\.[a-zA-Z0-9_-]+$`
)

var (
    ExpressionRegex       = regexp.MustCompile(GitHubExpressionPattern)
    SecretsRegex          = regexp.MustCompile(SecretsExpressionPattern)
    UnsafeContextRegex    = regexp.MustCompile(UnsafeContextPattern)
    EnvVarRegex           = regexp.MustCompile(EnvVarPattern)
)

Impact: Single source of truth for expression patterns, easier maintenance

Pattern D: File Path Validation (2 duplicate implementations)

Severity: MEDIUM
Files Affected: agent_validation.go, dispatch_workflow_validation.go

Duplicate 1: File existence checking

// agent_validation.go:84-108
if _, err := os.Stat(fullAgentPath); err != nil {
    if os.IsNotExist(err) {
        return fmt.Errorf("agent file not found: %s", fullAgentPath)
    }
}

// dispatch_workflow_validation.go:195-198
func fileExists(path string) bool {
    _, err := os.Stat(path)
    return err == nil
}

Duplicate 2: Path validation

// dispatch_workflow_validation.go:200-221
func isPathWithinDir(filePath, dirPath string) bool {
    // Complex path validation logic
}

Recommendation: Extract to file validation helpers

// pkg/workflow/file_validation.go
func fileExists(path string) bool {
    _, err := os.Stat(path)
    return err == nil
}

func requireFileExists(path string, description string) error {
    if !fileExists(path) {
        return fmt.Errorf("%s not found: %s", description, path)
    }
    return nil
}

func isPathWithinDir(filePath, dirPath string) bool {
    // Move implementation here
}

Impact: Centralize file validation logic, reduce duplication

Additional Validation Patterns

  • Pattern E: Error collection and batch reporting (5 files)
  • Pattern F: Feature flag checking (6 files)
  • Pattern G: Logging pattern with logger.New() (27 files - acceptable standard)
  • Pattern H: sync.Once caching for schema compilation (3 files - acceptable pattern)

Validation Files Outlier

File: pkg/workflow/permissions_validator.go

Issue: Contains 5 validation functions but doesn't follow the *_validation.go naming convention:

  • ValidatePermissions() (lines 134-180)
  • collectRequiredPermissions() (lines 182-211)
  • checkMissingPermissions() (lines 213-258)
  • FormatValidationMessage() (lines 260-272)
  • formatMissingPermissionsMessage() (lines 274-348)

Recommendation: Rename to permissions_validation.go for consistency


3. MCP Configuration: Naming Inconsistency and Duplication

Naming Inconsistency: 3 Different Conventions

Issue: 21 MCP-related files follow 3 different naming patterns, creating confusion

View Naming Patterns

Pattern 1: Dash-separated (mcp-config-*)

  • mcp-config-builtin.go
  • mcp-config-custom.go
  • mcp-config-playwright.go
  • mcp-config-serena.go
  • mcp-config-types.go
  • mcp-config-utils.go

Pattern 2: Underscore-separated (mcp_*)

  • mcp_renderer.go
  • mcp_environment.go
  • mcp_config_validation.go
  • mcp_github_config.go
  • mcp_gateway_config.go
  • mcp_playwright_config.go
  • mcp_serena_config.go
  • mcp_setup_generator.go

Pattern 3: Suffix style (*_mcp)

  • claude_mcp.go
  • codex_mcp.go
  • copilot_mcp.go

Example Confusion:

  • mcp-config-playwright.go vs mcp_playwright_config.go - both handle Playwright MCP configuration but use different naming

Recommendation: Standardize to mcp_* (underscore) pattern as it's more common in Go and used by newer consolidated files

Function-Level Duplicates in MCP Files

Duplicate A: Wrapper Functions with Consistent Pattern

Pattern: Thin wrappers that only call a *WithOptions() variant

func renderX(yaml, config, isLast) {
    renderXWithOptions(yaml, config, isLast, false, false)
}

Examples:

  • mcp-config-playwright.go:14-17 - renderPlaywrightMCPConfig()renderPlaywrightMCPConfigWithOptions()
  • mcp-config-builtin.go:14-17 - renderSafeOutputsMCPConfig()renderSafeOutputsMCPConfigWithOptions()

Recommendation: Consider removing thin wrappers or create a generic wrapper factory

Duplicate B: Host/Port Resolution for Localhost

Pattern: Identical logic repeated across multiple files

host := "host.docker.internal"
if workflowData != nil && workflowData.SandboxConfig != nil && 
   workflowData.SandboxConfig.Agent != nil && 
   workflowData.SandboxConfig.Agent.Disabled {
    host = "localhost"
}

Locations:

  • mcp-config-builtin.go:29-34 (Safe Outputs)
  • mcp-config-serena.go:115-122 (Serena Local)

Recommendation: Extract helper function

// pkg/workflow/mcp_host_resolver.go
func resolveHostForContainer(workflowData *WorkflowData) string {
    if workflowData != nil && 
       workflowData.SandboxConfig != nil && 
       workflowData.SandboxConfig.Agent != nil && 
       workflowData.SandboxConfig.Agent.Disabled {
        return "localhost"
    }
    return "host.docker.internal"
}
Duplicate C: Custom Args Extraction Pattern

Files: args.go

Near-Duplicate Functions:

  • getGitHubCustomArgs() (lines 39-44)
  • getPlaywrightCustomArgs() (lines 47-52)
  • getSerenaCustomArgs() (lines 55-60)

All three extract args from different config structures but share the same underlying logic calling extractCustomArgs().

Similar Pattern:

  • getGitHubMounts() (88-94) calls extractMounts() - same extraction pattern

Recommendation: These are already partially consolidated, but could benefit from a generic interface-based approach

Duplicate D: TOML Rendering Boilerplate

Pattern: All TOML rendering functions follow nearly identical structure

func renderXMCPConfigTOML(yaml *strings.Builder, ...) {
    yaml.WriteString("          \n")
    yaml.WriteString("          [mcp_servers.X]\n")
    // ... field-specific rendering
}

Affected Functions:

  • renderPlaywrightMCPConfigTOML() (122 lines, mcp-config-playwright.go:43-161)
  • renderSafeOutputsMCPConfigTOML() (118 lines, mcp-config-builtin.go:8-126)
  • renderAgenticWorkflowsMCPConfigTOML() (131 lines, mcp-config-builtin.go:10-141)

Recommendation: Extract TOML header/footer boilerplate

func writeTOMLMCPServerHeader(yaml *strings.Builder, serverName string) {
    yaml.WriteString("          \n")
    yaml.WriteString(fmt.Sprintf("          [mcp_servers.%s]\n", serverName))
}

Parameter Order Inconsistency

Issue: render*WithOptions() functions have inconsistent parameter order

// Different parameter orders:
renderPlaywrightMCPConfigWithOptions(yaml, config, isLast, includeCopilotFields, inlineArgs)
renderSafeOutputsMCPConfigWithOptions(yaml, isLast, includeCopilotFields, workflowData) // isLast in position 2!
renderAgenticWorkflowsMCPConfigWithOptions(yaml, isLast, includeCopilotFields) // No workflowData!
renderSerenaMCPConfigWithOptions(yaml, serenaTool, isLast, includeCopilotFields, inlineArgs)

Recommendation: Standardize parameter order to (yaml, config, isLast, options...)


4. Structural Refactoring Opportunities

Priority 1: High Impact, Lower Risk

Opportunity Current Proposed Impact
Extract Validation Helpers 76+ duplicate patterns pkg/workflow/validation_helpers.go with shared functions -100+ lines, consistent patterns
Standardize MCP Naming 3 naming conventions Consistent mcp_* naming Easier navigation, clearer organization
Consolidate Expression Patterns 4 scattered regex patterns pkg/workflow/expression_patterns.go Single source of truth
Rename Permissions File permissions_validator.go permissions_validation.go Naming consistency

Priority 2: Moderate Impact, Moderate Risk

Opportunity Current Proposed Impact
Extract Validation Subpackage 32 scattered files in flat namespace workflow/validation/ subpackage -6% files in workflow pkg, clearer boundaries
Extract MCP Helpers Host resolution, args extraction duplicated Consolidated helper functions Reduced duplication
Standardize MCP Function Signatures Inconsistent parameter order Unified signature pattern Easier to call uniformly

Priority 3: Long-term Improvements

Opportunity Current Proposed Impact
Break Workflow into Subpackages 240-file flat package Logical subpackages (validation/, mcp/, engines/, compiler/) Major refactor, clearer structure
Create Validation Framework Individual validators Shared interfaces & registry Better extensibility
Extract Engine Implementations Mixed engine logic workflow/engines/{claude,codex,copilot} subpackages Independent testing, clearer encapsulation

5. Implementation Recommendations

Phase 1: Quick Wins (1-2 days)

  1. Create validation_helpers.go with common patterns

    • Extract empty/nil checking helper
    • Extract map type assertion helpers
    • Extract file validation helpers
    • Estimated effort: 4 hours
    • Impact: Eliminate 76+ duplicate lines
  2. Create expression_patterns.go with centralized regex

    • Consolidate 4 expression regex patterns
    • Estimated effort: 2 hours
    • Impact: Single source of truth, easier maintenance
  3. Rename permissions_validator.go to permissions_validation.go

    • Simple file rename for consistency
    • Estimated effort: 30 minutes
    • Impact: Naming convention alignment
  4. Standardize MCP file naming from mcp-config-* to mcp_config_*

    • Rename 6 files for consistency
    • Estimated effort: 1 hour
    • Impact: Uniform naming pattern

Total Phase 1: ~8 hours, high impact on code clarity

Phase 2: Structural Improvements (3-5 days)

  1. Extract validation subpackage

    • Move 32 validation files to workflow/validation/
    • Update imports across codebase
    • Estimated effort: 1-2 days
    • Impact: Clearer package structure
  2. Consolidate MCP helpers

    • Extract host resolution helper
    • Standardize function signatures
    • Create TOML rendering helpers
    • Estimated effort: 1 day
    • Impact: Reduced MCP code duplication
  3. Create validation framework

    • Define validator interfaces
    • Create validator registry
    • Estimated effort: 2 days
    • Impact: Extensible validation system

Total Phase 2: ~5 days, significant architectural improvement

Phase 3: Major Refactoring (2-3 weeks)

  1. Break workflow package into subpackages
    • Create workflow/compiler/, workflow/mcp/, workflow/engines/, workflow/validation/
    • Migrate 240 files to appropriate subpackages
    • Update all imports
    • Estimated effort: 2-3 weeks
    • Impact: Massive improvement in code organization

Note: Phase 3 is a major undertaking requiring careful planning and incremental migration


6. Success Criteria

Phase 1 Success Criteria

  • validation_helpers.go created with 4+ shared helper functions
  • All 76+ duplicate validation patterns replaced with helper calls
  • expression_patterns.go created with 4 centralized regex patterns
  • All expression validation files updated to use centralized patterns
  • permissions_validator.go renamed to permissions_validation.go
  • 6 mcp-config-* files renamed to mcp_config_*
  • All tests pass after changes
  • No functionality broken

Phase 2 Success Criteria

  • workflow/validation/ subpackage created with 32 validation files
  • All validation imports updated
  • MCP helper functions consolidated
  • TOML rendering boilerplate extracted
  • Validation framework interfaces defined
  • Tests pass, no regressions

Phase 3 Success Criteria

  • Workflow package split into 4+ subpackages
  • All 240 source files properly categorized
  • Clear package boundaries documented
  • All imports updated
  • Full test suite passes
  • Build time not significantly impacted

7. Risk Assessment

Low Risk Changes (Phase 1)

  • Risk Level: ⚠️ Low
  • Testing Required: Standard unit tests
  • Rollback: Easy (simple refactoring)

Moderate Risk Changes (Phase 2)

  • Risk Level: ⚠️⚠️ Moderate
  • Testing Required: Full integration tests, verify all workflows compile
  • Rollback: Moderate (package structure changes)

High Risk Changes (Phase 3)

  • Risk Level: ⚠️⚠️⚠️ High
  • Testing Required: Complete regression testing, verify all downstream consumers
  • Rollback: Difficult (major structural change)
  • Recommendation: Incremental migration over multiple releases

8. Related Work

This analysis complements existing refactoring work:


9. Analysis Metadata

Analysis Method: Semantic code clustering using file patterns, function naming analysis, and codebase exploration
Detection Tools: Manual pattern analysis, codebase exploration agents
Analysis Date: 2026-01-25
Total Files Analyzed: 464 Go source files (excluding tests)
Total Packages: 18 top-level packages
Key Focus Areas:

  • Validation files (32 files)
  • MCP configuration files (21 files)
  • Workflow package organization (240 files)

Duplicate Pattern Count: 76+ identified instances
High-Priority Duplicates: 70+ instances in validation alone
Naming Inconsistencies: 21 MCP files using 3 different conventions


10. Recommendations Summary

Immediate Actions (Priority 1)

  1. Create validation_helpers.go to eliminate 76+ duplicate patterns
  2. Create expression_patterns.go to centralize regex patterns
  3. Rename permissions_validator.gopermissions_validation.go
  4. Standardize MCP naming from mcp-config-* to mcp_config_*

Near-Term Actions (Priority 2)

  1. Extract workflow/validation/ subpackage
  2. Consolidate MCP helper functions
  3. Standardize MCP function parameter order
  4. Create TOML rendering helpers

Long-Term Strategic Actions (Priority 3)

  1. Plan incremental migration to subpackage structure
  2. Design validation framework with interfaces
  3. Break workflow package into semantic subpackages
  4. Document clear package boundaries and responsibilities

Objective: Improve code organization, reduce duplication, and enhance maintainability by addressing identified semantic clustering issues and implementing strategic refactoring.

References:

AI generated by Semantic Function Refactoring

Metadata

Metadata

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions