-
Notifications
You must be signed in to change notification settings - Fork 46
Description
Comprehensive semantic analysis of the Go codebase in pkg/ identifying function clustering patterns, duplicate code, outlier functions, and refactoring opportunities to improve code organization and maintainability.
Executive Summary
Analysis of 464 non-test Go files across 18 packages revealed significant refactoring opportunities:
- 76+ duplicate or near-duplicate code patterns identified across validation files
- Naming inconsistencies in 21 MCP-related files (3 different naming conventions)
- 240-file workflow package lacks structural organization despite clear semantic clusters
- 32 validation files scattered without cohesive grouping
- High-impact consolidation opportunities in validation helpers, MCP configuration, and compiler integration
Key Finding: The codebase demonstrates good semantic organization at the function level but suffers from structural fragmentation that creates maintenance burden and cognitive overhead.
Analysis Scope
Files Analyzed: 464 Go source files (excluding tests)
Packages Analyzed: 18 top-level packages in pkg/
Focus Areas:
- Function clustering by semantic purpose
- Duplicate code detection
- File organization patterns
- Naming convention consistency
- Refactoring opportunities
Primary Analysis Areas:
- Validation files (32 files, 27 matching
*_validation.gopattern) - MCP configuration files (21 files with 3 different naming patterns)
- Compiler integration files (21 files in workflow package)
- CLI orchestration files (152 files including compile_* and codemod_* patterns)
1. Package Structure Analysis
Package Distribution by Size
| Package Type | Count | Files | Purpose |
|---|---|---|---|
| Utility packages | 11 | 1-2 each | Single-responsibility helpers (constants, envutil, gitutil, mathutil, etc.) |
| Medium packages | 4 | 11-29 | Focused modules (campaign, console, parser, stringutil) |
| Large packages | 3 | 72-772 | Complex business logic (cli: 152, parser: 29, workflow: 240) |
Critical Complexity: workflow Package
Statistics:
- 240 source files + 532 test files (772 total)
- Flat namespace with no subpackages despite multiple distinct domains
- Identified clusters: Compiler (21 files), Safe Outputs (16 files), MCP (21 files), Engine (13 files), Validation (32 files)
Issue: The workflow package is a "monster package" where semantic clusters exist in naming but not in structure, making navigation difficult and boundaries unclear.
2. Validation Files: 76+ Duplicate Patterns
Overview
Files: 27 files matching *_validation.go pattern + 1 outlier (permissions_validator.go)
Total Functions: 63 validation functions
Duplicate Patterns: 76+ instances across 7 pattern categories
High-Impact Duplicates
Pattern A: Empty String/Nil Checking (40+ instances)
Severity: HIGH
Files Affected: 25+ validation files
Pattern:
if value == "" { return nil }
if config == nil { return nil }Examples:
agent_validation.go:61-if workflowData.AgentFile == "" { return nil }features_validation.go:37-if data == nil || data.Features == nil { return nil }sandbox_validation.go:84-if workflowData.SandboxConfig == nil { return nil }runtime_validation.go:98-if workflowData.Tools == nil { return nil }- Similar patterns in 21+ additional files
Recommendation: Extract to shared helper
// pkg/workflow/validation_helpers.go
func requiresValidation(value string) bool {
return value != ""
}
func requiresValidationPtr[T any](ptr *T) bool {
return ptr != nil
}Impact: Eliminate 40+ lines of repetitive code, establish consistent validation pattern
Pattern B: Map Type Assertion (30+ instances)
Severity: HIGH
Files Affected: 15+ validation files
Pattern:
if config, ok := toolConfig.(map[string]any); ok {
// Process config
}
if eventMap, ok := eventVal.(map[string]any); !ok {
return nil
}Examples:
mcp_config_validation.go:86, 127, 132, 200(4 instances)runtime_validation.go:105dispatch_workflow_validation.go:89, 102, 117, 157, 167, 177(6 instances)compiler_filters_validation.go:52, 80safe_outputs_domains_validation.go(multiple)agent_validation.go:127
Recommendation: Create assertion helpers
// pkg/workflow/validation_helpers.go
func assertMapOrNil(value any) (map[string]any, bool) {
mapVal, ok := value.(map[string]any)
return mapVal, ok
}
func requireMap(value any, fieldName string) (map[string]any, error) {
mapVal, ok := value.(map[string]any)
if !ok {
return nil, fmt.Errorf("%s must be a map, got %T", fieldName, value)
}
return mapVal, nil
}Impact: Reduce 30+ repetitive type assertions, improve error messages
Pattern C: GitHub Expression Regex (4 similar patterns)
Severity: MEDIUM-HIGH
Files Affected: 4 files with overlapping regex patterns
Patterns:
// secrets_validation.go:15
var secretsExpressionPattern = regexp.MustCompile(
`^\$\{\{\s*secrets\.[A-Za-z_][A-Za-z0-9_]*(\s*\|\|\s*secrets\.[A-Za-z_][A-Za-z0-9_]*)*\s*\}\}$`
)
// expression_validation.go:65, 70
var expressionRegex = regexp.MustCompile(`(?s)\$\{\{(.*?)\}\}`)
var envRegex = regexp.MustCompile(`^env\.[a-zA-Z0-9_-]+$`)
// template_injection_validation.go:62-67
var inlineExpressionRegex = regexp.MustCompile(`\$\{\{[^}]+\}\}`)
var unsafeContextRegex = regexp.MustCompile(
`\$\{\{\s*(github\.event\.|steps\.[^}]+\.outputs\.|inputs\.)[^}]+\}\}`
)
// safe_outputs_target_validation.go
func isGitHubExpression(s string) bool {
// Manual check without regex
}Recommendation: Centralize expression patterns
// pkg/workflow/expression_patterns.go
package workflow
import "regexp"
const (
GitHubExpressionPattern = `\$\{\{[^}]+\}\}`
SecretsExpressionPattern = `^\$\{\{\s*secrets\.[A-Za-z_][A-Za-z0-9_]*(\s*\|\|\s*secrets\.[A-Za-z_][A-Za-z0-9_]*)*\s*\}\}$`
UnsafeContextPattern = `\$\{\{\s*(github\.event\.|steps\.[^}]+\.outputs\.|inputs\.)[^}]+\}\}`
EnvVarPattern = `^env\.[a-zA-Z0-9_-]+$`
)
var (
ExpressionRegex = regexp.MustCompile(GitHubExpressionPattern)
SecretsRegex = regexp.MustCompile(SecretsExpressionPattern)
UnsafeContextRegex = regexp.MustCompile(UnsafeContextPattern)
EnvVarRegex = regexp.MustCompile(EnvVarPattern)
)Impact: Single source of truth for expression patterns, easier maintenance
Pattern D: File Path Validation (2 duplicate implementations)
Severity: MEDIUM
Files Affected: agent_validation.go, dispatch_workflow_validation.go
Duplicate 1: File existence checking
// agent_validation.go:84-108
if _, err := os.Stat(fullAgentPath); err != nil {
if os.IsNotExist(err) {
return fmt.Errorf("agent file not found: %s", fullAgentPath)
}
}
// dispatch_workflow_validation.go:195-198
func fileExists(path string) bool {
_, err := os.Stat(path)
return err == nil
}Duplicate 2: Path validation
// dispatch_workflow_validation.go:200-221
func isPathWithinDir(filePath, dirPath string) bool {
// Complex path validation logic
}Recommendation: Extract to file validation helpers
// pkg/workflow/file_validation.go
func fileExists(path string) bool {
_, err := os.Stat(path)
return err == nil
}
func requireFileExists(path string, description string) error {
if !fileExists(path) {
return fmt.Errorf("%s not found: %s", description, path)
}
return nil
}
func isPathWithinDir(filePath, dirPath string) bool {
// Move implementation here
}Impact: Centralize file validation logic, reduce duplication
Additional Validation Patterns
- Pattern E: Error collection and batch reporting (5 files)
- Pattern F: Feature flag checking (6 files)
- Pattern G: Logging pattern with
logger.New()(27 files - acceptable standard) - Pattern H:
sync.Oncecaching for schema compilation (3 files - acceptable pattern)
Validation Files Outlier
File: pkg/workflow/permissions_validator.go
Issue: Contains 5 validation functions but doesn't follow the *_validation.go naming convention:
ValidatePermissions()(lines 134-180)collectRequiredPermissions()(lines 182-211)checkMissingPermissions()(lines 213-258)FormatValidationMessage()(lines 260-272)formatMissingPermissionsMessage()(lines 274-348)
Recommendation: Rename to permissions_validation.go for consistency
3. MCP Configuration: Naming Inconsistency and Duplication
Naming Inconsistency: 3 Different Conventions
Issue: 21 MCP-related files follow 3 different naming patterns, creating confusion
View Naming Patterns
Pattern 1: Dash-separated (mcp-config-*)
mcp-config-builtin.gomcp-config-custom.gomcp-config-playwright.gomcp-config-serena.gomcp-config-types.gomcp-config-utils.go
Pattern 2: Underscore-separated (mcp_*)
mcp_renderer.gomcp_environment.gomcp_config_validation.gomcp_github_config.gomcp_gateway_config.gomcp_playwright_config.gomcp_serena_config.gomcp_setup_generator.go
Pattern 3: Suffix style (*_mcp)
claude_mcp.gocodex_mcp.gocopilot_mcp.go
Example Confusion:
mcp-config-playwright.govsmcp_playwright_config.go- both handle Playwright MCP configuration but use different naming
Recommendation: Standardize to mcp_* (underscore) pattern as it's more common in Go and used by newer consolidated files
Function-Level Duplicates in MCP Files
Duplicate A: Wrapper Functions with Consistent Pattern
Pattern: Thin wrappers that only call a *WithOptions() variant
func renderX(yaml, config, isLast) {
renderXWithOptions(yaml, config, isLast, false, false)
}Examples:
mcp-config-playwright.go:14-17-renderPlaywrightMCPConfig()→renderPlaywrightMCPConfigWithOptions()mcp-config-builtin.go:14-17-renderSafeOutputsMCPConfig()→renderSafeOutputsMCPConfigWithOptions()
Recommendation: Consider removing thin wrappers or create a generic wrapper factory
Duplicate B: Host/Port Resolution for Localhost
Pattern: Identical logic repeated across multiple files
host := "host.docker.internal"
if workflowData != nil && workflowData.SandboxConfig != nil &&
workflowData.SandboxConfig.Agent != nil &&
workflowData.SandboxConfig.Agent.Disabled {
host = "localhost"
}Locations:
mcp-config-builtin.go:29-34(Safe Outputs)mcp-config-serena.go:115-122(Serena Local)
Recommendation: Extract helper function
// pkg/workflow/mcp_host_resolver.go
func resolveHostForContainer(workflowData *WorkflowData) string {
if workflowData != nil &&
workflowData.SandboxConfig != nil &&
workflowData.SandboxConfig.Agent != nil &&
workflowData.SandboxConfig.Agent.Disabled {
return "localhost"
}
return "host.docker.internal"
}Duplicate C: Custom Args Extraction Pattern
Files: args.go
Near-Duplicate Functions:
getGitHubCustomArgs()(lines 39-44)getPlaywrightCustomArgs()(lines 47-52)getSerenaCustomArgs()(lines 55-60)
All three extract args from different config structures but share the same underlying logic calling extractCustomArgs().
Similar Pattern:
getGitHubMounts()(88-94) callsextractMounts()- same extraction pattern
Recommendation: These are already partially consolidated, but could benefit from a generic interface-based approach
Duplicate D: TOML Rendering Boilerplate
Pattern: All TOML rendering functions follow nearly identical structure
func renderXMCPConfigTOML(yaml *strings.Builder, ...) {
yaml.WriteString(" \n")
yaml.WriteString(" [mcp_servers.X]\n")
// ... field-specific rendering
}Affected Functions:
renderPlaywrightMCPConfigTOML()(122 lines, mcp-config-playwright.go:43-161)renderSafeOutputsMCPConfigTOML()(118 lines, mcp-config-builtin.go:8-126)renderAgenticWorkflowsMCPConfigTOML()(131 lines, mcp-config-builtin.go:10-141)
Recommendation: Extract TOML header/footer boilerplate
func writeTOMLMCPServerHeader(yaml *strings.Builder, serverName string) {
yaml.WriteString(" \n")
yaml.WriteString(fmt.Sprintf(" [mcp_servers.%s]\n", serverName))
}Parameter Order Inconsistency
Issue: render*WithOptions() functions have inconsistent parameter order
// Different parameter orders:
renderPlaywrightMCPConfigWithOptions(yaml, config, isLast, includeCopilotFields, inlineArgs)
renderSafeOutputsMCPConfigWithOptions(yaml, isLast, includeCopilotFields, workflowData) // isLast in position 2!
renderAgenticWorkflowsMCPConfigWithOptions(yaml, isLast, includeCopilotFields) // No workflowData!
renderSerenaMCPConfigWithOptions(yaml, serenaTool, isLast, includeCopilotFields, inlineArgs)Recommendation: Standardize parameter order to (yaml, config, isLast, options...)
4. Structural Refactoring Opportunities
Priority 1: High Impact, Lower Risk
| Opportunity | Current | Proposed | Impact |
|---|---|---|---|
| Extract Validation Helpers | 76+ duplicate patterns | pkg/workflow/validation_helpers.go with shared functions |
-100+ lines, consistent patterns |
| Standardize MCP Naming | 3 naming conventions | Consistent mcp_* naming |
Easier navigation, clearer organization |
| Consolidate Expression Patterns | 4 scattered regex patterns | pkg/workflow/expression_patterns.go |
Single source of truth |
| Rename Permissions File | permissions_validator.go |
permissions_validation.go |
Naming consistency |
Priority 2: Moderate Impact, Moderate Risk
| Opportunity | Current | Proposed | Impact |
|---|---|---|---|
| Extract Validation Subpackage | 32 scattered files in flat namespace | workflow/validation/ subpackage |
-6% files in workflow pkg, clearer boundaries |
| Extract MCP Helpers | Host resolution, args extraction duplicated | Consolidated helper functions | Reduced duplication |
| Standardize MCP Function Signatures | Inconsistent parameter order | Unified signature pattern | Easier to call uniformly |
Priority 3: Long-term Improvements
| Opportunity | Current | Proposed | Impact |
|---|---|---|---|
| Break Workflow into Subpackages | 240-file flat package | Logical subpackages (validation/, mcp/, engines/, compiler/) | Major refactor, clearer structure |
| Create Validation Framework | Individual validators | Shared interfaces & registry | Better extensibility |
| Extract Engine Implementations | Mixed engine logic | workflow/engines/{claude,codex,copilot} subpackages |
Independent testing, clearer encapsulation |
5. Implementation Recommendations
Phase 1: Quick Wins (1-2 days)
-
Create
validation_helpers.gowith common patterns- Extract empty/nil checking helper
- Extract map type assertion helpers
- Extract file validation helpers
- Estimated effort: 4 hours
- Impact: Eliminate 76+ duplicate lines
-
Create
expression_patterns.gowith centralized regex- Consolidate 4 expression regex patterns
- Estimated effort: 2 hours
- Impact: Single source of truth, easier maintenance
-
Rename
permissions_validator.gotopermissions_validation.go- Simple file rename for consistency
- Estimated effort: 30 minutes
- Impact: Naming convention alignment
-
Standardize MCP file naming from
mcp-config-*tomcp_config_*- Rename 6 files for consistency
- Estimated effort: 1 hour
- Impact: Uniform naming pattern
Total Phase 1: ~8 hours, high impact on code clarity
Phase 2: Structural Improvements (3-5 days)
-
Extract validation subpackage
- Move 32 validation files to
workflow/validation/ - Update imports across codebase
- Estimated effort: 1-2 days
- Impact: Clearer package structure
- Move 32 validation files to
-
Consolidate MCP helpers
- Extract host resolution helper
- Standardize function signatures
- Create TOML rendering helpers
- Estimated effort: 1 day
- Impact: Reduced MCP code duplication
-
Create validation framework
- Define validator interfaces
- Create validator registry
- Estimated effort: 2 days
- Impact: Extensible validation system
Total Phase 2: ~5 days, significant architectural improvement
Phase 3: Major Refactoring (2-3 weeks)
- Break workflow package into subpackages
- Create
workflow/compiler/,workflow/mcp/,workflow/engines/,workflow/validation/ - Migrate 240 files to appropriate subpackages
- Update all imports
- Estimated effort: 2-3 weeks
- Impact: Massive improvement in code organization
- Create
Note: Phase 3 is a major undertaking requiring careful planning and incremental migration
6. Success Criteria
Phase 1 Success Criteria
-
validation_helpers.gocreated with 4+ shared helper functions - All 76+ duplicate validation patterns replaced with helper calls
-
expression_patterns.gocreated with 4 centralized regex patterns - All expression validation files updated to use centralized patterns
-
permissions_validator.gorenamed topermissions_validation.go - 6
mcp-config-*files renamed tomcp_config_* - All tests pass after changes
- No functionality broken
Phase 2 Success Criteria
-
workflow/validation/subpackage created with 32 validation files - All validation imports updated
- MCP helper functions consolidated
- TOML rendering boilerplate extracted
- Validation framework interfaces defined
- Tests pass, no regressions
Phase 3 Success Criteria
- Workflow package split into 4+ subpackages
- All 240 source files properly categorized
- Clear package boundaries documented
- All imports updated
- Full test suite passes
- Build time not significantly impacted
7. Risk Assessment
Low Risk Changes (Phase 1)
- Risk Level:
⚠️ Low - Testing Required: Standard unit tests
- Rollback: Easy (simple refactoring)
Moderate Risk Changes (Phase 2)
- Risk Level:
⚠️ ⚠️ Moderate - Testing Required: Full integration tests, verify all workflows compile
- Rollback: Moderate (package structure changes)
High Risk Changes (Phase 3)
- Risk Level:
⚠️ ⚠️ ⚠️ High - Testing Required: Complete regression testing, verify all downstream consumers
- Rollback: Difficult (major structural change)
- Recommendation: Incremental migration over multiple releases
8. Related Work
This analysis complements existing refactoring work:
- Issue [Code Quality] Refactor compiler_safe_outputs_config.go to Eliminate 400 Lines of Repetition #11768: [Code Quality] Refactor compiler_safe_outputs_config.go to Eliminate 400 Lines of Repetition
- Focuses on compiler safe outputs configuration duplication
- This analysis identifies broader patterns across the entire codebase
9. Analysis Metadata
Analysis Method: Semantic code clustering using file patterns, function naming analysis, and codebase exploration
Detection Tools: Manual pattern analysis, codebase exploration agents
Analysis Date: 2026-01-25
Total Files Analyzed: 464 Go source files (excluding tests)
Total Packages: 18 top-level packages
Key Focus Areas:
- Validation files (32 files)
- MCP configuration files (21 files)
- Workflow package organization (240 files)
Duplicate Pattern Count: 76+ identified instances
High-Priority Duplicates: 70+ instances in validation alone
Naming Inconsistencies: 21 MCP files using 3 different conventions
10. Recommendations Summary
Immediate Actions (Priority 1)
- Create
validation_helpers.goto eliminate 76+ duplicate patterns - Create
expression_patterns.goto centralize regex patterns - Rename
permissions_validator.go→permissions_validation.go - Standardize MCP naming from
mcp-config-*tomcp_config_*
Near-Term Actions (Priority 2)
- Extract
workflow/validation/subpackage - Consolidate MCP helper functions
- Standardize MCP function parameter order
- Create TOML rendering helpers
Long-Term Strategic Actions (Priority 3)
- Plan incremental migration to subpackage structure
- Design validation framework with interfaces
- Break workflow package into semantic subpackages
- Document clear package boundaries and responsibilities
Objective: Improve code organization, reduce duplication, and enhance maintainability by addressing identified semantic clustering issues and implementing strategic refactoring.
References:
AI generated by Semantic Function Refactoring