Skip to content

[refactor] Semantic Function Clustering Analysis - Code Organization Opportunities #12291

@github-actions

Description

@github-actions

Semantic analysis of Go codebase to identify refactoring opportunities through function clustering, outlier detection, and code organization patterns

Executive Summary

Analyzed 483 Go source files across the pkg/ directory (247 in workflow, 160 in CLI, 76 in supporting packages). The codebase demonstrates strong semantic organization with consistent naming conventions, but several high-impact refactoring opportunities exist:

  • Large monolithic files: 10+ files exceed 750 lines with mixed responsibilities
  • Helper file accumulation: 13 helper files with varying levels of cohesion
  • Factory method concentration: Files like permissions.go (928 lines, 36+ factory methods)
  • Validation pattern duplication: 76+ similar validation patterns across 32 files
  • Configuration parsing patterns: Multiple parse*Config functions with similar logic

Impact: Improved maintainability, testability, and reduced cognitive overhead for developers.


Function Inventory by Package

Primary Packages Analysis

Package Files Lines (Est.) Primary Purpose Organization Quality
workflow 247 117K+ Core compilation/execution engine ⚠️ Some large files need splitting
cli 160 38K+ Command-line interface ⚠️ 4 files exceed 1000 lines
parser 30 Unknown Parsing utilities ✅ Well organized
campaign 14 Unknown Campaign management ✅ Focused scope
console 11 Unknown UI/Output rendering ✅ Clean separation

Utility Packages (Well Organized ✅)

Small, focused packages with clear responsibilities:

  • stringutil (5 files), logger (4 files), types (2 files)
  • Single-file utilities: tty, timeutil, testutil, styles, sliceutil, repoutil, mathutil, gitutil, envutil, constants

Clustering Results

The codebase follows semantic underscore-separated naming with clear verb-prefix patterns:

File Naming Patterns (Semantic Grouping)

Pattern Count Purpose Organization
compiler_* 26 Compilation orchestration ✅ Clear grouping
safe_outputs* 20 Safety boundary outputs ✅ Well organized
mcp_* 16 Model Context Protocol ✅ Cohesive cluster
create_* 8 GitHub entity creation ✅ One per entity type
update_* 7 GitHub entity updates ✅ Parallel to create_*
*_validation.go 31 Validation logic ✅ Separated from main logic
*_helpers.go 13 Helper functions ⚠️ Varying cohesion
*_config*.go Multiple Configuration parsing ⚠️ Some overlap

Function Naming Patterns

Workflow Package - Top Patterns:

59 parse*        (parseWorkflow, parseExpression, parseIssuesConfig)
56 get*          (getPermissions, getRequiredSecrets, getAllowedDomains)
50 generate*     (generateWorkflowHeader, generateSafeOutputsConfig)
37 extract*      (extractExpressions, extractSecrets, extractMetadata)
34 validate*     (validateExpression, validateSandbox, validateRuntimeMode)
32 is*           (isAllowed, isEnabled, isValid)
24 build*        (buildJob, buildStep, buildCopilotAssignmentStep)
14 render*       (renderYAML, renderSharedMCPConfig, renderGitHubMCP)
```

**CLI Package - Top Patterns:**
```
64 get*          (getConfig, getWorkflows, getStatus)
29 parse*        (parseFlags, parseWorkflow)
27 extract*      (extractLogs, extractMetrics)
25 ensure*       (ensureDirectory, ensureConfig)
20 run*          (runWorkflow, runCommand)
19 create*       (createCommand, createPR, createIssue)

Observation: Consistent camelCase for unexported functions, PascalCase for exported (idiomatic Go). No mixed patterns.


Identified Issues

1. Large Monolithic Files (Refactoring Priority: High)

Files with multiple semantic concerns that should be split:

Top 10 Largest Files
Rank File Lines Package Functions Issues
1 add_command.go 1,218 cli 18 Combines workflow resolution, UI, addition, compilation, PR creation, file tracking
2 add_interactive.go 1,025 cli 23 Interactive UI + workflow operations + file operations mixed
3 trial_command.go 1,000 cli Unknown Trial execution command
4 mcp_server.go 1,000 cli Unknown MCP server management
5 safe_outputs_config_generation.go 978 workflow 5 Configuration generation from workflow data
6 permissions.go 928 workflow 36 36 factory methods + parsing + operators
7 init.go 868 cli Unknown Initialization logic
8 audit.go 864 cli Unknown Audit trail generation
9 logs_orchestrator.go 855 cli 5 Downloading + normalization + filtering + parsing
10 mcp_renderer.go 844 workflow 4 Renders GitHub/Playwright/custom MCPs

Detailed Examples

Example 1: pkg/cli/add_command.go (1,218 lines)

Issue: Single file handles 6 distinct responsibilities

Functional areas:

  1. Command definition & flag configuration (203 lines)
  2. Workflow resolution & parsing (130 lines)
  3. Repository listing & interactive selection (150 lines)
  4. Normal workflow addition (101 lines)
  5. PR creation flow (105 lines)
  6. Single workflow addition with tracking (262 lines - complexity hotspot)
  7. Compilation helpers (190 lines)

Recommendation: Split into 6 focused modules:

  • add_command.go - Command definition (~250 lines)
  • add_workflow_resolution.go - Workflow resolution (~200 lines)
  • add_workflow_repository.go - Repository operations (~150 lines)
  • add_workflow_operations.go - Core addition logic (~400 lines)
  • add_workflow_pr.go - PR flow (~110 lines)
  • add_workflow_compilation.go - Compilation helpers (~150 lines)

Impact: Easier testing, clearer responsibilities, reduced cognitive load

Note: Issue #12263 already tracks this refactoring with a detailed plan.


Example 2: pkg/workflow/permissions.go (928 lines, 36+ functions)

Issue: Factory method accumulation - 36+ New*Permission() functions

Functions:

  • 5 parsing methods
  • 36+ factory methods (NewContentsReadPermission, NewIssuesWritePermission, etc.)
  • 8+ operator methods (merge, render, validate)

Recommendation: Split into 3 semantic modules:

  • permissions_parser.go - Parsing logic (5 methods)
  • permissions_factory.go - Factory methods (36+ New* functions)
  • permissions_operations.go - Operators (merge, render, validate)

Estimated Impact: Improved discoverability, easier to locate specific permission types


Example 3: pkg/workflow/mcp_renderer.go (844 lines)

Issue: Multi-target renderer in single file

Renders:

  • GitHub MCP config
  • Playwright MCP config
  • Custom tool MCP config
  • Shared MCP infrastructure

Recommendation: Split by target:

  • mcp_renderer.go - Base renderer & shared logic
  • mcp_renderer_github.go - GitHub-specific rendering
  • mcp_renderer_playwright.go - Playwright-specific rendering
  • mcp_renderer_custom.go - Custom tool rendering

Estimated Impact: Parallel development, clearer boundaries, easier testing


2. Helper File Organization (Refactoring Priority: Medium)

13 helper files identified with varying levels of cohesion:

Helper Files Analysis

Well-Organized Helper Files ✅

File Functions Cohesion Assessment
validation_helpers.go 7 High Generic validation patterns
config_helpers.go 10 High Configuration parsing utilities
error_helpers.go 5 High Error construction helpers
git_helpers.go 1 High Git tag extraction
map_helpers.go 2 High Map manipulation utilities

Candidates for Reorganization ⚠️

File Functions Issue Recommendation
engine_helpers.go 10 Mixed: installation + env filtering + path setup Split: engine_installation.go, engine_environment.go
compiler_yaml_helpers.go 4+ Mixed: path extraction + code generation + placeholders Consider moving to main compiler file
safe_outputs_config_generation_helpers.go 10 All related to config generation ✅ Could stay together or merge with parent

Pattern: Helper files work well when they contain generic, reusable utilities (validation, error handling, parsing). They become problematic when they accumulate domain-specific logic that belongs in main modules.


3. Validation Pattern Duplication (Refactoring Priority: Medium)

76+ similar validation patterns across 32 validation files

Common Patterns

Pattern Occurrences (Est.) Files Example Functions
Integer range validation 15+ 12 files validateIntRange(value, min, max, fieldName)
Required field validation 20+ 18 files String empty checks
List membership validation 12+ 9 files value in allowedValues
String length validation 10+ 8 files len(str) < minLength
File existence checks 8+ 6 files os.Stat(path) patterns

Existing Solution (Partial) ✅

validation_helpers.go already consolidates some patterns:

  • validateIntRange() - Integer range validation
  • ValidateRequired() - Required field validation
  • ValidateMaxLength() / ValidateMinLength() - Length validation
  • ValidateInList() - List membership
  • ValidatePositiveInt() / ValidateNonNegativeInt() - Integer validation
  • fileExists() - File existence check

Note: The file includes TODO comments indicating Phase 2 refactoring to add:

  • isEmptyOrNil() - Empty/nil checks
  • getMapFieldAsString() / getMapFieldAsMap() / getMapFieldAsBool() - Map field extraction
  • dirExists() - Directory existence check

Recommendation: Continue consolidation effort. Migrate remaining validation patterns to use these helpers across the 32 validation files.

Estimated Impact: Reduced duplication, consistent validation behavior, easier testing


4. Configuration Parsing Patterns (Refactoring Priority: Low-Medium)

Similar parse*Config functions with overlapping logic

Identified Patterns

Function Pattern Occurrences Files Example
parse*Config(configMap map[string]any) 14+ 8 files parseLabelsFromConfig, parseMessagesConfig, parseAppConfig
parse*FromConfig(configMap, key) 10+ 5 files parseExpiresFromConfig, parseParticipantsFromConfig
Array parsing 8+ 4 files String array extraction with type checking
String extraction 12+ 6 files String value extraction with validation

Existing Solution (Partial) ✅

config_helpers.go provides generic parsing helpers:

  • ParseStringArrayFromConfig(m, key, log) - Generic string array extraction
  • extractStringFromMap(m, key, log) - Generic string extraction
  • ParseIntFromConfig(m, key, log) - Generic integer extraction
  • ParseBoolFromConfig(m, key, log) - Generic boolean extraction
  • unmarshalConfig(m, key, target, log) - Type-safe YAML unmarshaling

Observation: Many specific parse functions are thin wrappers around these generic helpers:

// Thin wrapper pattern (appropriate)
func parseLabelsFromConfig(configMap map[string]any) []string {
    return ParseStringArrayFromConfig(configMap, "labels", configHelpersLog)
}

func parseTitlePrefixFromConfig(configMap map[string]any) string {
    return extractStringFromMap(configMap, "title-prefix", configHelpersLog)
}

Assessment: Current organization is reasonable. Thin wrappers provide semantic clarity while avoiding duplication. No major refactoring needed.


5. Error Message Patterns (Refactoring Priority: Low)

Consistent error wrapping patterns across files

Error Pattern Occurrences Files
fmt.Errorf("failed to %s: %w", ...) 114 37 files
fmt.Errorf("could not %s: %w", ...) 0 -
fmt.Errorf("unable to %s: %w", ...) 1 1 file

Observation: Codebase consistently uses "failed to" pattern (114 occurrences). This is good consistency.

Existing Solution ✅

error_helpers.go provides structured error constructors:

  • NewValidationError(field, value, reason, suggestion)
  • NewOperationError(operation, entityType, entityID, cause, suggestion)
  • NewConfigurationError(configKey, value, reason, suggestion)
  • EnhanceError(err, context, suggestion)
  • WrapErrorWithContext(err, context, suggestion)

Assessment: Error handling is well-structured with helper utilities. No major refactoring needed.


Detailed Function Clusters

Cluster 1: Creation Functions (create_* pattern)

Pattern: GitHub entity creation handlers - one file per entity type

File Functions Organization
create_issue.go CreateIssue(...) ✅ Focused
create_pull_request.go CreatePR(...) ✅ Focused
create_discussion.go CreateDiscussion(...) ✅ Focused
create_project.go CreateProject(...) ✅ Focused
create_project_status_update.go CreateProjectStatusUpdate(...) ✅ Focused

Analysis: ✅ Well-organized - Each creation function has its own file with clear responsibility. This pattern should be maintained.


Cluster 2: Update Functions (update_* pattern)

Pattern: GitHub entity modification handlers - parallel to create_* pattern

File Functions Organization
update_issue.go UpdateIssue(...) ✅ Focused
update_pull_request.go UpdatePR(...) ✅ Focused
update_discussion.go UpdateDiscussion(...) ✅ Focused
update_project.go UpdateProject(...) ✅ Focused

Analysis: ✅ Well-organized - Mirrors create_* pattern consistently.


Cluster 3: Validation Functions (validate* pattern)

Pattern: 31 validation files with *_validation.go suffix

Organization: ✅ Validation logic consistently separated from main logic

Examples:

  • expression_validation.go - Expression safety validation
  • sandbox_validation.go - Sandbox configuration validation
  • template_validation.go - Template injection validation
  • runtime_validation.go - Runtime mode validation
  • permissions_validation.go - Permission validation
  • features_validation.go - Feature flag validation

Analysis: ✅ Excellent pattern - Validation logic is isolated for easier testing and maintenance. Continue this pattern.


Cluster 4: Compilation Functions (compiler_* pattern)

Pattern: 26 compiler-related files

Organization: ⚠️ Some overlap between files

Examples:

  • compiler.go - Main orchestration
  • compiler_yaml.go - YAML generation
  • compiler_jobs.go - Job compilation
  • compiler_safe_outputs.go - Safe outputs compilation
  • compiler_activation_jobs.go - Activation job generation
  • compiler_orchestrator_*.go - Multiple orchestrator files

Observation: 26 compiler files suggest deep hierarchy. Some boundaries may be unclear.

Recommendation: Document the compilation pipeline architecture to clarify file responsibilities. Consider whether some compiler_orchestrator_* files could be consolidated.


Cluster 5: MCP Configuration (mcp_* pattern)

Pattern: 16 MCP-related files

File Purpose Organization
mcp_config.go Main configuration
mcp_renderer.go Rendering (844 lines) ⚠️ Multi-target
mcp_github_config.go GitHub-specific config
mcp_playwright_config.go Playwright-specific config
mcp_config_custom.go Custom tool config
mcp_config_validation.go Validation
Others Setup, environment, gateway

Analysis: Generally well-organized. Main issue is mcp_renderer.go (844 lines) rendering multiple targets - candidate for splitting (see Issue #1 above).


Refactoring Recommendations

Priority 1: High Impact (Immediate Action)

1.1 Split Large Monolithic Files (Effort: Large, Impact: High)

Target files: add_command.go (1,218 lines), add_interactive.go (1,025 lines), trial_command.go (1,000 lines), mcp_server.go (1,000 lines)

Approach:

  1. Identify semantic boundaries (already documented for add_command.go in #12263)
  2. Split into focused modules (6-8 files per large file)
  3. Maintain public API exports
  4. Add tests for each new module

Benefits:

  • Easier testing (smaller test files focused on specific functionality)
  • Reduced cognitive load (developers work on smaller, focused files)
  • Parallel development (multiple developers can work on different modules)
  • Clearer responsibilities (each file has single, clear purpose)

Estimated Effort: 2-3 days per file (including tests)

1.2 Split Factory Method Files (Effort: Medium, Impact: High)

Target file: permissions.go (928 lines, 36+ factory methods)

Approach:

  1. Split into 3 files:
    • permissions_parser.go - Parsing logic (5 methods)
    • permissions_factory.go - Factory methods (36+ New* functions)
    • permissions_operations.go - Operators (merge, render, validate)
  2. Keep main permissions.go for types and core logic

Benefits:

  • Improved discoverability (easier to find specific permission types)
  • Faster file navigation
  • Clearer file purposes

Estimated Effort: 4-6 hours


Priority 2: Medium Impact (Planned Work)

2.1 Continue Validation Helper Consolidation (Effort: Medium, Impact: Medium)

Current state: validation_helpers.go exists with 7+ helper functions. TODO comments indicate Phase 2 work.

Approach:

  1. Implement Phase 2 helpers:
    • isEmptyOrNil() - Empty/nil/zero checks
    • getMapFieldAsString() / getMapFieldAsMap() / getMapFieldAsBool() / getMapFieldAsInt() - Safe map field extraction
    • dirExists() - Directory existence check
  2. Migrate existing validation patterns to use these helpers across 32 validation files
  3. Update tests to cover new helpers

Benefits:

  • Reduced duplication (76+ patterns → centralized helpers)
  • Consistent validation behavior
  • Easier testing (test helpers once, use everywhere)

Estimated Effort: 1-2 days

2.2 Split Multi-Target Renderers (Effort: Medium, Impact: Medium)

Target file: mcp_renderer.go (844 lines)

Approach:

  1. Split by target:
    • mcp_renderer.go - Base renderer & shared logic
    • mcp_renderer_github.go - GitHub-specific rendering
    • mcp_renderer_playwright.go - Playwright-specific rendering
    • mcp_renderer_custom.go - Custom tool rendering
  2. Use composition/interfaces for shared functionality

Benefits:

  • Parallel development (different developers can work on different renderers)
  • Clearer testing boundaries
  • Easier to add new MCP targets

Estimated Effort: 6-8 hours


Priority 3: Long-term Improvements (Future Work)

3.1 Document Compiler Architecture (Effort: Low, Impact: Medium)

Issue: 26 compiler files suggest deep hierarchy; some boundaries may be unclear

Approach:

  1. Create pkg/workflow/COMPILER_ARCHITECTURE.md documenting:
    • Compilation pipeline stages
    • File responsibilities
    • Data flow between files
    • Extension points
  2. Add package-level documentation to key compiler files

Benefits:

  • Onboarding (new developers understand compilation pipeline)
  • Maintenance (clearer boundaries prevent responsibility drift)

Estimated Effort: 4-6 hours

3.2 Helper File Reorganization (Effort: Medium, Impact: Low-Medium)

Target files: engine_helpers.go (mixed concerns), compiler_yaml_helpers.go (mixed concerns)

Approach:

  1. Analyze helper functions for cohesion
  2. Move domain-specific logic to appropriate main modules
  3. Keep generic utilities in helper files

Benefits:

  • Clearer helper file purposes
  • Reduced cognitive load when choosing where to add new functions

Estimated Effort: 4-8 hours per file


Implementation Guidelines

General Principles

  1. Preserve Behavior: Ensure all existing functionality works identically
  2. Maintain Exports: Keep public API unchanged (avoid breaking changes)
  3. Add Tests First: Write tests for each new module before refactoring (test-driven refactoring)
  4. Incremental Changes: Split one module at a time, verify tests pass after each split
  5. Run Tests Frequently: Verify make test-unit passes after each split
  6. Update Imports: Ensure all import paths are correct
  7. Document Changes: Add comments explaining module boundaries and responsibilities

File Splitting Strategy

When splitting large files:

  1. Identify semantic boundaries (functional areas with clear responsibilities)
  2. Create new files with descriptive names following existing naming conventions
  3. Move functions to new files, keeping related functions together
  4. Update imports in all affected files
  5. Run tests to verify no breakage
  6. Add new tests for newly created modules
  7. Document file purposes in package comments

Success Metrics

  • ✅ All tests pass (make test-unit, make test-integration)
  • ✅ No breaking changes to public API
  • ✅ Code passes linting (make lint)
  • ✅ Build succeeds (make build)
  • ✅ New files are under 500 lines each
  • ✅ Test coverage is maintained or improved (≥80% for new modules)

Organizational Structure Summary

Strengths ✅

  1. Clear semantic grouping - Files named by concept, not by type (good Go practice)
  2. Consistent naming conventions - Verb-prefix patterns uniform across packages
  3. Specialized utility packages - Focused packages for common concerns (stringutil, logger, etc.)
  4. Validation separation - Validation logic consistently in *_validation.go files
  5. Configuration clarity - Configuration parsing clearly marked with *_config*.go
  6. Create/Update parallelism - Parallel create_* and update_* patterns for GitHub entities

Weaknesses & Opportunities ⚠️

  1. Large monolithic files - 10+ files exceed 750 lines with mixed responsibilities
  2. Factory method accumulation - Files like permissions.go with 36+ factory methods
  3. Multi-concern files - Files combining UI, operations, and compilation logic
  4. Helper file over-use - Some helper files contain domain-specific logic
  5. Deep compiler hierarchy - 26 compiler files; unclear boundaries between some

Analysis Metadata

  • Total Go Files Analyzed: 483 (non-test files in pkg/)
  • Total Functions Cataloged: 1,500+ (estimated)
  • Function Clusters Identified: 8 major clusters (compiler, validation, safe_outputs, mcp, create, update, parse, build)
  • Outliers Found: 10 large files (>750 lines) with multiple responsibilities
  • Large Files Detected: 10 files >750 lines (top file: 1,218 lines)
  • Helper Files Analyzed: 13 files
  • Validation Files: 31 files with consistent *_validation.go pattern
  • Detection Method: Pattern matching + semantic analysis + file reading
  • Analysis Date: 2026-01-28
  • Workflow Run: §21443298878

Acceptance Criteria

Implementation is complete when:

  • Top 5 large files (>900 lines) are split into focused modules
  • permissions.go is split into parser/factory/operations files
  • mcp_renderer.go is split by target (github/playwright/custom)
  • Phase 2 validation helpers are implemented and adopted
  • All tests pass (make test-unit and make test-integration)
  • Test coverage maintained or improved (≥80% for new modules)
  • No breaking changes to public APIs
  • Code passes linting and builds successfully
  • Compiler architecture documentation added

Related Issues

  • #12263 - Detailed plan for refactoring add_command.go (already exists)

Priority: Medium
Effort: Large (multi-week effort across multiple files)
Expected Impact: Significantly improved maintainability, easier testing, reduced complexity, better separation of concerns, faster onboarding for new developers

AI generated by Semantic Function Refactoring

Metadata

Metadata

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions