-
Notifications
You must be signed in to change notification settings - Fork 46
Description
Semantic analysis of Go codebase to identify refactoring opportunities through function clustering, outlier detection, and code organization patterns
Executive Summary
Analyzed 483 Go source files across the pkg/ directory (247 in workflow, 160 in CLI, 76 in supporting packages). The codebase demonstrates strong semantic organization with consistent naming conventions, but several high-impact refactoring opportunities exist:
- Large monolithic files: 10+ files exceed 750 lines with mixed responsibilities
- Helper file accumulation: 13 helper files with varying levels of cohesion
- Factory method concentration: Files like
permissions.go(928 lines, 36+ factory methods) - Validation pattern duplication: 76+ similar validation patterns across 32 files
- Configuration parsing patterns: Multiple parse*Config functions with similar logic
Impact: Improved maintainability, testability, and reduced cognitive overhead for developers.
Function Inventory by Package
Primary Packages Analysis
| Package | Files | Lines (Est.) | Primary Purpose | Organization Quality |
|---|---|---|---|---|
| workflow | 247 | 117K+ | Core compilation/execution engine | |
| cli | 160 | 38K+ | Command-line interface | |
| parser | 30 | Unknown | Parsing utilities | ✅ Well organized |
| campaign | 14 | Unknown | Campaign management | ✅ Focused scope |
| console | 11 | Unknown | UI/Output rendering | ✅ Clean separation |
Utility Packages (Well Organized ✅)
Small, focused packages with clear responsibilities:
- stringutil (5 files), logger (4 files), types (2 files)
- Single-file utilities: tty, timeutil, testutil, styles, sliceutil, repoutil, mathutil, gitutil, envutil, constants
Clustering Results
The codebase follows semantic underscore-separated naming with clear verb-prefix patterns:
File Naming Patterns (Semantic Grouping)
| Pattern | Count | Purpose | Organization |
|---|---|---|---|
compiler_* |
26 | Compilation orchestration | ✅ Clear grouping |
safe_outputs* |
20 | Safety boundary outputs | ✅ Well organized |
mcp_* |
16 | Model Context Protocol | ✅ Cohesive cluster |
create_* |
8 | GitHub entity creation | ✅ One per entity type |
update_* |
7 | GitHub entity updates | ✅ Parallel to create_* |
*_validation.go |
31 | Validation logic | ✅ Separated from main logic |
*_helpers.go |
13 | Helper functions | |
*_config*.go |
Multiple | Configuration parsing |
Function Naming Patterns
Workflow Package - Top Patterns:
59 parse* (parseWorkflow, parseExpression, parseIssuesConfig)
56 get* (getPermissions, getRequiredSecrets, getAllowedDomains)
50 generate* (generateWorkflowHeader, generateSafeOutputsConfig)
37 extract* (extractExpressions, extractSecrets, extractMetadata)
34 validate* (validateExpression, validateSandbox, validateRuntimeMode)
32 is* (isAllowed, isEnabled, isValid)
24 build* (buildJob, buildStep, buildCopilotAssignmentStep)
14 render* (renderYAML, renderSharedMCPConfig, renderGitHubMCP)
```
**CLI Package - Top Patterns:**
```
64 get* (getConfig, getWorkflows, getStatus)
29 parse* (parseFlags, parseWorkflow)
27 extract* (extractLogs, extractMetrics)
25 ensure* (ensureDirectory, ensureConfig)
20 run* (runWorkflow, runCommand)
19 create* (createCommand, createPR, createIssue)
Observation: Consistent camelCase for unexported functions, PascalCase for exported (idiomatic Go). No mixed patterns.
Identified Issues
1. Large Monolithic Files (Refactoring Priority: High)
Files with multiple semantic concerns that should be split:
Top 10 Largest Files
| Rank | File | Lines | Package | Functions | Issues |
|---|---|---|---|---|---|
| 1 | add_command.go |
1,218 | cli | 18 | Combines workflow resolution, UI, addition, compilation, PR creation, file tracking |
| 2 | add_interactive.go |
1,025 | cli | 23 | Interactive UI + workflow operations + file operations mixed |
| 3 | trial_command.go |
1,000 | cli | Unknown | Trial execution command |
| 4 | mcp_server.go |
1,000 | cli | Unknown | MCP server management |
| 5 | safe_outputs_config_generation.go |
978 | workflow | 5 | Configuration generation from workflow data |
| 6 | permissions.go |
928 | workflow | 36 | 36 factory methods + parsing + operators |
| 7 | init.go |
868 | cli | Unknown | Initialization logic |
| 8 | audit.go |
864 | cli | Unknown | Audit trail generation |
| 9 | logs_orchestrator.go |
855 | cli | 5 | Downloading + normalization + filtering + parsing |
| 10 | mcp_renderer.go |
844 | workflow | 4 | Renders GitHub/Playwright/custom MCPs |
Detailed Examples
Example 1: pkg/cli/add_command.go (1,218 lines)
Issue: Single file handles 6 distinct responsibilities
Functional areas:
- Command definition & flag configuration (203 lines)
- Workflow resolution & parsing (130 lines)
- Repository listing & interactive selection (150 lines)
- Normal workflow addition (101 lines)
- PR creation flow (105 lines)
- Single workflow addition with tracking (262 lines - complexity hotspot)
- Compilation helpers (190 lines)
Recommendation: Split into 6 focused modules:
add_command.go- Command definition (~250 lines)add_workflow_resolution.go- Workflow resolution (~200 lines)add_workflow_repository.go- Repository operations (~150 lines)add_workflow_operations.go- Core addition logic (~400 lines)add_workflow_pr.go- PR flow (~110 lines)add_workflow_compilation.go- Compilation helpers (~150 lines)
Impact: Easier testing, clearer responsibilities, reduced cognitive load
Note: Issue #12263 already tracks this refactoring with a detailed plan.
Example 2: pkg/workflow/permissions.go (928 lines, 36+ functions)
Issue: Factory method accumulation - 36+ New*Permission() functions
Functions:
- 5 parsing methods
- 36+ factory methods (
NewContentsReadPermission,NewIssuesWritePermission, etc.) - 8+ operator methods (merge, render, validate)
Recommendation: Split into 3 semantic modules:
permissions_parser.go- Parsing logic (5 methods)permissions_factory.go- Factory methods (36+ New* functions)permissions_operations.go- Operators (merge, render, validate)
Estimated Impact: Improved discoverability, easier to locate specific permission types
Example 3: pkg/workflow/mcp_renderer.go (844 lines)
Issue: Multi-target renderer in single file
Renders:
- GitHub MCP config
- Playwright MCP config
- Custom tool MCP config
- Shared MCP infrastructure
Recommendation: Split by target:
mcp_renderer.go- Base renderer & shared logicmcp_renderer_github.go- GitHub-specific renderingmcp_renderer_playwright.go- Playwright-specific renderingmcp_renderer_custom.go- Custom tool rendering
Estimated Impact: Parallel development, clearer boundaries, easier testing
2. Helper File Organization (Refactoring Priority: Medium)
13 helper files identified with varying levels of cohesion:
Helper Files Analysis
Well-Organized Helper Files ✅
| File | Functions | Cohesion | Assessment |
|---|---|---|---|
validation_helpers.go |
7 | High | Generic validation patterns |
config_helpers.go |
10 | High | Configuration parsing utilities |
error_helpers.go |
5 | High | Error construction helpers |
git_helpers.go |
1 | High | Git tag extraction |
map_helpers.go |
2 | High | Map manipulation utilities |
Candidates for Reorganization ⚠️
| File | Functions | Issue | Recommendation |
|---|---|---|---|
engine_helpers.go |
10 | Mixed: installation + env filtering + path setup | Split: engine_installation.go, engine_environment.go |
compiler_yaml_helpers.go |
4+ | Mixed: path extraction + code generation + placeholders | Consider moving to main compiler file |
safe_outputs_config_generation_helpers.go |
10 | All related to config generation | ✅ Could stay together or merge with parent |
Pattern: Helper files work well when they contain generic, reusable utilities (validation, error handling, parsing). They become problematic when they accumulate domain-specific logic that belongs in main modules.
3. Validation Pattern Duplication (Refactoring Priority: Medium)
76+ similar validation patterns across 32 validation files
Common Patterns
| Pattern | Occurrences (Est.) | Files | Example Functions |
|---|---|---|---|
| Integer range validation | 15+ | 12 files | validateIntRange(value, min, max, fieldName) |
| Required field validation | 20+ | 18 files | String empty checks |
| List membership validation | 12+ | 9 files | value in allowedValues |
| String length validation | 10+ | 8 files | len(str) < minLength |
| File existence checks | 8+ | 6 files | os.Stat(path) patterns |
Existing Solution (Partial) ✅
validation_helpers.go already consolidates some patterns:
validateIntRange()- Integer range validationValidateRequired()- Required field validationValidateMaxLength()/ValidateMinLength()- Length validationValidateInList()- List membershipValidatePositiveInt()/ValidateNonNegativeInt()- Integer validationfileExists()- File existence check
Note: The file includes TODO comments indicating Phase 2 refactoring to add:
isEmptyOrNil()- Empty/nil checksgetMapFieldAsString()/getMapFieldAsMap()/getMapFieldAsBool()- Map field extractiondirExists()- Directory existence check
Recommendation: Continue consolidation effort. Migrate remaining validation patterns to use these helpers across the 32 validation files.
Estimated Impact: Reduced duplication, consistent validation behavior, easier testing
4. Configuration Parsing Patterns (Refactoring Priority: Low-Medium)
Similar parse*Config functions with overlapping logic
Identified Patterns
| Function Pattern | Occurrences | Files | Example |
|---|---|---|---|
parse*Config(configMap map[string]any) |
14+ | 8 files | parseLabelsFromConfig, parseMessagesConfig, parseAppConfig |
parse*FromConfig(configMap, key) |
10+ | 5 files | parseExpiresFromConfig, parseParticipantsFromConfig |
| Array parsing | 8+ | 4 files | String array extraction with type checking |
| String extraction | 12+ | 6 files | String value extraction with validation |
Existing Solution (Partial) ✅
config_helpers.go provides generic parsing helpers:
ParseStringArrayFromConfig(m, key, log)- Generic string array extractionextractStringFromMap(m, key, log)- Generic string extractionParseIntFromConfig(m, key, log)- Generic integer extractionParseBoolFromConfig(m, key, log)- Generic boolean extractionunmarshalConfig(m, key, target, log)- Type-safe YAML unmarshaling
Observation: Many specific parse functions are thin wrappers around these generic helpers:
// Thin wrapper pattern (appropriate)
func parseLabelsFromConfig(configMap map[string]any) []string {
return ParseStringArrayFromConfig(configMap, "labels", configHelpersLog)
}
func parseTitlePrefixFromConfig(configMap map[string]any) string {
return extractStringFromMap(configMap, "title-prefix", configHelpersLog)
}Assessment: Current organization is reasonable. Thin wrappers provide semantic clarity while avoiding duplication. No major refactoring needed.
5. Error Message Patterns (Refactoring Priority: Low)
Consistent error wrapping patterns across files
| Error Pattern | Occurrences | Files |
|---|---|---|
fmt.Errorf("failed to %s: %w", ...) |
114 | 37 files |
fmt.Errorf("could not %s: %w", ...) |
0 | - |
fmt.Errorf("unable to %s: %w", ...) |
1 | 1 file |
Observation: Codebase consistently uses "failed to" pattern (114 occurrences). This is good consistency.
Existing Solution ✅
error_helpers.go provides structured error constructors:
NewValidationError(field, value, reason, suggestion)NewOperationError(operation, entityType, entityID, cause, suggestion)NewConfigurationError(configKey, value, reason, suggestion)EnhanceError(err, context, suggestion)WrapErrorWithContext(err, context, suggestion)
Assessment: Error handling is well-structured with helper utilities. No major refactoring needed.
Detailed Function Clusters
Cluster 1: Creation Functions (create_* pattern)
Pattern: GitHub entity creation handlers - one file per entity type
| File | Functions | Organization |
|---|---|---|
create_issue.go |
CreateIssue(...) | ✅ Focused |
create_pull_request.go |
CreatePR(...) | ✅ Focused |
create_discussion.go |
CreateDiscussion(...) | ✅ Focused |
create_project.go |
CreateProject(...) | ✅ Focused |
create_project_status_update.go |
CreateProjectStatusUpdate(...) | ✅ Focused |
Analysis: ✅ Well-organized - Each creation function has its own file with clear responsibility. This pattern should be maintained.
Cluster 2: Update Functions (update_* pattern)
Pattern: GitHub entity modification handlers - parallel to create_* pattern
| File | Functions | Organization |
|---|---|---|
update_issue.go |
UpdateIssue(...) | ✅ Focused |
update_pull_request.go |
UpdatePR(...) | ✅ Focused |
update_discussion.go |
UpdateDiscussion(...) | ✅ Focused |
update_project.go |
UpdateProject(...) | ✅ Focused |
Analysis: ✅ Well-organized - Mirrors create_* pattern consistently.
Cluster 3: Validation Functions (validate* pattern)
Pattern: 31 validation files with *_validation.go suffix
Organization: ✅ Validation logic consistently separated from main logic
Examples:
expression_validation.go- Expression safety validationsandbox_validation.go- Sandbox configuration validationtemplate_validation.go- Template injection validationruntime_validation.go- Runtime mode validationpermissions_validation.go- Permission validationfeatures_validation.go- Feature flag validation
Analysis: ✅ Excellent pattern - Validation logic is isolated for easier testing and maintenance. Continue this pattern.
Cluster 4: Compilation Functions (compiler_* pattern)
Pattern: 26 compiler-related files
Organization:
Examples:
compiler.go- Main orchestrationcompiler_yaml.go- YAML generationcompiler_jobs.go- Job compilationcompiler_safe_outputs.go- Safe outputs compilationcompiler_activation_jobs.go- Activation job generationcompiler_orchestrator_*.go- Multiple orchestrator files
Observation: 26 compiler files suggest deep hierarchy. Some boundaries may be unclear.
Recommendation: Document the compilation pipeline architecture to clarify file responsibilities. Consider whether some compiler_orchestrator_* files could be consolidated.
Cluster 5: MCP Configuration (mcp_* pattern)
Pattern: 16 MCP-related files
| File | Purpose | Organization |
|---|---|---|
mcp_config.go |
Main configuration | ✅ |
mcp_renderer.go |
Rendering (844 lines) | |
mcp_github_config.go |
GitHub-specific config | ✅ |
mcp_playwright_config.go |
Playwright-specific config | ✅ |
mcp_config_custom.go |
Custom tool config | ✅ |
mcp_config_validation.go |
Validation | ✅ |
| Others | Setup, environment, gateway | ✅ |
Analysis: Generally well-organized. Main issue is mcp_renderer.go (844 lines) rendering multiple targets - candidate for splitting (see Issue #1 above).
Refactoring Recommendations
Priority 1: High Impact (Immediate Action)
1.1 Split Large Monolithic Files (Effort: Large, Impact: High)
Target files: add_command.go (1,218 lines), add_interactive.go (1,025 lines), trial_command.go (1,000 lines), mcp_server.go (1,000 lines)
Approach:
- Identify semantic boundaries (already documented for
add_command.goin #12263) - Split into focused modules (6-8 files per large file)
- Maintain public API exports
- Add tests for each new module
Benefits:
- Easier testing (smaller test files focused on specific functionality)
- Reduced cognitive load (developers work on smaller, focused files)
- Parallel development (multiple developers can work on different modules)
- Clearer responsibilities (each file has single, clear purpose)
Estimated Effort: 2-3 days per file (including tests)
1.2 Split Factory Method Files (Effort: Medium, Impact: High)
Target file: permissions.go (928 lines, 36+ factory methods)
Approach:
- Split into 3 files:
permissions_parser.go- Parsing logic (5 methods)permissions_factory.go- Factory methods (36+ New* functions)permissions_operations.go- Operators (merge, render, validate)
- Keep main
permissions.gofor types and core logic
Benefits:
- Improved discoverability (easier to find specific permission types)
- Faster file navigation
- Clearer file purposes
Estimated Effort: 4-6 hours
Priority 2: Medium Impact (Planned Work)
2.1 Continue Validation Helper Consolidation (Effort: Medium, Impact: Medium)
Current state: validation_helpers.go exists with 7+ helper functions. TODO comments indicate Phase 2 work.
Approach:
- Implement Phase 2 helpers:
isEmptyOrNil()- Empty/nil/zero checksgetMapFieldAsString()/getMapFieldAsMap()/getMapFieldAsBool()/getMapFieldAsInt()- Safe map field extractiondirExists()- Directory existence check
- Migrate existing validation patterns to use these helpers across 32 validation files
- Update tests to cover new helpers
Benefits:
- Reduced duplication (76+ patterns → centralized helpers)
- Consistent validation behavior
- Easier testing (test helpers once, use everywhere)
Estimated Effort: 1-2 days
2.2 Split Multi-Target Renderers (Effort: Medium, Impact: Medium)
Target file: mcp_renderer.go (844 lines)
Approach:
- Split by target:
mcp_renderer.go- Base renderer & shared logicmcp_renderer_github.go- GitHub-specific renderingmcp_renderer_playwright.go- Playwright-specific renderingmcp_renderer_custom.go- Custom tool rendering
- Use composition/interfaces for shared functionality
Benefits:
- Parallel development (different developers can work on different renderers)
- Clearer testing boundaries
- Easier to add new MCP targets
Estimated Effort: 6-8 hours
Priority 3: Long-term Improvements (Future Work)
3.1 Document Compiler Architecture (Effort: Low, Impact: Medium)
Issue: 26 compiler files suggest deep hierarchy; some boundaries may be unclear
Approach:
- Create
pkg/workflow/COMPILER_ARCHITECTURE.mddocumenting:- Compilation pipeline stages
- File responsibilities
- Data flow between files
- Extension points
- Add package-level documentation to key compiler files
Benefits:
- Onboarding (new developers understand compilation pipeline)
- Maintenance (clearer boundaries prevent responsibility drift)
Estimated Effort: 4-6 hours
3.2 Helper File Reorganization (Effort: Medium, Impact: Low-Medium)
Target files: engine_helpers.go (mixed concerns), compiler_yaml_helpers.go (mixed concerns)
Approach:
- Analyze helper functions for cohesion
- Move domain-specific logic to appropriate main modules
- Keep generic utilities in helper files
Benefits:
- Clearer helper file purposes
- Reduced cognitive load when choosing where to add new functions
Estimated Effort: 4-8 hours per file
Implementation Guidelines
General Principles
- Preserve Behavior: Ensure all existing functionality works identically
- Maintain Exports: Keep public API unchanged (avoid breaking changes)
- Add Tests First: Write tests for each new module before refactoring (test-driven refactoring)
- Incremental Changes: Split one module at a time, verify tests pass after each split
- Run Tests Frequently: Verify
make test-unitpasses after each split - Update Imports: Ensure all import paths are correct
- Document Changes: Add comments explaining module boundaries and responsibilities
File Splitting Strategy
When splitting large files:
- Identify semantic boundaries (functional areas with clear responsibilities)
- Create new files with descriptive names following existing naming conventions
- Move functions to new files, keeping related functions together
- Update imports in all affected files
- Run tests to verify no breakage
- Add new tests for newly created modules
- Document file purposes in package comments
Success Metrics
- ✅ All tests pass (
make test-unit,make test-integration) - ✅ No breaking changes to public API
- ✅ Code passes linting (
make lint) - ✅ Build succeeds (
make build) - ✅ New files are under 500 lines each
- ✅ Test coverage is maintained or improved (≥80% for new modules)
Organizational Structure Summary
Strengths ✅
- Clear semantic grouping - Files named by concept, not by type (good Go practice)
- Consistent naming conventions - Verb-prefix patterns uniform across packages
- Specialized utility packages - Focused packages for common concerns (stringutil, logger, etc.)
- Validation separation - Validation logic consistently in
*_validation.gofiles - Configuration clarity - Configuration parsing clearly marked with
*_config*.go - Create/Update parallelism - Parallel
create_*andupdate_*patterns for GitHub entities
Weaknesses & Opportunities ⚠️
- Large monolithic files - 10+ files exceed 750 lines with mixed responsibilities
- Factory method accumulation - Files like
permissions.gowith 36+ factory methods - Multi-concern files - Files combining UI, operations, and compilation logic
- Helper file over-use - Some helper files contain domain-specific logic
- Deep compiler hierarchy - 26 compiler files; unclear boundaries between some
Analysis Metadata
- Total Go Files Analyzed: 483 (non-test files in pkg/)
- Total Functions Cataloged: 1,500+ (estimated)
- Function Clusters Identified: 8 major clusters (compiler, validation, safe_outputs, mcp, create, update, parse, build)
- Outliers Found: 10 large files (>750 lines) with multiple responsibilities
- Large Files Detected: 10 files >750 lines (top file: 1,218 lines)
- Helper Files Analyzed: 13 files
- Validation Files: 31 files with consistent
*_validation.gopattern - Detection Method: Pattern matching + semantic analysis + file reading
- Analysis Date: 2026-01-28
- Workflow Run: §21443298878
Acceptance Criteria
Implementation is complete when:
- Top 5 large files (>900 lines) are split into focused modules
-
permissions.gois split into parser/factory/operations files -
mcp_renderer.gois split by target (github/playwright/custom) - Phase 2 validation helpers are implemented and adopted
- All tests pass (
make test-unitandmake test-integration) - Test coverage maintained or improved (≥80% for new modules)
- No breaking changes to public APIs
- Code passes linting and builds successfully
- Compiler architecture documentation added
Related Issues
- #12263 - Detailed plan for refactoring
add_command.go(already exists)
Priority: Medium
Effort: Large (multi-week effort across multiple files)
Expected Impact: Significantly improved maintainability, easier testing, reduced complexity, better separation of concerns, faster onboarding for new developers
AI generated by Semantic Function Refactoring