Skip to content

[refactor] Semantic Function Clustering Analysis - Major Refactoring Opportunities IdentifiedΒ #9254

@github-actions

Description

@github-actions

Executive Summary

Analysis of 220 non-test Go files in the pkg/ directory has identified significant opportunities for code consolidation and improved organization. The most critical finding is ~2,000 lines of duplicated code in safe output configuration handling, with repetitive patterns across 60+ nearly identical functions.

Full Report

Analysis Overview

  • Total Go files analyzed: 220 (excluding tests)
  • Total functions cataloged: 1,200+
  • Function clusters identified: 8 major clusters
  • Critical duplications found: 5 high-impact areas
  • Files needing reorganization: 47
  • Detection method: Pattern analysis + semantic code review

1. Critical Issues - High Priority

πŸ”΄ Issue #1: Massive Safe Output Config Duplication

Scope: 19 files with 60+ nearly identical functions
Lines of duplicate code: ~2,000
Impact: High - makes maintenance difficult and error-prone

Problem Details

The safe output configuration system has severe code duplication:

  1. Single 476-line mega-function in pkg/workflow/compiler_safe_outputs_config.go:17-476

    • Function: addHandlerManagerConfigEnvVar()
    • Contains 15 nearly identical if-blocks, one per output type
    • Each block follows identical pattern with minor variations
  2. 60+ duplicate parse*Config() functions across 30+ files

    • Files: create_issue.go, create_discussion.go, create_pull_request.go, etc.
    • Pattern: 80-90% identical code structure
    • Each has same boilerplate for expires handling, base config parsing, validation
  3. 37+ duplicate build*Job() functions

    • Pattern: Similar job-building logic repeated
    • Note: BuildListSafeOutputJob() exists in safe_output_builder.go but is underutilized

Example: Repetitive Code in compiler_safe_outputs_config.go

// Lines 17-48 - CreateIssues handler
if data.SafeOutputs.CreateIssues != nil {
    cfg := data.SafeOutputs.CreateIssues
    handlerConfig := make(map[string]any)
    if cfg.Max > 0 { handlerConfig["max"] = cfg.Max }
    if len(cfg.AllowedLabels) > 0 { handlerConfig["allowed_labels"] = cfg.AllowedLabels }
    if cfg.RequiredTitlePrefix != "" { handlerConfig["required-title-prefix"] = cfg.RequiredTitlePrefix }
    // ... 8 more similar lines
    config["create_issue"] = handlerConfig
}

// Lines 50-70 - AddComments handler (nearly identical!)
if data.SafeOutputs.AddComments != nil {
    cfg := data.SafeOutputs.AddComments
    handlerConfig := make(map[string]any)
    if cfg.Max > 0 { handlerConfig["max"] = cfg.Max }
    // ... same pattern repeated
    config["add_comment"] = handlerConfig
}

// REPEATS 13 MORE TIMES with minor variations

Example: Duplicate Parse Pattern

Found in create_issue.go:23-50, create_discussion.go:23-50, create_pull_request.go:23-50, and 10+ other files:

func (c *Compiler) parseXConfig(outputMap map[string]any) *XConfig {
    if _, exists := outputMap["x"]; !exists { 
        return nil 
    }
    
    log.Print("Parsing x configuration")
    configData, _ := outputMap["x"].(map[string]any)
    
    // Expires handling - DUPLICATED in 10+ files
    if configData != nil {
        if expires, exists := configData["expires"]; exists {
            if _, ok := expires.(string); ok {
                expiresInt := parseExpiresFromConfig(configData)
                if expiresInt > 0 {
                    configData["expires"] = expiresInt
                }
            }
        }
    }
    
    var config XConfig
    if err := unmarshalConfig(outputMap, "x", &config, log); err != nil {
        return nil
    }
    
    c.parseBaseSafeOutputConfig(configData, &config.BaseSafeOutputConfig, defaultMax)
    return &config
}

This pattern is repeated 30+ times with only the type name changing!

Recommendation

Refactor using generics and registry pattern:

  1. Create generic config parser in pkg/workflow/config_helpers.go:

    func ParseSafeOutputConfig[T any](
        outputMap map[string]any,
        key string,
        defaultMax int,
        preprocessor func(map[string]any),
    ) (*T, error)
  2. Replace 476-line function with handler registry:

    type HandlerConfigBuilder interface {
        BuildConfig(cfg any) map[string]any
    }
    
    var handlerRegistry = map[string]HandlerConfigBuilder{
        "create_issue": &CreateIssueBuilder{},
        "add_comment": &AddCommentBuilder{},
        // ... register all handlers
    }
  3. Consolidate into subdirectory structure:

    pkg/workflow/safe_outputs/
      config/
        parser.go          # Generic config parser
        builder.go         # Generic handler config builder
        validation.go      # Shared validation
      types/
        create_types.go    # CreateIssue, CreateDiscussion, CreatePR
        update_types.go    # UpdateIssue, UpdateDiscussion, UpdatePR
        close_types.go     # CloseIssue, CloseDiscussion
        misc_types.go      # Other configs
      handlers/
        registry.go        # Handler registration
    

Estimated Impact:

  • Reduce code by ~1,800 lines
  • Eliminate 60+ duplicate functions
  • Make adding new safe output types trivial (add to registry)
  • Significantly improve maintainability

πŸ”΄ Issue #2: Scattered Validation Logic

Scope: 27 files with *validation* in name
Impact: Medium - inconsistent validation patterns

Problem Details

Validation logic is spread across 27 separate files with no common framework:

Files with validation concerns:

  • pkg/workflow/*_validation.go (27 files)
  • pkg/workflow/permissions_validator.go
  • pkg/workflow/validation_helpers.go (exists but underutilized)

Issues:

  1. No common validation interface or framework
  2. Each validator implements its own pattern
  3. permissions_validator.go mixes validation logic with static data definitions
  4. Helper functions in validation_helpers.go are rarely used

Example: Mixed Concerns in permissions_validator.go

// Contains both validation logic AND static toolset data
func ValidatePermissions(...) error {
    // validation logic
}

func GetToolsetsData() map[string]ToolsetData {
    // Returns 200+ lines of static configuration data
    // Should be in separate file or JSON
}

Recommendation

  1. Create validation framework in pkg/workflow/validation/:

    type ValidationRule interface {
        Validate(value any) error
        ErrorMessage() string
    }
    
    type Validator struct {
        rules []ValidationRule
    }
    
    func (v *Validator) ValidateAll(config any) []error
  2. Extract static data from permissions_validator.go to:

    • Separate file: pkg/workflow/toolset_data.go, or
    • JSON file loaded at runtime
  3. Group related validations:

    pkg/workflow/validation/
      framework.go        # Common validation interfaces
      schema.go          # Schema validation (combine 5 schema_*.go files)
      expression.go      # Expression validation
      permissions.go     # Permission validation
      runtime.go         # Runtime validation
      features.go        # Feature validation
    

Estimated Impact:

  • Consolidate 27 files into ~8 focused files
  • Create reusable validation framework
  • Improve consistency across validators

🟑 Issue #3: Helper Function Underutilization

Scope: Existing helper functions not widely adopted
Impact: Medium - leads to continued code duplication

Problem Details

Well-designed helper functions exist but are underutilized:

  1. pkg/workflow/config_helpers.go (13 functions)

    • ParseStringArrayFromConfig(), ParseIntFromConfig(), ParseBoolFromConfig()
    • parseLabelsFromConfig(), parseTitlePrefixFromConfig()
    • Usage: Only 12 files use these helpers
    • Problem: 30+ files still implement custom parsing
  2. pkg/workflow/safe_output_builder.go (13+ functions)

    • BuildTargetEnvVar(), BuildRequiredLabelsEnvVar()
    • BuildListSafeOutputJob() - comprehensive job builder
    • Usage: Only a few files leverage the full builder pattern
    • Problem: Most files build env vars manually
  3. pkg/workflow/engine_helpers.go (7 functions)

    • GetBaseInstallationSteps(), ExtractAgentIdentifier()
    • Good example: Properly centralized and widely reused βœ“

Recommendation

  1. Promote awareness of existing helpers through:

    • Code review guidelines
    • Developer documentation
    • Linting rules to detect reinvention
  2. Expand helper coverage:

    • Add generic config parser (as mentioned in Issue rejig docsΒ #1)
    • Add environment variable builder registry
    • Add common validation helpers
  3. Refactor existing code to use helpers:

    • Replace custom parsing with ParseStringArrayFromConfig()
    • Replace manual env var building with BuildTargetEnvVar()

Estimated Impact:

  • Reduce ~500 lines of duplicate parsing code
  • Standardize configuration handling
  • Improve code discoverability

2. Function Clusters by Semantic Pattern

Cluster 1: Safe Output Functions ⚠️ NEEDS REFACTORING

Pattern: create_*, update_*, close_*, add_*, safe_outputs_*
File count: 19 files
Function count: 100+ functions

Files:

pkg/workflow/
  create_issue.go, create_discussion.go, create_pull_request.go
  update_issue.go, update_discussion.go, update_pull_request.go
  close_entity_helpers.go, update_entity_helpers.go
  add_comment.go, add_labels.go, add_reviewer.go
  safe_output_*.go (4 files)
  safe_outputs_*.go (9 files)
  safe_inputs_*.go (3 files)
  compiler_safe_outputs_*.go (9 files)

Analysis: Critical refactoring needed - see Issue #1 above


Cluster 2: Compiler Functions βœ“ MOSTLY GOOD

Pattern: compiler_*
File count: 21 files
Function count: 150+ functions

Files:

pkg/workflow/
  compiler.go (main compiler)
  compiler_jobs.go, compiler_activation_jobs.go
  compiler_orchestrator.go, compiler_types.go
  compiler_yaml*.go (5 YAML generation files)
  compiler_safe_outputs*.go (9 safe output compilation files)
  compiler_filters_validation.go

Analysis: Generally well-organized, but:

  • Could benefit from subdirectory organization
  • compiler_safe_outputs_config.go needs refactoring (Issue rejig docsΒ #1)
  • YAML generation files could be grouped in subdirectory

Recommendation: Minor reorganization into subdirectories:

pkg/workflow/compiler/
  core/compiler.go, types.go, orchestrator.go
  jobs/main.go, activation.go, safe_outputs.go
  yaml/generator.go, helpers.go, artifacts.go

Cluster 3: Validation Functions ⚠️ NEEDS FRAMEWORK

Pattern: *_validation.go, validate*
File count: 27 files
Function count: 80+ validation functions

Analysis: See Issue #2 above - needs validation framework


Cluster 4: Engine Functions βœ“ WELL ORGANIZED

Pattern: *_engine.go, engine_*
File count: 13 files
Function count: 60+ functions

Files:

pkg/workflow/
  agentic_engine.go, claude_engine.go, codex_engine.go
  copilot_engine.go, custom_engine.go
  engine.go, engine_helpers.go, engine_output.go
  engine_validation.go, engine_firewall_support.go
  copilot_engine_execution.go
  copilot_engine_installation.go
  copilot_engine_tools.go
  copilot_logs.go, claude_logs.go, codex_logs.go

Analysis: Good organization βœ“

  • Each engine type has dedicated file
  • Common helpers centralized in engine_helpers.go
  • Engine-specific concerns properly separated

Minor improvement: Could group Copilot files into subdirectory:

pkg/workflow/engines/
  copilot/engine.go, execution.go, installation.go, tools.go
  claude/engine.go, logs.go
  codex/engine.go, logs.go

Cluster 5: MCP Functions βœ“ GOOD ORGANIZATION

Pattern: mcp_*
File count: 7 files
Function count: 40+ functions

Files:

pkg/workflow/
  mcp-config.go, mcp_config_validation.go
  mcp_gateway_constants.go, mcp_renderer.go
  mcp_servers.go
pkg/cli/
  mcp*.go (20+ files - separate CLI concern)

Analysis: Well organized βœ“

  • Clear separation of concerns
  • Validation properly separated
  • Constants properly isolated

Cluster 6: Expression Functions βœ“ GOOD ORGANIZATION

Pattern: expression_*
File count: 5 files
Function count: 30+ functions

Files:

pkg/workflow/
  expression_builder.go
  expression_extraction.go
  expression_nodes.go
  expression_parser.go
  expression_validation.go

Analysis: Excellent organization βœ“

  • Clear separation by concern (builder, parser, validator)
  • One feature per file principle followed
  • Easy to navigate and understand

Cluster 7: Frontmatter Functions βœ“ GOOD ORGANIZATION

Pattern: frontmatter_*
File count: 5 files
Function count: 20+ functions

Files:

pkg/workflow/
  frontmatter_error.go
  frontmatter_extraction_metadata.go
  frontmatter_extraction_security.go
  frontmatter_extraction_yaml.go
  frontmatter_types.go

Analysis: Well organized βœ“

  • Separation by aspect (metadata, security, YAML)
  • Types properly isolated
  • Error handling dedicated file

Cluster 8: Runtime Functions βœ“ GOOD ORGANIZATION

Pattern: runtime_*
File count: 6 files
Function count: 25+ functions

Files:

pkg/workflow/
  runtime_deduplication.go
  runtime_definitions.go
  runtime_detection.go
  runtime_overrides.go
  runtime_step_generator.go
  runtime_validation.go

Analysis: Well organized βœ“

  • Each aspect has dedicated file
  • Clear separation of concerns
  • Easy to locate specific functionality

3. Files with Potential Outlier Functions

Files where functions might not match primary purpose:

  1. pkg/workflow/agentic_engine.go (195 lines)

    • Primary purpose: Agentic engine implementation
    • Outliers found:
      • GetEngineRegistry() - Engine registry management (should be engine_registry.go)
      • GenerateSecretValidationStep() - Secret validation (should be engine_secrets.go)
      • GetCommonErrorPatterns() - Error patterns (should be engine_error_patterns.go)
    • Recommendation: Split into 3 files by concern
  2. pkg/workflow/permissions_validator.go

    • Primary purpose: Permission validation
    • Outlier found:
      • GetToolsetsData() - 200+ lines of static configuration data
    • Recommendation: Move data to toolset_data.go or JSON file
  3. pkg/workflow/compiler_safe_outputs_config.go (476 lines, 1 function!)

    • Primary purpose: Safe outputs configuration
    • Issue: Single mega-function with repetitive logic
    • Recommendation: See Issue rejig docsΒ #1 - refactor using registry pattern

4. Package-Level Organization Analysis

pkg/campaign (10 files) βœ“ EXCELLENT

campaign/
  command.go, loader.go, orchestrator.go
  spec.go, status.go, summary.go
  template.go, validation.go
  prompt_sections.go
  project_update_contract_validator.go

Analysis: Well-organized, clear separation of concerns


pkg/cli (130+ files) ⚠️ LARGE BUT ORGANIZED

cli/
  commands.go (main command structure)
  *_command.go (25+ command files)
  compile_*.go (15+ compile-related files)
  mcp_*.go (20+ MCP-related files)
  logs_*.go (12+ logs-related files)
  update_*.go (8+ update-related files)

Analysis:

  • Very large package but logically grouped
  • Command files properly separated
  • Feature clusters (compile, mcp, logs, update) are identifiable
  • Minor improvement: Could use subdirectories for large feature groups

pkg/console (10 files) βœ“ EXCELLENT

console/
  accessibility.go, banner.go, confirm.go
  console.go, format.go, layout.go
  list.go, progress.go, render.go, spinner.go

Analysis: Perfect size and organization


pkg/parser (29 files) βœ“ WELL ORGANIZED

parser/
  ansi_strip.go, content_extractor.go
  frontmatter*.go (3 files)
  github*.go (2 files)
  import_*.go (4 files)
  include_*.go (2 files)
  schema_*.go (6 files)
  schedule_parser.go
  mcp.go, remote_fetch.go

Analysis: Good logical grouping, clear naming conventions


pkg/workflow (370+ files) ⚠️ NEEDS SUBDIRECTORIES

Size: Largest package with 370+ files
Analysis: Flat structure is difficult to navigate

Recommendation: Organize into subdirectories:

pkg/workflow/
  core/               # compiler.go, workflow.go, etc.
  safe_outputs/       # All safe output files (50+ files)
  validation/         # All validation files (27 files)
  engines/            # All engine files (13 files)
  expressions/        # Expression files (5 files)
  runtime/            # Runtime files (6 files)
  frontmatter/        # Frontmatter files (5 files)
  mcp/                # MCP files (7 files)
  helpers/            # config_helpers.go, map_helpers.go, etc.

5. Summary Statistics

Metric Count Status
Total non-test Go files 220 -
Packages analyzed 10 -
Function clusters 8 major -
Well-organized clusters 5 βœ“
Needs refactoring 2 ⚠️
Needs framework 1 ⚠️
Files with duplicated code 60+ ⚠️
Estimated duplicate lines ~2,000 ⚠️
Files needing reorganization 47 -
Outlier functions found 12 -

6. Refactoring Roadmap

Phase 1: Critical Duplications (High Impact)

Priority: πŸ”΄ Critical
Estimated effort: 8-12 hours
Impact: Eliminate ~1,800 lines of duplicate code

Tasks:

  • Refactor compiler_safe_outputs_config.go mega-function using registry pattern
  • Create generic ParseSafeOutputConfig[T]() function
  • Consolidate 60+ parse*Config() functions
  • Create handler config builder registry
  • Move safe output files to subdirectory structure

Phase 2: Validation Framework (Medium Impact)

Priority: 🟑 Important
Estimated effort: 6-8 hours
Impact: Consolidate 27 validation files, improve consistency

Tasks:

  • Design and implement validation framework interface
  • Extract static data from permissions_validator.go
  • Group related validation files into subdirectories
  • Create common validation helpers
  • Update existing validators to use framework

Phase 3: Helper Adoption (Medium Impact)

Priority: 🟑 Important
Estimated effort: 4-6 hours
Impact: Reduce ~500 lines of duplicate parsing code

Tasks:

  • Document existing helper functions
  • Refactor files to use config_helpers.go functions
  • Refactor files to use safe_output_builder.go functions
  • Add linting rules to detect helper reinvention
  • Create developer guide for adding new safe outputs

Phase 4: Package Organization (Low Impact, Long-term)

Priority: 🟒 Enhancement
Estimated effort: 6-10 hours
Impact: Improved code navigation and maintainability

Tasks:

  • Create subdirectory structure for pkg/workflow
  • Move safe output files to subdirectories
  • Move validation files to subdirectories
  • Move compiler files to subdirectories
  • Update import paths across codebase
  • Update documentation

7. Implementation Checklist

Immediate Actions (Week 1)

  • Review findings with team
  • Prioritize refactoring tasks
  • Create design document for generic config parser
  • Create design document for handler registry
  • Set up feature branch for refactoring work

Short-term Actions (Weeks 2-4)

  • Implement generic ParseSafeOutputConfig[T]() function
  • Refactor compiler_safe_outputs_config.go using registry
  • Consolidate parse*Config() functions (batch 1: create_*)
  • Consolidate parse*Config() functions (batch 2: update_*)
  • Add comprehensive tests for new generic functions
  • Begin validation framework design

Long-term Actions (Months 2-3)

  • Complete validation framework implementation
  • Implement package reorganization
  • Create developer documentation
  • Add linting rules for code quality
  • Consider code generation for remaining boilerplate

Analysis Metadata

  • Repository: githubnext/gh-aw
  • Analysis date: 2026-01-07
  • Files analyzed: 220 non-test Go files
  • Primary focus: pkg/workflow (largest package)
  • Detection method: Pattern analysis, semantic clustering, manual code review
  • Tools used: grep, file reading, structural analysis

Quick Action Items

  1. Immediate: Refactor pkg/workflow/compiler_safe_outputs_config.go:17-476 using registry pattern
  2. This week: Create generic ParseSafeOutputConfig[T]() to replace 60+ duplicate functions
  3. This month: Implement validation framework to consolidate 27 validation files
  4. Long-term: Reorganize pkg/workflow/ into subdirectories (370+ files in flat structure)

Benefits of Refactoring

  • Code reduction: ~2,000 lines of duplicate code eliminated
  • Maintainability: Adding new safe output types becomes trivial
  • Consistency: Validation handled uniformly across codebase
  • Navigation: Subdirectory structure improves code discoverability
  • Testing: Generic functions easier to test thoroughly

AI generated by Semantic Function Refactoring

Metadata

Metadata

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions