[refactor] Semantic Function Clustering Analysis - Code Organization Opportunities #6225

@github-actions

This analysis examined 286 non-test Go files across the repository (82,301 lines of code), cataloging function names, signatures, and organizational patterns to identify refactoring opportunities through semantic clustering and duplicate detection.

Key Statistics:

  • Total Go files analyzed: 286 (161 in pkg/workflow, 89 in pkg/cli, 14 in pkg/parser, 22 utility files)
  • Total lines of production code: ~82,301 lines
  • Files >1000 lines: 16 files requiring attention
  • Validation files scattered: 17 separate validation files in pkg/workflow alone
  • Duplicate patterns identified: Token handling (4 similar functions), Upload generation (7 similar functions), Package collection (3 duplicates)

Major Findings:

  • Large files mixing multiple responsibilities (compiler_yaml.go: 1,446 lines with YAML + prompts + uploads)
  • Validation logic scattered across 17+ files instead of centralized
  • Token handling functions with nearly identical implementations (4 variants)
  • Upload artifact generation duplicated 7 times with minimal variations
  • Safe outputs system fragmented across 8+ files

Full Analysis Report

Executive Summary

This semantic function clustering analysis identified significant refactoring opportunities across three major packages (workflow, cli, parser) that would improve code organization, reduce duplication, and make the code easier to maintain. It looked for three things: functions placed in the wrong files (outliers), duplicate implementations, and opportunities for better modularization.

Repository Structure:

  • pkg/workflow: 161 files, 43,633 lines - Core workflow compilation engine
  • pkg/cli: 89 files, 29,033 lines - Command-line interface
  • pkg/parser: 14 files, 5,897 lines - Configuration parsing
  • Utilities: 22 files, ~3,700 lines - Supporting packages

Package 1: pkg/workflow (161 files, 43,633 lines)

Large Files Requiring Attention

| File | Lines | Primary Issues |
|---|---|---|
| compiler_yaml.go | 1,446 | Mixed YAML generation, prompt generation, and upload steps |
| compiler_jobs.go | 1,415 | Job building with helper predicates mixed |
| copilot_engine.go | 1,369 | Could extract MCP rendering logic |
| frontmatter_extraction.go | 1,047 | 22 extraction functions - focused but large |
| safe_outputs_config.go | 1,024 | Config parsing + generation + formatting mixed |
| runtime_setup.go | 982 | Detection + generation + deduplication mixed |
| mcp-config.go | 982 | Configuration + validation + parsing mixed |

Issue 1.1: Token Handling Duplication (HIGH PRIORITY)

Location: pkg/workflow/safe_outputs_env_helpers.go

Four nearly identical functions for GitHub token precedence:

// Line 32
func (c *Compiler) addSafeOutputGitHubToken(steps *[]string, data *WorkflowData)

// Line 43
func (c *Compiler) addSafeOutputGitHubTokenForConfig(steps *[]string, data *WorkflowData, configToken string)

// Line 62 
func (c *Compiler) addSafeOutputCopilotGitHubTokenForConfig(steps *[]string, data *WorkflowData, configToken string)

// Line 82
func (c *Compiler) addSafeOutputAgentGitHubTokenForConfig(steps *[]string, data *WorkflowData, configToken string)

Problem: All four follow nearly identical token precedence logic (check custom token → check copilot token → fall back to default) with only minor variations for copilot vs agent contexts.

Recommendation: Consolidate using configuration-based approach:

type TokenContext struct {
    ConfigToken  string
    TokenType    TokenType // Generic, Copilot, Agent
    DefaultToken string
}

func (c *Compiler) addSafeOutputGitHubTokenWithContext(
    steps *[]string, 
    data *WorkflowData, 
    context TokenContext,
)

Impact:

  • Reduce ~80 lines of duplicate code
  • Single source of truth for token precedence logic
  • Easier to test and modify precedence rules
  • Estimated effort: 2-3 hours
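
A minimal sketch of how the four precedence chains could collapse into one lookup. It assumes the chain is "config token, then a type-specific token, then the default", as described in the Problem statement above. `resolveToken`, `firstNonEmpty`, the `typeTokens` table, and the token strings are hypothetical names for illustration, not the repository's actual code.

```go
package main

import "fmt"

// TokenType selects which precedence chain applies. The variants mirror the
// "Generic, Copilot, Agent" comment in the proposed TokenContext.
type TokenType int

const (
	TokenGeneric TokenType = iota
	TokenCopilot
	TokenAgent
)

// TokenContext carries the inputs for one token decision.
type TokenContext struct {
	ConfigToken  string    // per-output override, highest precedence
	TokenType    TokenType // which type-specific token to consult
	DefaultToken string    // final fallback
}

// firstNonEmpty returns the first candidate that is not the empty string.
func firstNonEmpty(candidates ...string) string {
	for _, c := range candidates {
		if c != "" {
			return c
		}
	}
	return ""
}

// resolveToken applies the assumed precedence: config token, then the
// type-specific token, then the default. typeTokens is a hypothetical table.
func resolveToken(ctx TokenContext, typeTokens map[TokenType]string) string {
	return firstNonEmpty(ctx.ConfigToken, typeTokens[ctx.TokenType], ctx.DefaultToken)
}

func main() {
	// Placeholder strings; real values would be workflow expressions.
	typeTokens := map[TokenType]string{TokenCopilot: "copilot-token"}
	ctx := TokenContext{TokenType: TokenCopilot, DefaultToken: "default-token"}
	fmt.Println(resolveToken(ctx, typeTokens))
}
```

With the precedence in one pure function, each context (generic, Copilot, agent) becomes a table-driven test case rather than a separate code path.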

Issue 1.2: Upload Artifact Generation Duplication (HIGH PRIORITY)

Location: pkg/workflow/compiler_yaml.go

Seven similar functions generating artifact upload steps (lines 477-665):

func (c *Compiler) generateUploadAgentLogs(yaml *strings.Builder, logFileFull string)       // Line 477
func (c *Compiler) generateUploadAssets(yaml *strings.Builder)                                // Line 490
func (c *Compiler) generateUploadAwInfo(yaml *strings.Builder)                                // Line 618
func (c *Compiler) generateUploadPrompt(yaml *strings.Builder)                                // Line 631
func (c *Compiler) generateUploadAccessLogs(yaml *strings.Builder, tools map[string]any)      // Line 648
func (c *Compiler) generateUploadMCPLogs(yaml *strings.Builder)                               // Line 652
func (c *Compiler) generateUploadSafeInputsLogs(yaml *strings.Builder)                        // Line 665

Pattern: All follow identical structure:

- name: Upload [X]
  if: [condition]
  uses: actions/upload-artifact@[PIN]
  with:
    name: [artifact-name]
    path: [artifact-path]
    retention-days: [days]

Recommendation: Create unified helper function:

type UploadArtifactConfig struct {
    Name           string
    Path           string
    Condition      string
    RetentionDays  int
}

func (c *Compiler) generateUploadArtifactStep(
    yaml *strings.Builder, 
    config UploadArtifactConfig,
)

Impact:

  • Reduce ~100-120 lines of duplicate code
  • Easier to update upload-artifact action versions
  • Consistent upload configuration across all artifacts
  • Estimated effort: 1-2 hours

Issue 1.3: Validation Functions Scattered (MEDIUM PRIORITY)

Current Distribution - 17 validation files in pkg/workflow:

pkg/workflow/
├── agent_validation.go
├── bundler_validation.go
├── docker_validation.go
├── engine_validation.go
├── expression_validation.go
├── github_toolset_validation_error.go
├── mcp_config_validation.go
├── npm_validation.go
├── pip_validation.go
├── repository_features_validation.go
├── runtime_validation.go
├── safe_output_validation_config.go
├── schema_validation.go
├── step_order_validation.go
├── strict_mode_validation.go
├── template_validation.go
└── validation.go

Problem: Validation logic for different domains is scattered across the main workflow directory, making it hard to understand validation boundaries and maintain consistent validation patterns.

Recommendation: Create pkg/workflow/validation/ subdirectory:

pkg/workflow/validation/
├── agent.go         (from agent_validation.go)
├── bundler.go       (from bundler_validation.go)
├── docker.go        (from docker_validation.go)
├── engine.go        (from engine_validation.go)
├── expression.go    (from expression_validation.go)
├── mcp_config.go    (from mcp_config_validation.go)
├── npm.go           (from npm_validation.go)
├── permissions.go   (extracted from permissions.go)
├── pip.go           (from pip_validation.go)
├── repository.go    (from repository_features_validation.go)
├── runtime.go       (from runtime_validation.go)
├── safe_outputs.go  (from safe_output_validation_config.go)
├── schema.go        (from schema_validation.go)
├── step_order.go    (from step_order_validation.go)
├── strict_mode.go   (from strict_mode_validation.go)
├── template.go      (from template_validation.go)
└── validation.go    (from validation.go - core types)

Impact:

  • Clear module boundaries for validation logic
  • Easier to locate and maintain validation rules
  • Natural import path (workflow/validation)
  • Estimated effort: 3-4 hours (mostly file moves)

Issue 1.4: compiler_yaml.go Mixed Responsibilities (MEDIUM PRIORITY)

File: pkg/workflow/compiler_yaml.go (1,446 lines)

Functions: 29 functions covering multiple concerns:

  • YAML generation: generateYAML, generateMainJobSteps, generatePostSteps
  • Prompt generation: generatePrompt, generatePromptStep, generateEngineSpecificPromptStep (5+ functions)
  • Upload step generation: 7 generateUpload* functions
  • Pattern conversion: convertGoPatternToJavaScript, convertErrorPatternsToJavaScript
  • Helper utilities: splitContentIntoChunks, generatePlaceholderSubstitutionStep

Problem: Single file handles YAML orchestration, prompt generation, upload steps, pattern conversion, and utilities - too many distinct concerns.

Recommendation: Split into focused files:

pkg/workflow/
├── compiler_yaml.go                (~400 lines - main YAML orchestration)
├── compiler_yaml_prompts.go        (~300 lines - all prompt generation)
├── compiler_yaml_uploads.go        (~200 lines - all upload steps)
├── compiler_yaml_patterns.go       (~200 lines - pattern conversion utilities)
└── compiler_yaml_steps.go          (~300 lines - step generation helpers)

Impact:

  • Five focused files (~200-400 lines each) vs one 1,446-line file
  • Clearer separation of concerns
  • Easier to test individual components
  • Estimated effort: 4-5 hours

Issue 1.5: Safe Outputs System Fragmentation (MEDIUM PRIORITY)

Problem: Safe outputs logic is scattered across 8+ files in the main workflow directory:

Current files:

pkg/workflow/
├── safe_outputs.go                     (core types)
├── safe_output_builder.go             (builder pattern)
├── safe_output_validation_config.go   (validation config)
├── safe_outputs_app.go                (app integration)
├── safe_outputs_config.go             (configuration - 1,024 lines!)
├── safe_outputs_env_helpers.go        (environment variable helpers)
├── safe_outputs_jobs.go               (job generation)
├── safe_outputs_steps.go              (step generation)
└── safe_inputs.go                     (related safe inputs system)

Recommendation: Create pkg/workflow/safeoutputs/ subdirectory:

pkg/workflow/safeoutputs/
├── outputs.go           (from safe_outputs.go - core types)
├── builder.go           (from safe_output_builder.go)
├── config.go            (from safe_outputs_config.go)
├── validation.go        (from safe_output_validation_config.go)
├── app.go               (from safe_outputs_app.go)
├── env_helpers.go       (from safe_outputs_env_helpers.go)
├── jobs.go              (from safe_outputs_jobs.go)
├── steps.go             (from safe_outputs_steps.go)
└── inputs.go            (from safe_inputs.go - separate or related?)

Impact:

  • Modularizes safe outputs system with clear boundary
  • Natural import path (workflow/safeoutputs)
  • Easier to understand safe outputs architecture
  • Estimated effort: 3-4 hours (mostly file moves + import updates)

Issue 1.6: Package Collection Pattern Duplication (LOW PRIORITY)

Location: pkg/workflow/dependabot.go

Three functions with identical structure for different package managers:

func (c *Compiler) collectNpmDependencies(...) ([]npmPackage, error)    // Line ~100
func (c *Compiler) collectPipDependencies(...) ([]pipPackage, error)    // Line ~250
func (c *Compiler) collectGoDependencies(...) ([]goPackage, error)      // Line ~400

Pattern: All follow identical logic:

  1. Iterate through actions in lock file
  2. Extract package references from action specifications
  3. Parse package strings
  4. Deduplicate package list
  5. Return typed package list

Recommendation: Use generics or interface-based approach:

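// PackageCollector parses one package manager's dependency specs into T.
type PackageCollector[T Package] interface {
    ParsePackage(spec string) (T, error)
    PackageType() string
}

// Go methods cannot declare their own type parameters, so the generic
// collector is a package-level function that receives the compiler.
func collectPackages[T Package](
    c *Compiler,
    collector PackageCollector[T],
    actions []Action,
) ([]T, error)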

Impact:

  • Reduce ~150 lines of duplicate logic
  • Easier to add new package managers
  • Consistent package collection behavior
  • Estimated effort: 3-4 hours
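
The five-step pattern above can be sketched with generics roughly as follows. This is a simplified illustration, not the repository's code: it takes plain spec strings instead of lock-file actions, drops the `*Compiler` argument, and assumes a `Key()` method for deduplication. `npmPackage` and `npmCollector` are hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"strings"
)

// Package is the constraint for collected package types. Key returns the
// identity used for deduplication (an assumption of this sketch).
type Package interface {
	Key() string
}

// PackageCollector parses one package manager's specs into T.
type PackageCollector[T Package] interface {
	ParsePackage(spec string) (T, error)
	PackageType() string
}

// collectPackages parses every spec, drops duplicates by Key, and keeps
// first-seen order.
func collectPackages[T Package](collector PackageCollector[T], specs []string) ([]T, error) {
	seen := make(map[string]struct{}, len(specs))
	var out []T
	for _, spec := range specs {
		pkg, err := collector.ParsePackage(spec)
		if err != nil {
			return nil, fmt.Errorf("%s: parse %q: %w", collector.PackageType(), spec, err)
		}
		if _, dup := seen[pkg.Key()]; dup {
			continue
		}
		seen[pkg.Key()] = struct{}{}
		out = append(out, pkg)
	}
	return out, nil
}

// npmPackage and npmCollector are hypothetical examples.
type npmPackage struct{ Name, Version string }

func (p npmPackage) Key() string { return p.Name + "@" + p.Version }

type npmCollector struct{}

func (npmCollector) PackageType() string { return "npm" }

func (npmCollector) ParsePackage(spec string) (npmPackage, error) {
	// Split on the last "@" so scoped names such as "@scope/pkg@2.0.0" work.
	i := strings.LastIndex(spec, "@")
	if i <= 0 {
		return npmPackage{}, fmt.Errorf("missing version in %q", spec)
	}
	return npmPackage{Name: spec[:i], Version: spec[i+1:]}, nil
}

func main() {
	specs := []string{"left-pad@1.3.0", "@scope/pkg@2.0.0", "left-pad@1.3.0"}
	pkgs, err := collectPackages[npmPackage](npmCollector{}, specs)
	fmt.Println(pkgs, err)
}
```

Adding a new package manager would then mean writing one small collector type rather than copying the whole iteration and deduplication loop.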

Issue 1.7: Validation Files Should Use Subdirectory (BEST PRACTICE)

Current State: 17 validation files (mostly named *_validation.go) mixed with other workflow files in pkg/workflow/

Best Practice Example: The expression_* files show good organization:

  • expression_parser.go - Parsing
  • expression_builder.go - Building
  • expression_extraction.go - Extraction
  • expression_nodes.go - AST nodes
  • expression_validation.go - Validation

Recommendation: Give validation equally clear grouping by moving it to a subdirectory (already covered in Issue 1.3)


Package 2: pkg/cli (89 files, 29,033 lines)

Large Command Files

| File | Lines | Primary Issues |
|---|---|---|
| compile_command.go | 1,474 | Compilation + watching + security tools + PR creation |
| logs.go | 1,338 | Download + parsing + analysis + rendering mixed |
| update_command.go | 1,331 | Extension updates + workflow updates + PR creation |
| mcp_inspect.go | 948 | MCP inspection + display logic |
| trial_command.go | 944 | Trial execution + git operations + result collection |
| add_command.go | 904 | Adding workflows + compilation mixed |

Issue 2.1: logs.go Mixed Responsibilities (HIGH PRIORITY)

File: pkg/cli/logs.go (1,338 lines)

Multiple Responsibilities:

  • Command creation and flag parsing
  • Job status fetching from GitHub API
  • Log downloading and aggregation (concurrent downloads)
  • Log parsing and analysis (engine-specific)
  • Error detection and reporting
  • MCP tool usage analysis
  • Firewall log analysis
  • Metrics calculation
  • Output formatting (JSON + console)
  • Cache management

Current Support Files: logs_parsing.go, logs_metrics.go, logs_report.go, logs_download.go, logs_cache.go, logs_models.go

Recommendation: The support files already exist but logs.go still mixes too many concerns. Further split logs.go:

pkg/cli/
├── logs_command.go          (~300 lines - command setup and orchestration)
├── logs_download.go         (exists - artifact downloading)
├── logs_parsing.go          (exists - log parsing)
├── logs_analysis.go         (~400 lines - NEW - extract analysis logic from logs.go)
├── logs_metrics.go          (exists - metrics calculation)
├── logs_report.go           (exists - report generation)
├── logs_cache.go            (exists - caching)
└── logs_models.go           (exists - data models)

Impact:

  • Complete separation of concerns for logs feature
  • logs_command.go becomes thin orchestrator
  • Each file has single, focused responsibility
  • Estimated effort: 4-6 hours

Issue 2.2: compile_command.go Multiple Tools Integration (MEDIUM PRIORITY)

File: pkg/cli/compile_command.go (1,474 lines)

Functions: Handles compilation plus integration with:

  • File watching and recompilation
  • Security scanning (zizmor, poutine, actionlint)
  • Action SHA validation
  • YAML validation
  • JSON schema generation
  • PR creation
  • Dependabot configuration

Recommendation: Extract security tool integrations:

pkg/cli/
├── compile_command.go           (~800 lines - core compilation)
├── compile_watch.go             (~200 lines - file watching)
├── compile_security.go          (~300 lines - zizmor, poutine, actionlint)
└── compile_validation.go        (~200 lines - YAML and SHA validation)

Impact:

  • Clearer responsibility boundaries
  • Security tools can be tested independently
  • Easier to add new security integrations
  • Estimated effort: 5-6 hours

Issue 2.3: Shared Flag Parsing Pattern (LOW PRIORITY - INFORMATIONAL)

Pattern: Flag parsing repeated 69 times across 12 command files:

repoSpec, _ := cmd.Flags().GetString("repo")
format, _ := cmd.Flags().GetString("format")
verbose, _ := cmd.Flags().GetBool("verbose")

Observation: This is acceptable for Cobra-based CLIs and doesn't require refactoring. Each command has unique flags and the pattern is clear and consistent.

No Action Recommended - This is idiomatic Cobra usage.


Issue 2.4: MCP Commands Well-Organized (GOOD EXAMPLE ✓)

MCP command files demonstrate excellent organization:

pkg/cli/
├── mcp.go                    (main command)
├── mcp_add.go               (add subcommand)
├── mcp_inspect.go           (inspect subcommand)
├── mcp_inspect_mcp.go       (inspect MCP-specific logic)
├── mcp_list.go              (list subcommand)
├── mcp_list_tools.go        (list tools helper)
├── mcp_server.go            (server management)
├── mcp_gateway.go           (gateway configuration)
├── mcp_registry.go          (registry operations)
├── mcp_logs_guardrail.go    (log analysis)
└── mcp_validation.go        (validation)

Best Practice: Clear subcommand structure with focused helper files. Use this pattern for other complex commands!

No Action Needed - This is exemplary organization.


Package 3: pkg/parser (14 files, 5,897 lines)

Large Files

| File | Lines | Issues |
|---|---|---|
| frontmatter.go | 1,283 | Mixed imports, includes, extraction, merging |
| schema.go | 1,156 | Mixed validation, suggestions, compilation |
| mcp.go | 713 | MCP parsing + validation combined |

Issue 3.1: frontmatter.go Multiple Concerns (HIGH PRIORITY)

File: pkg/parser/frontmatter.go (1,283 lines)

Functions: 30+ functions covering:

  • Import directive parsing: ParseImportDirective() (~34 lines)
  • Import processing: ProcessImportsFromFrontmatter() + 3 variants (~500 lines)
  • Include expansion: ExpandIncludes() + 3 variants (~100 lines)
  • Include processing: ProcessIncludes() + variants (~150 lines)
  • Field extraction: 12 extract*FromContent() functions (~200 lines)
  • Content merging: MergeTools(), include processing (~150 lines)

Recommendation: Split by functional domain:

pkg/parser/
├── frontmatter.go                (~100 lines - core ParseImportDirective + types)
├── frontmatter_imports.go        (~350 lines - all ProcessImports* functions)
├── frontmatter_includes.go       (~250 lines - ExpandIncludes* and ProcessIncludes*)
├── frontmatter_extract.go        (~200 lines - all extract*FromContent functions)
└── frontmatter_merge.go          (~150 lines - MergeTools and merging logic)

Impact:

  • Five focused files (~100-350 lines each) vs one 1,283-line file
  • Clear separation: imports vs includes vs extraction vs merging
  • Easier to test each domain independently
  • Estimated effort: 6-8 hours

Issue 3.2: schema.go Multiple Concerns (HIGH PRIORITY)

File: pkg/parser/schema.go (1,156 lines)

Functions: 30+ functions covering:

  • Schema compilation/caching: getCompiledMainWorkflowSchema() etc. (~40 lines)
  • Validation orchestration: 8 Validate* functions (~400 lines)
  • Custom rule validation: validateCommandTriggerConflicts(), validateEngineSpecificRules() (~100 lines)
  • Schema suggestion generation: generateSchemaBasedSuggestions(), navigation, examples (~200 lines)
  • Deprecated field handling: GetMainWorkflowDeprecatedFields(), FindDeprecatedFieldsInFrontmatter() (~100 lines)
  • Utility functions: LevenshteinDistance(), removeDuplicates(), min() (~100 lines)

Recommendation: Split by functional domain:

pkg/parser/
├── schema.go                     (~150 lines - public validation API + types)
├── schema_cache.go               (~100 lines - schema compilation and caching)
├── schema_validate.go            (~400 lines - validation orchestration + custom rules)
├── schema_suggestions.go         (~250 lines - error suggestions and schema navigation)
├── schema_deprecated.go          (~100 lines - deprecated field handling)
└── schema_utils.go               (~100 lines - utilities OR move to pkg/util/)

Alternative for utilities: Extract to pkg/util/:

  • LevenshteinDistance() → pkg/util/strings.go
  • removeDuplicates() → pkg/util/slices.go
  • min(), max() → pkg/util/math.go

Impact:

  • Five or six focused files (~100-400 lines each; five if the utilities move to pkg/util/) vs one 1,156-line file
  • Reusable utilities available across packages
  • Clearer separation: validation vs suggestions vs deprecated fields
  • Estimated effort: 6-8 hours

Issue 3.3: Extract Generic Utilities to pkg/util (LOW PRIORITY)

Current Location: pkg/parser/schema.go

Generic utilities that should be reusable:

func LevenshteinDistance(a, b string) int          // String algorithm
func removeDuplicates(slice []string) []string     // Slice utility
func min(a, b int) int                             // Math utility

Also found in other files:

  • pkg/parser/ansi_strip.go: StripANSI() - could be in pkg/util/strings.go

Recommendation: Create pkg/util/ package:

pkg/util/
├── strings.go      (LevenshteinDistance, StripANSI)
├── slices.go       (removeDuplicates, generic slice helpers)
└── math.go         (min, max helpers)

Impact:

  • Reusable utilities across all packages
  • Consistent utility implementations
  • Clear location for shared helper functions
  • Estimated effort: 2-3 hours
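
A sketch of what the shared helpers could look like once extracted. These are fresh illustrative implementations under the same names, not copies of the code in schema.go, and `RemoveDuplicates` is made generic here as an assumption. Since Go 1.21 the language has built-in `min` and `max`, so a separate math.go may be unnecessary on recent toolchains; the sketch spells out the comparisons so it also compiles on older ones.

```go
package main

import "fmt"

// LevenshteinDistance returns the minimum number of single-rune insertions,
// deletions, and substitutions needed to turn a into b.
func LevenshteinDistance(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1) // distances for the previous row
	curr := make([]int, len(rb)+1) // distances for the current row
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			best := prev[j] + 1 // deletion
			if v := curr[j-1] + 1; v < best {
				best = v // insertion
			}
			if v := prev[j-1] + cost; v < best {
				best = v // substitution or match
			}
			curr[j] = best
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

// RemoveDuplicates returns items with repeats removed, keeping first-seen order.
func RemoveDuplicates[T comparable](items []T) []T {
	seen := make(map[T]struct{}, len(items))
	out := make([]T, 0, len(items))
	for _, it := range items {
		if _, ok := seen[it]; ok {
			continue
		}
		seen[it] = struct{}{}
		out = append(out, it)
	}
	return out
}

func main() {
	fmt.Println(LevenshteinDistance("kitten", "sitting"))
	fmt.Println(RemoveDuplicates([]string{"a", "b", "a"}))
}
```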

Semantic Function Clustering Analysis

Function Naming Patterns Across Packages

| Pattern | pkg/workflow | pkg/cli | pkg/parser | Purpose |
|---|---|---|---|---|
| build* | 33 | 2 | 0 | Construct structures, AST nodes |
| generate* | 48 | 8 | 3 | Create/produce output structures |
| parse* | 12 | 6 | 5 | Interpret and structure input |
| extract* | 26 | 3 | 12 | Retrieve data from structures |
| validate* | 15 | 8 | 8 | Verify correctness |
| render* | 15 | 10 | 0 | Transform to output format |
| convert* | 10 | 2 | 0 | Transform between formats |
| collect* | 12 | 2 | 0 | Gather items from sources |
| New* | 45 | 12 | 4 | Constructors |

Observations:

  • Consistent naming conventions across packages
  • Clear verb-noun structure for function names
  • Domain-specific verb preferences (workflow: generate/build, cli: render, parser: parse/extract)
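
Prefix counts like those in the table could be reproduced with Go's standard `go/parser` and `go/ast` packages. The sketch below is one possible approach, not the method used for this analysis (which lists grep/awk matching). `verbPrefix` and `countPrefixes` are hypothetical helpers, and names that start with an acronym (for example `MCPServer`) are not handled.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"unicode"
)

// verbPrefix returns the leading word of a camelCase or PascalCase name:
// "generateUploadAssets" yields "generate", "NewCompiler" yields "New".
func verbPrefix(name string) string {
	runes := []rune(name)
	for i := 1; i < len(runes); i++ {
		if unicode.IsUpper(runes[i]) {
			return string(runes[:i])
		}
	}
	return name
}

// countPrefixes parses src as one Go file and tallies top-level function and
// method names by their leading word.
func countPrefixes(src string) (map[string]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "snippet.go", src, 0)
	if err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	for _, decl := range file.Decls {
		if fn, ok := decl.(*ast.FuncDecl); ok {
			counts[verbPrefix(fn.Name.Name)]++
		}
	}
	return counts, nil
}

func main() {
	src := `package demo
func generateYAML() {}
func generatePrompt() {}
func NewCompiler() {}
func validateEngine() {}
`
	counts, err := countPrefixes(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(counts)
}
```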

Validation Functions - Scattered Pattern

Total validation-related files: 25+ across repository

pkg/workflow: 17 validation files (Issue 1.3)
pkg/cli: validation functions embedded in command files
pkg/parser: validation in schema.go (Issue 3.2)

Pattern: Validation logic is distributed but could benefit from consolidation in each package.

Recommendation: Already covered in package-specific issues above.


Priority Refactoring Recommendations

Priority 1: High-Impact Quick Wins (1-2 Weeks)

Estimated Total Effort: 16-22 hours

  1. ✅ Consolidate Token Handling (pkg/workflow/safe_outputs_env_helpers.go)

    • 4 similar functions → 1 configurable function
    • Lines saved: ~80
    • Effort: 2-3 hours
    • Impact: HIGH - Single source of truth for token logic
  2. ✅ Consolidate Upload Artifact Generation (pkg/workflow/compiler_yaml.go)

    • 7 similar functions → 1 configurable helper
    • Lines saved: ~100-120
    • Effort: 1-2 hours
    • Impact: HIGH - Easier action version updates
  3. ✅ Split frontmatter.go (pkg/parser/)

    • 1,283 lines → 5 focused files (100-350 lines each)
    • Effort: 6-8 hours
    • Impact: HIGH - Clearer import vs include vs merge separation
  4. ✅ Split schema.go (pkg/parser/)

    • 1,156 lines → 5-6 focused files (100-400 lines each)
    • Effort: 6-8 hours
    • Impact: HIGH - Clearer validation vs suggestion vs deprecated separation

Priority 2: Structural Improvements (2-4 Weeks)

Estimated Total Effort: 20-28 hours

  1. ✅ Create pkg/workflow/validation/ subdirectory (Issue 1.3)

    • Move 17 validation files to subdirectory
    • Effort: 3-4 hours
    • Impact: MEDIUM - Clear module boundary for validation
  2. ✅ Create pkg/workflow/safeoutputs/ subdirectory (Issue 1.5)

    • Move 8+ safe outputs files to subdirectory
    • Effort: 3-4 hours
    • Impact: MEDIUM - Modularizes safe outputs system
  3. ✅ Split compiler_yaml.go (Issue 1.4)

    • 1,446 lines → 5 focused files (200-400 lines each)
    • Effort: 4-5 hours
    • Impact: MEDIUM - Separates YAML vs prompts vs uploads
  4. ✅ Split logs.go further (Issue 2.1)

    • Extract analysis logic to logs_analysis.go
    • Effort: 4-6 hours
    • Impact: MEDIUM - Completes logs feature separation
  5. ✅ Split compile_command.go (Issue 2.2)

    • Extract security tools to compile_security.go
    • Effort: 5-6 hours
    • Impact: MEDIUM - Clearer security tool integration

Priority 3: Code Quality Improvements (Ongoing)

Estimated Total Effort: 8-12 hours

  1. ✅ Consolidate Package Collection (Issue 1.6)

    • 3 duplicate functions → 1 generic approach
    • Lines saved: ~150
    • Effort: 3-4 hours
    • Impact: LOW - Easier to add new package managers
  2. ✅ Extract Generic Utilities (Issue 3.3)

    • Create pkg/util/ package
    • Effort: 2-3 hours
    • Impact: LOW - Reusable utilities across packages
  3. ✅ Document Best Practices

    • Document MCP command pattern as best practice
    • Document expression_* pattern as best practice
    • Effort: 2-3 hours
    • Impact: LOW - Maintains consistency for future development

Summary of Findings

Total Impact by Category

| Category | Files Affected | Lines to Reduce | Estimated Effort |
|---|---|---|---|
| Duplicate Code | 5 | ~410 lines | 10-13 hours |
| File Splitting | 6 large files | Improve ~7,500 lines organization | 30-38 hours |
| Modularization | 25+ files | Better boundaries | 6-8 hours |
| Utilities | 3-5 files | Reusable helpers | 2-3 hours |
| TOTAL | 35-40 files | ~410 lines removed, 7,500+ reorganized | 48-62 hours |

Good Examples to Maintain (✓)

These areas demonstrate excellent organization and should serve as patterns:

  1. ✓ Expression Handling (pkg/workflow/):

    • expression_parser.go, expression_builder.go, expression_extraction.go, expression_nodes.go, expression_validation.go
    • Pattern: Clear feature prefix with responsibility suffix
  2. ✓ MCP Commands (pkg/cli/):

    • mcp.go, mcp_add.go, mcp_inspect.go, mcp_list.go, etc.
    • Pattern: Main command with focused subcommand files
  3. ✓ Consistent Naming:

    • Strong verb-noun structure across all packages
    • Clear function purpose from name
  4. ✓ Logger Initialization:

    • Consistent var {name}Log = logger.New("package:feature") pattern
    • Used consistently across 60+ files

Implementation Checklist

Phase 1: Quick Wins (Weeks 1-2)

  • Consolidate token handling functions (Issue 1.1)
  • Consolidate upload artifact generation (Issue 1.2)
  • Split pkg/parser/frontmatter.go (Issue 3.1)
  • Split pkg/parser/schema.go (Issue 3.2)
  • Review and test changes

Phase 2: Structural (Weeks 3-4)

  • Create pkg/workflow/validation/ subdirectory (Issue 1.3)
  • Create pkg/workflow/safeoutputs/ subdirectory (Issue 1.5)
  • Split pkg/workflow/compiler_yaml.go (Issue 1.4)
  • Split pkg/cli/logs.go further (Issue 2.1)
  • Split pkg/cli/compile_command.go (Issue 2.2)
  • Update imports and test

Phase 3: Polish (Weeks 5-6)

  • Consolidate package collection pattern (Issue 1.6)
  • Create pkg/util/ and move generic utilities (Issue 3.3)
  • Document best practices
  • Final review and testing

Analysis Metadata

  • Analysis Date: 2025-12-12
  • Repository: githubnext/gh-aw
  • Commit: 6f53345
  • Files Analyzed: 286 non-test Go files
  • Total Lines Analyzed: 82,301 lines
  • Detection Methods:
    • Semantic code exploration (Claude Code Explore agents)
    • Function pattern matching (grep/awk analysis)
    • Manual review of largest files
    • Comparative analysis of similar functions
    • Cross-file pattern recognition

Conclusion

The codebase demonstrates strong architectural foundations with clear naming conventions and good separation of concerns at the package level. The primary opportunities for improvement are:

  1. Reducing duplication in token handling and artifact upload generation (~180-200 lines, per Issues 1.1 and 1.2)
  2. Splitting oversized files to reduce cognitive load (16 files >1000 lines)
  3. Modularizing related files into subdirectories (validation, safeoutputs)
  4. Maintaining excellent patterns from expression handling and MCP commands

Recommended Approach: Start with Priority 1 quick wins (duplication removal and parser splits) to build momentum, then tackle Priority 2 structural improvements incrementally to avoid disrupting ongoing development.

AI generated by Semantic Function Refactoring
