
[refactor] Semantic Function Clustering Analysis - Code Organization Opportunities #7136

@github-actions


Executive Summary

Repository: githubnext/gh-aw

Analysis Overview:

  • Total Go Files Analyzed: 319 non-test files
  • Total Functions Cataloged: 1,884 functions
  • Primary Focus: pkg/workflow package (175 files, largest package)
  • Analysis Method: Static analysis + semantic naming pattern clustering
  • Key Finding: Code organization is generally good, with some opportunities for consolidation

High-Level Assessment: The codebase follows a well-organized file-per-feature pattern. Most files have clear purposes and appropriate names. However, there are opportunities to consolidate related configuration and validation files, reduce helper file proliferation, and improve semantic grouping of related functions.


Function Inventory

Package Distribution

Package  | Files | Primary Purpose
---------|-------|----------------------------------------
workflow | 175   | Core workflow compilation and execution
cli      | 114   | Command-line interface
parser   | 20    | Content and frontmatter parsing
campaign | 8     | Campaign orchestration
console  | 5     | Terminal output formatting
logger   | 3     | Logging utilities
Other    | 6     | Utilities (tty, timeutil, testutil, styles, gitutil, constants)

Function Naming Patterns

Analysis of function prefixes reveals clear semantic clusters:

  • get* functions: 182 (accessor/getter functions)
  • extract* functions: 95 (data extraction utilities)
  • parse* functions: 92 (parsing and conversion)
  • build* functions: 74 (construction/builder functions)
  • validate* functions: 48 (validation logic)
  • format* functions: 29 (formatting/presentation)
  • generate* functions: 20 (code generation)
  • merge* functions: 13 (configuration merging)
  • compile* functions: 12 (compilation logic)
  • create* functions: 11 (entity creation)
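
Counts like these can be reproduced with Go's standard go/parser and go/ast packages. The following is a minimal sketch, not the tooling used for this analysis (which relied on grep, find, and pattern matching): countPrefixes and the inline sample source are hypothetical, and a real run would walk every non-test .go file under pkg/.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
	"strings"
)

// prefixes are the lowercase name prefixes from the list above.
var prefixes = []string{"get", "extract", "parse", "build", "validate",
	"format", "generate", "merge", "compile", "create"}

// countPrefixes (hypothetical helper) parses one Go source file and counts
// function and method declarations by name prefix, ignoring case.
func countPrefixes(src string) (map[string]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok {
			continue // skip type, var, const, and import declarations
		}
		name := strings.ToLower(fn.Name.Name)
		for _, p := range prefixes {
			if strings.HasPrefix(name, p) {
				counts[p]++
				break
			}
		}
	}
	return counts, nil
}

func main() {
	// Sample input for illustration only.
	src := `package demo
func ParseGitHubURL(s string) string { return s }
func parseTimeDelta(s string) int { return 0 }
func buildSteps() []string { return nil }
func helper() {}
`
	counts, err := countPrefixes(src)
	if err != nil {
		panic(err)
	}
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random, so sort for stable output
	for _, k := range keys {
		fmt.Printf("%s*: %d\n", k, counts[k])
	}
}
```

Matching is case-insensitive so that exported and unexported variants (Parse*, parse*) land in the same cluster, as in the cluster sections of this report.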

Identified Issues

Issue 1: Safe Output Configuration File Proliferation

Severity: Medium
Impact: Increased cognitive load, difficulty finding the right file

Current State:

The safe output functionality is split across 10+ files:

safe_output_builder.go (16 functions)
safe_output_config.go (1 function)
safe_output_validation_config.go (3 functions)
safe_outputs.go (0 functions)
safe_outputs_app.go (5 functions)
safe_outputs_config.go (10 functions)
safe_outputs_env.go (8 functions)
safe_outputs_env_helpers.go (11 functions)
safe_outputs_jobs.go (1 function)
safe_outputs_steps.go (5 functions)

Analysis:

  • Three different config files: safe_output_config.go, safe_outputs_config.go, safe_output_validation_config.go
  • Naming inconsistency: safe_output_* vs safe_outputs_* (singular vs plural)
  • safe_outputs.go contains 0 functions (empty or types-only file)
  • Helper functions are split between safe_outputs_env_helpers.go and the builder-pattern code
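
The singular/plural inconsistency can be checked mechanically. A minimal sketch, assuming the ten file names listed under Current State; splitByPlural is a hypothetical helper and not part of the repository:

```go
package main

import (
	"fmt"
	"strings"
)

// splitByPlural (hypothetical helper) partitions file names by whether they
// use the singular stem ("safe_output") or its plural form ("safe_outputs").
func splitByPlural(files []string, stem string) (singular, plural []string) {
	for _, f := range files {
		switch {
		// Check the plural form first: "safe_outputs_" does not start with
		// "safe_output_", but testing plural first keeps the intent obvious.
		case strings.HasPrefix(f, stem+"s_"), strings.HasPrefix(f, stem+"s."):
			plural = append(plural, f)
		case strings.HasPrefix(f, stem+"_"), strings.HasPrefix(f, stem+"."):
			singular = append(singular, f)
		}
	}
	return singular, plural
}

func main() {
	files := []string{
		"safe_output_builder.go", "safe_output_config.go",
		"safe_output_validation_config.go", "safe_outputs.go",
		"safe_outputs_app.go", "safe_outputs_config.go",
		"safe_outputs_env.go", "safe_outputs_env_helpers.go",
		"safe_outputs_jobs.go", "safe_outputs_steps.go",
	}
	singular, plural := splitByPlural(files, "safe_output")
	fmt.Printf("singular: %d, plural: %d\n", len(singular), len(plural))
	// prints "singular: 3, plural: 7"
}
```

The three singular files are the ones a rename to the plural form would touch.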

Recommendation:

Option A - Consolidate by Purpose (Recommended):

  1. Merge safe_output_config.go and safe_outputs_config.go into single safe_outputs_config.go
  2. Keep safe_output_validation_config.go separate (validation is distinct concern)
  3. Merge safe_outputs_env_helpers.go into safe_outputs_env.go (only 19 functions total)
  4. Review if safe_outputs.go is needed or can be merged into another file

Option B - Consolidate by Layer:

  1. safe_outputs_config.go - all configuration parsing
  2. safe_outputs_builders.go - all builder functions
  3. safe_outputs_jobs.go - job generation
  4. safe_outputs_steps.go - step generation

Estimated Effort: 2-3 hours; Benefits: Clearer organization, easier to find related functions


Issue 2: Multiple Config-Related Files

Severity: Low
Impact: Minor confusion about where config parsing belongs

Current State:

6 config-related files in pkg/workflow:

config_helpers.go (13 functions) - Generic config parsing utilities
mcp-config.go (19 functions) - MCP server configuration
mcp_config_validation.go - MCP config validation
safe_output_config.go (1 function) - Safe output config parsing
safe_output_validation_config.go (3 functions) - Safe output validation config
safe_outputs_config.go (10 functions) - Safe output configuration

Analysis:

  • config_helpers.go is well-documented with clear purpose (shared parsing utilities)
  • MCP config properly separated into config and validation
  • Safe output config files overlap with Issue 1

Recommendation:

Current organization is mostly good, but consider:

  1. Keep as-is for most files - The separation is logical
  2. Address safe output config files per Issue 1
  3. Add file header comments to clarify purpose where missing

Estimated Effort: 1 hour; Benefits: Improved discoverability


Issue 3: Helper File Proliferation

Severity: Low-Medium
Impact: Difficulty knowing which helper file to use

Current State:

10 helper files identified:

close_entity_helpers.go (4 functions) - Entity closing helpers
compiler_test_helpers.go (3 functions) - Test-only helpers
compiler_yaml_helpers.go (4 functions) - YAML generation helpers
config_helpers.go (13 functions) - Config parsing helpers
engine_helpers.go (6 functions) - Engine-related helpers
git_helpers.go (1 function) - Git utilities
map_helpers.go (2 functions) - Map manipulation utilities
prompt_step_helper.go (2 functions) - Prompt step helpers
safe_outputs_env_helpers.go (11 functions) - Safe outputs env helpers
update_entity_helpers.go (4 functions) - Entity update helpers

Analysis:

  • Most helper files are domain-specific (good!)
  • map_helpers.go with only 2 functions might be unnecessary
  • git_helpers.go with only 1 function is underutilized
  • config_helpers.go is well-documented and serves clear purpose
  • Helper files follow "3+ callers" rule documented in codebase

Recommendation:

Mostly acceptable, with minor consolidation opportunities:

  1. Keep domain-specific helpers - They serve clear purposes (compiler, engine, safe_outputs, etc.)
  2. Consider merging small utilities:
    • Merge map_helpers.go (2 functions) into a more generic utilities file if appropriate
    • Keep git_helpers.go for future growth, or merge into gitutil package
  3. Add documentation to helper files explaining their purpose (like config_helpers.go)

Estimated Effort: 1-2 hours; Benefits: Slightly cleaner organization
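
To illustrate the documentation recommendation, a helper file with a purpose header might look like the sketch below. The header wording and stringFromMap are hypothetical, not copied from config_helpers.go or map_helpers.go, and package main is used only so the example runs standalone; the real file would sit in pkg/workflow.

```go
// This file contains small, generic helpers for reading values out of
// map[string]any data (hypothetical example header).
//
// Why this file exists:
//   - The same lookup-and-type-assert pattern is needed by several callers.
//
// When to add a helper here:
//   - The helper is domain-neutral and has three or more callers.
//   - Domain-specific helpers belong next to their domain instead.

package main

import "fmt"

// stringFromMap (hypothetical helper) returns the string stored under key.
// The boolean is false when the key is missing or the value is not a string.
func stringFromMap(m map[string]any, key string) (string, bool) {
	v, ok := m[key]
	if !ok {
		return "", false
	}
	s, ok := v.(string)
	return s, ok
}

func main() {
	cfg := map[string]any{"name": "build", "timeout": 30}
	name, ok := stringFromMap(cfg, "name")
	fmt.Println(name, ok) // prints "build true"
	_, ok = stringFromMap(cfg, "timeout")
	fmt.Println(ok) // prints "false"
}
```

The blank line between the header and the package clause keeps the comment from being treated as the package's doc comment, so it documents the file only.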


Issue 4: Validation Functions in Non-Validation Files

Severity: Low
Impact: Minor inconsistency in file organization

Current State:

19 validation files exist in pkg/workflow, but some Validate* functions live outside them:

action_sha_checker.go: ValidateActionSHAsInLockFile()
github_tool_to_toolset.go: ValidateGitHubToolsAgainstToolsets()
imports.go: (validation functions)
jobs.go: (validation functions)
permissions_validator.go: ValidatePermissions()

Analysis:

  • Most validation is properly organized into validation files
  • Outlier validation functions are domain-specific and co-located with their domain
  • permissions_validator.go is effectively a validation file, although its name uses "validator" rather than "validation"
  • action_sha_checker.go contains validation as part of SHA checking functionality

Reasoning for Current Organization:

  • These functions validate domain-specific concepts (actions, permissions, etc.)
  • Moving them to generic validation files would separate them from related logic
  • Current co-location with domain logic is actually beneficial

Recommendation:

No changes needed - Current organization is appropriate. The validation functions are:

  • Domain-specific
  • Co-located with related functionality
  • Not generic validation utilities

This is an acceptable pattern where validation is part of a larger feature.

Estimated Effort: None; Benefits: None (current organization is good)
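
The co-location pattern endorsed above can be sketched as follows. Everything here is hypothetical: the Permissions type, the allowed levels, and validatePermissionLevels are illustrative stand-ins and do not reflect the signature or rules of the repository's ValidatePermissions().

```go
package main

import (
	"fmt"
	"sort"
)

// Permissions maps a scope name to an access level (hypothetical type).
type Permissions map[string]string

// allowedLevels is an assumed set of valid levels for this example.
var allowedLevels = map[string]bool{"read": true, "write": true, "none": true}

// validatePermissionLevels sits in the same file as the type it checks, so
// the domain rules and their validation are read and changed together.
func validatePermissionLevels(p Permissions) error {
	scopes := make([]string, 0, len(p))
	for scope := range p {
		scopes = append(scopes, scope)
	}
	sort.Strings(scopes) // report the first invalid scope deterministically
	for _, scope := range scopes {
		if !allowedLevels[p[scope]] {
			return fmt.Errorf("scope %q has invalid level %q", scope, p[scope])
		}
	}
	return nil
}

func main() {
	fmt.Println(validatePermissionLevels(Permissions{"contents": "read"}))
	fmt.Println(validatePermissionLevels(Permissions{"issues": "admin"}))
}
```

Moving such a function into a generic validation file would force the reader to jump between files to see what counts as a valid value.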


Issue 5: Compiler File Organization

Severity: Low
Impact: Good organization, minor inconsistencies

Current State:

15 compiler-related files:

compiler.go (2 functions) - Main compiler entry points
compiler_activation_jobs.go (4 functions) - Activation job generation
compiler_filters_validation.go (2 functions) - Filter validation
compiler_jobs.go (10 functions) - Job compilation
compiler_parse.go (2 functions) - Parsing logic
compiler_safe_output_jobs.go (1 function) - Safe output job generation
compiler_safe_outputs.go (6 functions) - Safe outputs compilation
compiler_safe_outputs_consolidated.go (30 functions) - Consolidated safe outputs
compiler_test_helpers.go (3 functions) - Test helpers
compiler_types.go (22 functions) - Type definitions
compiler_yaml.go - YAML generation
compiler_yaml_ai_execution.go - AI execution YAML
compiler_yaml_artifacts.go - Artifacts YAML
compiler_yaml_helpers.go (4 functions) - YAML helpers
compiler_yaml_main_job.go - Main job YAML

Analysis:

  • Good separation by concern (jobs, YAML, types, etc.)
  • compiler_safe_outputs_consolidated.go has 30 functions - largest compiler file
  • Clear naming pattern: compiler_[feature].go

Observation:
The file compiler_safe_outputs_consolidated.go suggests someone already identified consolidation as valuable. With 30 functions, this might benefit from further splitting.

Recommendation:

Option A - Further Split Large File:
Split compiler_safe_outputs_consolidated.go (30 functions) into:

  • compiler_safe_outputs_config.go - Configuration parsing
  • compiler_safe_outputs_builders.go - Builder functions
  • compiler_safe_outputs_validators.go - Validation logic

Option B - Keep As-Is (Recommended):
The current organization works well. The "consolidated" file suggests intentional grouping.

Estimated Effort: 3-4 hours if splitting; Benefits: More granular organization (marginal)


Issue 6: Parser Functions Distribution

Severity: Low
Impact: Good organization overall

Current State:

Parse functions are distributed across:

pkg/parser/ (20 files) - Primary parsing logic
pkg/workflow/compiler_parse.go (2 functions) - Compiler-specific parsing
pkg/workflow/tools_parser.go (15 functions) - Tools configuration parsing
pkg/workflow/expression_parser.go (13 functions) - Expression parsing

Analysis:

  • Clear separation: generic parsing in pkg/parser, domain-specific in pkg/workflow
  • tools_parser.go (15 functions) is focused on tools configuration
  • expression_parser.go (13 functions) is focused on expression parsing
  • Both are substantial enough to warrant separate files

Recommendation:

No changes needed - Current organization is excellent:

  • Generic parsing utilities in pkg/parser
  • Domain-specific parsing in pkg/workflow
  • Each parser file has a clear, focused purpose

Estimated Effort: None; Benefits: None (already well-organized)


Issue 7: Large Files by Function Count

Severity: Low
Impact: Potential maintainability concerns for largest files

Top 10 Largest Files:

File                                  | Functions | Assessment
--------------------------------------|-----------|-----------------------------------------------
js.go                                 | 41        | JavaScript bundling/execution - complex domain
scripts.go                            | 37        | Script generation - reasonable for domain
permissions.go                        | 37        | Permission handling - comprehensive
compiler_safe_outputs_consolidated.go | 30        | Already consolidated
agentic_engine.go                     | 30        | Engine implementation - appropriate
expression_builder.go                 | 27        | Expression DSL - reasonable
frontmatter_extraction.go             | 26        | Extraction logic - focused
safe_inputs.go                        | 22        | Input handling - appropriate
compiler_types.go                     | 22        | Type definitions - reasonable
mcp_servers.go                        | 21        | MCP server handling - appropriate

Analysis:

  • Largest files (37-41 functions) handle complex domains (JS bundling, scripts, permissions)
  • Most files in 20-30 function range are appropriately sized
  • File sizes correlate with domain complexity, not poor organization

Recommendation:

No immediate action needed - File sizes are justified by domain complexity. Consider future refactoring if any file exceeds 50 functions.

Estimated Effort: None; Benefits: None (current sizing is appropriate)
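
The 50-function guideline could be monitored automatically. A minimal sketch, assuming per-file counts are gathered with go/parser; countFuncs and overLimit are hypothetical helpers, and the counts in main are taken from the table above:

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
)

// countFuncs returns the number of function and method declarations in src.
func countFuncs(src string) (int, error) {
	file, err := parser.ParseFile(token.NewFileSet(), "example.go", src, 0)
	if err != nil {
		return 0, err
	}
	n := 0
	for _, decl := range file.Decls {
		if _, ok := decl.(*ast.FuncDecl); ok {
			n++
		}
	}
	return n, nil
}

// overLimit returns, in sorted order, the files whose count exceeds limit.
func overLimit(counts map[string]int, limit int) []string {
	var out []string
	for name, n := range counts {
		if n > limit {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	n, err := countFuncs("package demo\nfunc a() {}\nfunc b() {}\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(n) // prints "2"

	counts := map[string]int{"js.go": 41, "scripts.go": 37, "permissions.go": 37}
	fmt.Println(overLimit(counts, 50)) // prints "[]": nothing exceeds the guideline yet
	fmt.Println(overLimit(counts, 40)) // prints "[js.go]"
}
```

A check like this could run in CI as a warning rather than a hard failure, since the report treats the threshold as a prompt for review and not a rule.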


Detailed Function Clusters

Cluster 1: Creation Functions (create*)

Pattern: create* functions
Count: 11 functions
Primary Location: pkg/cli, pkg/campaign

Examples:

  • CreateSpecSkeleton() - Campaign spec creation
  • CreateWorkflowInteractively() - Interactive workflow creation
  • createAndSwitchBranch() - Git branch creation
  • createForkIfNeeded() - Fork creation
  • createPR() - Pull request creation

Analysis: ✅ Well-organized - creation functions are appropriately distributed by domain


Cluster 2: Building Functions (build*, Build*)

Pattern: build* and Build* functions
Count: 74 functions
Primary Location: pkg/workflow

Examples:

  • BuildActionEquals() - Condition builder
  • buildArtifactDownloadSteps() - Step builder
  • buildCampaignSummaries() - Summary builder
  • BuildOrchestrator() - Orchestrator builder

Analysis: ✅ Large cluster reflects extensive builder pattern usage - appropriate for workflow compilation


Cluster 3: Parsing Functions (parse*, Parse*)

Pattern: parse* and Parse* functions
Count: 92 functions
Distribution: pkg/parser (primary), pkg/workflow (domain-specific)

Examples:

  • ParseGitHubURL() - URL parsing
  • ParseImportDirective() - Import parsing
  • ParseInputDefinition() - Input parsing
  • parseTimeDelta() - Time parsing

Analysis: ✅ Good separation between generic (pkg/parser) and domain-specific (pkg/workflow) parsing


Cluster 4: Validation Functions (validate*, Validate*)

Pattern: validate* and Validate* functions
Count: 48 functions
Primary Location: validation files in pkg/workflow (19 files)

Examples:

  • ValidatePermissions() - Permission validation
  • ValidateSpec() - Spec validation
  • ValidateEventFilters() - Filter validation
  • ValidateMCPConfigs() - MCP config validation

Analysis: ✅ Mostly well-organized into validation files, with acceptable outliers (see Issue #4)


Cluster 5: Extraction Functions (extract*, Extract*)

Pattern: extract* and Extract* functions
Count: 95 functions
Primary Location: pkg/parser, pkg/workflow

Examples:

  • ExtractFrontmatterFromContent() - Frontmatter extraction
  • ExtractMarkdownContent() - Markdown extraction
  • extractStringFromMap() - Map value extraction
  • ExtractActionsFromLockFile() - Action extraction

Analysis: ✅ Large cluster reflects significant parsing/extraction work - appropriate distribution


Cluster 6: Format Functions (format*, Format*)

Pattern: format* and Format* functions
Count: 29 functions
Primary Location: pkg/console (output formatting), pkg/workflow (data formatting)

Examples:

  • FormatBanner() - Console banner
  • FormatDuration() - Duration formatting
  • FormatErrorMessage() - Error formatting
  • formatSafeOutputsRunsOn() - Config formatting

Analysis: ✅ Good separation: console output (pkg/console) vs data formatting (pkg/workflow)


Cluster 7: Generator Functions (generate*, Generate*)

Pattern: generate* and Generate* functions
Count: 20 functions
Primary Location: pkg/workflow

Examples:

  • GenerateRuntimeSetupSteps() - Runtime setup
  • GenerateActionMetadataCommand() - Metadata generation
  • GenerateMaintenanceWorkflow() - Workflow generation
  • GenerateMCPGatewaySteps() - Gateway steps

Analysis: ✅ Focused on workflow generation - appropriate clustering


Cluster 8: Merge Functions (merge*, Merge*)

Pattern: merge* and Merge* functions
Count: 13 functions
Primary Location: pkg/workflow (configuration merging)

Examples:

  • MergeTools() - Tool configuration merging
  • MergeWorkflowContent() - Workflow merging
  • mergeRuntimes() - Runtime merging
  • mergeMCPTools() - MCP tool merging

Analysis: ✅ Configuration merging utilities - appropriately grouped


No Duplicate Functions Found

Important Finding: The analysis did not identify any true duplicate functions (functions with identical or near-identical implementations).

What was checked:

  • Functions with similar names across files
  • Common utility patterns (sanitize, normalize, etc.)
  • Parsing functions across packages

What was found:

  • Similar function names serve different purposes
  • Apparent "duplicates" are domain-specific variants
  • No copy-pasted implementations detected

Example:

  • ParseGitHubURL() appears to be defined once in pkg/parser/github_urls.go
  • Other parse functions have distinct names and purposes

This is a positive finding - the codebase avoids code duplication effectively.
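
One way to double-check the "no copy-pasted implementations" finding is to hash each function body after formatting. This sketch is not the method used for this analysis (which relied on name and pattern matching); duplicateBodies and the sample source are hypothetical. It only catches exact duplicates: bodies that differ in variable names or comments placed in different spots would need a more tolerant comparison.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"go/ast"
	"go/parser"
	"go/printer"
	"go/token"
)

// duplicateBodies (hypothetical helper) groups function names by a hash of
// their printed body and returns only groups with two or more members.
func duplicateBodies(src string) (map[string][]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return nil, err
	}
	groups := make(map[string][]string)
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Body == nil {
			continue // skip non-functions and bodiless declarations
		}
		var buf bytes.Buffer
		if err := printer.Fprint(&buf, fset, fn.Body); err != nil {
			return nil, err
		}
		sum := sha256.Sum256(buf.Bytes())
		key := fmt.Sprintf("%x", sum[:8]) // short hex key is enough for a sketch
		groups[key] = append(groups[key], fn.Name.Name)
	}
	for key, names := range groups {
		if len(names) < 2 {
			delete(groups, key) // deleting during range is safe in Go
		}
	}
	return groups, nil
}

func main() {
	// Sample input for illustration only.
	src := `package demo
func shoutA(s string) string { return s + "!" }
func shoutB(s string) string { return s + "!" }
func identity(s string) string { return s }
`
	groups, err := duplicateBodies(src)
	if err != nil {
		panic(err)
	}
	for _, names := range groups {
		fmt.Println("identical bodies:", names)
	}
}
```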


Refactoring Recommendations

Priority 1: High Value, Low Effort

1.1 Consolidate Safe Output Config Files

Files to merge:

  • safe_output_config.go → merge into safe_outputs_config.go
  • Consider merging safe_outputs_env_helpers.go → into safe_outputs_env.go

Benefits:

  • Reduced cognitive load
  • Consistent naming (use plural: safe_outputs_*)
  • Easier to find configuration parsing logic

Estimated Effort: 2-3 hours
Risk: Low (straightforward file merge)


Priority 2: Medium Value, Low Effort

2.1 Add Documentation to Helper Files

Action: Add file header comments (like config_helpers.go) to:

  • map_helpers.go
  • git_helpers.go
  • engine_helpers.go
  • Other helper files without clear documentation

Benefits:

  • Clearer purpose for each helper file
  • Better onboarding for new developers
  • Justification for helper file existence

Estimated Effort: 1-2 hours
Risk: None (documentation only)


2.2 Review Small Helper Files

Files to review:

  • map_helpers.go (2 functions) - Consider merging or keeping for future growth
  • git_helpers.go (1 function) - Consider moving to pkg/gitutil or keeping for growth

Benefits:

  • Slightly cleaner file organization
  • Reduced number of small files

Estimated Effort: 1 hour
Risk: Low (small files, easy to merge or keep)


Priority 3: Optional Long-term Improvements

3.1 Consider Splitting Large Files

If any file grows beyond 50 functions, consider splitting:

  • Current largest: js.go (41 functions) - approaching threshold
  • scripts.go (37 functions) - manageable for now
  • permissions.go (37 functions) - appropriate for domain complexity

Benefits:

  • Improved maintainability for very large files
  • More focused file purposes

Estimated Effort: 3-4 hours per file
Risk: Medium (refactoring larger files requires care)

Recommendation: Monitor but don't split yet - current sizes are acceptable


Implementation Checklist

If proceeding with Priority 1 recommendations:

  • Review safe output config file contents and dependencies
  • Create backup branch for refactoring work
  • Merge safe_output_config.go into safe_outputs_config.go
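  • Consolidate the import blocks of the merged files (callers need no import changes, because files in the same Go package share one namespace)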
  • Review if safe_outputs.go (0 functions) should be kept or merged
  • Consider merging safe_outputs_env_helpers.go into safe_outputs_env.go
  • Run all tests to verify no breakage
  • Update any documentation referencing old file names

If proceeding with Priority 2 recommendations:

  • Add file header documentation to helper files
  • Document purpose, usage patterns, and "why this file exists"
  • Review 2-function and 1-function helper files
  • Decide: keep for future growth or merge into related files

Analysis Metadata

  • Analysis Date: 2025-12-21
  • Analysis Method: Static analysis using grep, find, and pattern matching
  • Detection Approach: Function name clustering + semantic grouping
  • Scope: All .go files in pkg/ directory (excluding tests)
  • Tool Support: Planned to use Serena MCP but not required for this analysis level

Positive Findings

The analysis reveals several strengths of the current codebase:

  • Well-organized file-per-feature pattern - Most files have clear, focused purposes
  • No code duplication detected - No duplicate function implementations found
  • Appropriate file sizes - Large files correlate with domain complexity, not poor organization
  • Good separation of concerns - Generic utilities (pkg/parser) vs domain-specific (pkg/workflow)
  • Extensive validation coverage - 19 validation files + 48 validation functions
  • Clear naming conventions - Function prefixes indicate purpose (build, parse, validate, etc.)
  • Helper files follow documented conventions - config_helpers.go has excellent documentation


Conclusion

Overall Assessment: The codebase demonstrates good organization with clear separation of concerns. The main opportunities are:

  1. Minor consolidation of safe output config files (Priority 1)
  2. Documentation improvements for helper files (Priority 2)
  3. Monitoring of large files as codebase grows (Priority 3)

The refactoring suggestions are optional improvements rather than critical issues. The current organization is maintainable and follows good practices.

Recommended Action: Implement Priority 1 (safe output config consolidation) if the team finds value in reducing the number of config files. Priority 2 and 3 are optional enhancements.

AI generated by Semantic Function Refactoring
