🔧 Semantic Function Clustering Analysis
Analysis of repository: githubnext/gh-aw
Executive Summary
This analysis examined 205 non-test Go source files in the gh-aw repository (~177k lines of code) to identify refactoring opportunities through semantic function clustering and duplicate detection. It found two duplicated functions, validation logic scattered across 51 files, and several opportunities to improve code organization.
Key Findings:
- ✅ 2 duplicated functions identified (one with divergent implementations, one near duplicate), both requiring consolidation
- ✅ Validation logic scattered across 51 different files (needs centralization)
- ✅ Helper/utility functions spread across multiple locations
- ✅ Well-organized patterns in some areas (e.g., create_* files) that can serve as templates
- ✅ 1,111 functions in pkg/cli and 3,001 functions in pkg/workflow analyzed
Full Report Details
Function Inventory
By Package
| Package | Files Analyzed | Primary Purpose |
|---|---|---|
| pkg/workflow | 189 files | Core workflow compilation, validation, and generation logic |
| pkg/cli | 60 files | CLI command handlers and utilities |
| pkg/parser | 7 files | Frontmatter parsing, GitHub URL parsing, schema validation |
| pkg/console | 4 files | Console output formatting and rendering |
| pkg/logger | 1 file | Logging utilities |
| pkg/timeutil | 1 file | Time/duration formatting |
| pkg/constants | 1 file | Application constants |
Function Distribution
- Total Go files analyzed: 205
- Total functions cataloged: 4,112+ functions
- pkg/workflow functions: 3,001
- pkg/cli functions: 1,111
- Naming pattern clusters identified: 8 major clusters
Identified Issues
1. 🔴 Duplicate Functions (High Priority)
Issue 1.1: isMCPType() - Duplicated Function with Divergent Implementations
Severity: HIGH - Different implementations could lead to inconsistent behavior
Occurrences:
- File: pkg/parser/frontmatter.go:73

    func isMCPType(typeStr string) bool {
        switch typeStr {
        case "stdio", "http":
            return true
        default:
            return false
        }
    }

- File: pkg/workflow/mcp-config.go:891

    func isMCPType(typeStr string) bool {
        switch typeStr {
        case "stdio", "http", "local": // ⚠️ Includes "local"!
            return true
        default:
            return false
        }
    }

Issue: The two implementations differ - mcp-config.go includes "local" as a valid type, while frontmatter.go does not. This inconsistency could cause bugs.
Recommendation:
- Consolidate into a single implementation in a shared location (e.g., pkg/parser/mcp.go)
- Decide if "local" should be valid everywhere or only in specific contexts
- Update all call sites to use the consolidated function
Estimated Impact: Prevents potential bugs from inconsistent MCP type validation
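The consolidation could look roughly like the sketch below. It is not the repository's code: the exported name IsMCPType is hypothetical (chosen so that both pkg/parser and pkg/workflow could call one copy), and accepting "local" everywhere is an assumption that stands in for the decision still to be made.

```go
package main

import "fmt"

// IsMCPType reports whether typeStr names a supported MCP type.
// Hypothetical single implementation replacing both private copies.
// Assumption: "local" is accepted everywhere, matching the more permissive
// copy in mcp-config.go. Remove it from the case list if the decision goes
// the other way.
func IsMCPType(typeStr string) bool {
	switch typeStr {
	case "stdio", "http", "local":
		return true
	default:
		return false
	}
}

func main() {
	// Print the verdict for each candidate type string.
	for _, t := range []string{"stdio", "http", "local", "grpc"} {
		fmt.Printf("%-6s valid=%v\n", t, IsMCPType(t))
	}
}
```

Whichever way the "local" decision goes, keeping a single copy means the answer is the same in the parser and in the workflow compiler.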
Issue 1.2: formatFileSize() - Near Duplicate
Severity: MEDIUM - Duplicate implementation
Occurrences:
- File: pkg/console/format.go:6 - Exported as FormatFileSize()
- File: pkg/console/render.go:520 - Private as formatFileSize()
Analysis: The render.go file has a private duplicate of the same logic. Both implement identical file size formatting (B, KB, MB, GB, TB).
Recommendation:
- Remove the private formatFileSize() from render.go
- Use the exported FormatFileSize() from format.go instead
- This is already available in the same package, so it's a simple refactor
Estimated Impact: Reduced duplication, single source of truth for file size formatting
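For orientation, a formatter of this kind might look like the following sketch. The report only states that both copies format sizes as B, KB, MB, GB, and TB; the 1024-based units, the one-decimal output, and the int64 parameter are assumptions, and the real FormatFileSize() in format.go may differ.

```go
package main

import "fmt"

// FormatFileSize renders a byte count using B, KB, MB, GB, or TB.
// Assumptions (not taken from the repository): units are 1024-based and
// values above 1 KB are printed with one decimal place.
func FormatFileSize(size int64) string {
	const unit = 1024
	if size < unit {
		return fmt.Sprintf("%d B", size)
	}
	units := []string{"KB", "MB", "GB", "TB"}
	value := float64(size) / unit
	i := 0
	// Step up one unit at a time, stopping at TB.
	for value >= unit && i < len(units)-1 {
		value /= unit
		i++
	}
	return fmt.Sprintf("%.1f %s", value, units[i])
}

func main() {
	for _, n := range []int64{512, 1536, 1 << 20, 5 << 30} {
		fmt.Println(n, "->", FormatFileSize(n))
	}
}
```

With one exported function in the package, render.go can call it directly and the private copy can be deleted.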
2. 🟡 Scattered Validation Logic (Medium Priority)
Issue: Validation Functions Across 51 Files
Finding: Validation-related functions are scattered across 51 different files in the pkg/workflow package, even though the package has only 9 dedicated validation files.
Dedicated Validation Files:
- validation.go - Main validation logic
- validation_strict_mode.go - Strict mode validation
- bundler_validation.go - Bundler validation
- docker_validation.go - Docker image validation
- npm_validation.go - NPM package validation
- pip_validation.go - Python package validation
- step_order_validation.go - Step ordering validation
- github_toolset_validation_error.go - GitHub toolset validation errors
- permissions_validator.go - Permissions validation
Files with Validation Functions (Outside Dedicated Files):
- compiler.go - Contains validation methods mixed with compilation logic
- compiler_yaml.go - Has YAML validation functions
- engine.go - Contains engine-specific validation
- agentic_engine.go - Has agentic workflow validation
- mcp-config.go - Contains MCP configuration validation
- imports.go - Has import validation logic
- jobs.go - Contains job validation
- ...and 44 more files
Example of Outlier Validation Functions:
In pkg/workflow/compiler.go:
- validateExpressionSizes() - Expression validation (lines 99-133)
- validateContainerImages() - Container validation (lines 135-182)
- validateRuntimePackages() - Runtime validation (lines 184-218)
- validateRepositoryFeatures() - Repository validation (lines 396-465)
- validateHTTPTransportSupport() - HTTP transport validation (lines 635-652)
- validateMaxTurnsSupport() - Max turns validation (lines 654-676)
- validateWebSearchSupport() - Web search validation (lines 678-end)
Recommendation:
- Phase 1: Move generic validation helpers to validation.go
- Phase 2: Create domain-specific validation files for:
  - expression_validation.go - Expression size and safety validation
  - repository_validation.go - Repository feature validation
  - engine_validation.go - Engine capability validation
  - mcp_validation.go - Consolidate all MCP validation logic
- Phase 3: Keep only compiler orchestration in compiler.go
Estimated Effort: 8-12 hours
Benefits:
- Clearer separation of concerns
- Easier to test validation logic in isolation
- Reduces compiler.go complexity
3. 🟡 Scattered Helper Functions (Medium Priority)
Issue: Helper/Utility Functions Spread Across Multiple Files
Finding: Helper functions are distributed across many files when they could be centralized for better discoverability.
Current State:
- pkg/cli/shared_utils.go - Contains only 3 functions (PR auto-merge related)
- pkg/cli/frontmatter_utils.go - Contains 3 frontmatter update functions
- pkg/cli/repeat_utils.go - Contains 1 retry/repeat function
- pkg/workflow/engine_helpers.go - Contains 14 engine-related helper functions
- Many other files have scattered helper functions
Examples of Scattered Helpers:
Normalization Functions (5 different files):
- pkg/workflow/resolve.go - normalizeWorkflowName()
- pkg/workflow/safe_outputs.go - normalizeSafeOutputIdentifier()
- pkg/cli/update_command.go - normalizeWhitespace()
- pkg/cli/resolver.go - NormalizeWorkflowFile()
- pkg/workflow/expressions.go - NormalizeExpressionForComparison()
Recommendation:
- Consider creating a pkg/cli/utils/ package with subpackages:
  - pkg/cli/utils/frontmatter/ - All frontmatter manipulation
  - pkg/cli/utils/retry/ - Retry/repeat logic
  - pkg/cli/utils/normalize/ - All normalization functions
- For pkg/workflow/, consider:
  - pkg/workflow/helpers/ - General workflow helpers
  - Keep domain-specific helpers in their domain files (e.g., engine helpers stay in engine files)
Estimated Effort: 6-8 hours
Benefits: Improved discoverability, easier reuse
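As a rough picture of what a helper in the proposed normalize package could contain, the sketch below shows one exported function. It is a hypothetical stand-in for helpers such as normalizeWhitespace(): the report does not show that function's body, so the behavior here (collapsing whitespace runs and trimming the ends) is an assumption. In the repository it would live under a package such as normalize rather than main.

```go
package main

import (
	"fmt"
	"strings"
)

// NormalizeWhitespace collapses every run of whitespace into a single space
// and trims leading and trailing whitespace.
// Hypothetical behavior, assumed for illustration only.
func NormalizeWhitespace(s string) string {
	// strings.Fields splits on any Unicode whitespace and drops empty fields.
	return strings.Join(strings.Fields(s), " ")
}

func main() {
	fmt.Printf("%q\n", NormalizeWhitespace("  name:\t my-workflow \n"))
}
```

Grouping small pure functions like this in one package makes them easy to find and reuse, which is the discoverability benefit listed above.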
4. 🟢 Opportunities for Better Organization (Low Priority)
Issue 4.1: Large Monolithic Files
Files with High Function Counts:
- pkg/cli/logs.go - 36 functions (log parsing, metrics extraction, file operations)
- pkg/workflow/compiler.go - 20+ methods (compilation + validation + job generation)
- pkg/workflow/mcp-config.go - 30+ functions (MCP configuration, rendering, validation)
Recommendation: Consider splitting these files by feature:
- logs.go → logs_parser.go, logs_metrics.go, logs_files.go
- compiler.go → Keep core compilation, move validation to validation files
- mcp-config.go → mcp_config.go, mcp_render.go, mcp_validation.go
Estimated Effort: 4-6 hours per file
Benefits: Easier navigation, clearer responsibilities
Issue 4.2: Naming Inconsistencies in Helper Functions
Finding: Helper functions use inconsistent naming:
- Some use the ensure* prefix (e.g., ensureMCPConfig, ensurePoutineConfig)
- Some use the get* prefix for similar operations
- Some use passive names without verbs
Example - Config Initialization Functions:
pkg/cli/mcp_config_file.go:27: func ensureMCPConfig(verbose bool) error
pkg/cli/poutine.go:38: func ensurePoutineConfig(gitRoot string) error
pkg/cli/actionlint.go:27: func ensureActionlintConfig(gitRoot string) error
Recommendation: Standardize on verb prefixes:
- ensure* - For functions that create if not exists
- get* - For pure getters
- validate* - For validation functions
- parse* - For parsing functions
Estimated Effort: 2-3 hours
Benefits: Consistent API surface, easier to understand function purpose
Detailed Function Clusters
Cluster 1: Creation Functions (create_*)
Pattern: Functions and files that create GitHub entities
Well-Organized Files (✅ Good example to follow):
- pkg/workflow/create_issue.go - Issue creation
- pkg/workflow/create_pull_request.go - PR creation
- pkg/workflow/create_discussion.go - Discussion creation
- pkg/workflow/create_pr_review_comment.go - PR review comment creation
- pkg/workflow/create_code_scanning_alert.go - Code scanning alert creation
- pkg/workflow/create_agent_task.go - Agent task creation
Analysis: ✅ Excellent organization - Each creation operation has its own file, making the codebase easy to navigate and test. This pattern should be preserved and used as a model.
Cluster 2: Parsing Functions (Parse*, Extract*)
Pattern: Functions that parse configuration or extract data
Distribution:
- pkg/parser/ - 7 files with 60+ parsing functions (well-organized)
- pkg/workflow/comment.go - ParseCommandEvents()
- pkg/workflow/expressions.go - ParseExpression()
- Many Extract* functions scattered across workflow package
Analysis: Parser package is well-organized, but some parsing functions exist outside of it. Most are domain-specific (e.g., comment parsing in comment.go), which is acceptable.
Recommendation: Keep current organization, but consider moving generic parsing utilities to pkg/parser/.
Cluster 3: Validation Functions (Validate*, validate*)
See Issue 2 (Scattered Validation Logic) above, where this cluster is covered in detail.
Cluster 4: Formatting Functions (Format*, format*)
Pattern: Functions that format data for display
Well-Organized:
- pkg/console/console.go - 15+ formatting functions for console output
- pkg/console/format.go - File size formatting
- pkg/console/render.go - Struct rendering and formatting
- pkg/timeutil/format.go - Duration formatting
Analysis: ✅ Console formatting is well-centralized in the pkg/console package. Minor duplicate (formatFileSize) should be removed (see Issue 1.2).
Cluster 5: Rendering Functions (Render*)
Pattern: Functions that render YAML/JSON configuration
Distribution:
- pkg/workflow/engine_helpers.go:
  - RenderGitHubMCPDockerConfig()
  - RenderGitHubMCPRemoteConfig()
  - RenderJSONMCPConfig()
- pkg/workflow/expressions.go:
  - RenderConditionAsIf()
- pkg/console/ package - Rendering for console output
Analysis: Rendering is split between YAML/JSON config rendering (workflow) and console rendering (console). This split makes sense.
Cluster 6: MCP Configuration Functions
Pattern: Functions related to MCP server configuration
Distribution Across Multiple Files:
- pkg/workflow/mcp-config.go - Main MCP configuration logic (30+ functions)
- pkg/workflow/mcp_servers.go - MCP server detection and configuration
- pkg/parser/mcp.go - MCP configuration parsing
- pkg/cli/mcp_*.go - 15 CLI files for MCP management
Analysis: MCP functionality is spread but somewhat organized. The pkg/cli/mcp_*.go pattern is good for CLI commands. The workflow MCP logic could benefit from further splitting.
Cluster 7: Token/Credentials Functions
Pattern: Functions that handle GitHub tokens and credentials
Well-Organized:
- pkg/workflow/github_token.go:
  - getEffectiveGitHubToken() - Standard GitHub token resolution
  - getEffectiveCopilotGitHubToken() - Copilot-specific token resolution
- Multiple getter functions in pkg/workflow/mcp_servers.go:
  - getGitHubToken()
  - getGitHubDockerImageVersion()
  - etc.
Analysis: ✅ Token logic is reasonably well-organized in dedicated files.
Cluster 8: ensure* Configuration Functions
Pattern: Functions that ensure configuration files exist
Distribution:
pkg/cli/mcp_config_file.go:27: func ensureMCPConfig(verbose bool) error
pkg/cli/poutine.go:38: func ensurePoutineConfig(gitRoot string) error
pkg/cli/actionlint.go:27: func ensureActionlintConfig(gitRoot string) error
pkg/cli/add_command.go:643: func ensureCopilotInstructions(...)
pkg/cli/add_command.go:693: func ensureAgenticWorkflowPrompt(...)
Analysis: These follow a consistent pattern of "create if not exists". Could be abstracted into a generic ensureConfigFile() helper that takes a template and target path.
Recommendation: Consider creating a config file helper utility that reduces code duplication across these functions.
Summary of Best Practices Found
✅ What's Working Well:
- create_* Files Pattern - Each GitHub entity creation has its own file
- Parser Package - Well-organized parsing logic in pkg/parser/
- Console Package - Centralized formatting and rendering
- Validation Files - Dedicated validation files for specific domains (docker, npm, pip)
Refactoring Recommendations
Priority 1: High Impact (Should Do Soon)
1. Consolidate Duplicate isMCPType() Function
- Effort: 30 minutes
- Impact: Prevents bugs from inconsistent behavior
- Action:
  - Move to pkg/parser/mcp.go
  - Decide on correct implementation (with or without "local")
  - Update imports in both files
2. Remove Duplicate formatFileSize() from render.go
- Effort: 15 minutes
- Impact: Eliminates code duplication
- Action: Replace usage with FormatFileSize() from format.go
Priority 2: Medium Impact (Worth Doing)
3. Consolidate Validation Logic
- Effort: 8-12 hours
- Impact: Clearer code organization, easier testing
- Action:
  - Create expression_validation.go for expression validation
  - Create repository_validation.go for repository checks
  - Create engine_validation.go for engine capability checks
  - Move validation methods from compiler.go to appropriate files
4. Centralize Helper Functions
- Effort: 6-8 hours
- Impact: Improved discoverability, reduced duplication
- Action:
  - Create pkg/cli/utils/normalize/ for normalization functions
  - Consolidate frontmatter utilities
  - Create retry/repeat utilities package
Priority 3: Low Impact (Nice to Have)
5. Split Large Files
- Effort: 4-6 hours per file
- Impact: Easier navigation
- Action: Split logs.go, mcp-config.go by feature area
6. Standardize Helper Naming
- Effort: 2-3 hours
- Impact: Consistent API surface
- Action: Rename functions to follow verb-based naming convention
Implementation Checklist
Phase 1: Quick Wins (1-2 hours)
- Consolidate isMCPType() duplicate
- Remove formatFileSize() duplicate from render.go
- Document refactoring decisions in memory
Phase 2: Validation Consolidation (1-2 weeks)
- Create new validation files: expression_validation.go, repository_validation.go, engine_validation.go
- Move validation functions from compiler.go to dedicated files
- Update tests to reflect new organization
- Verify no functionality broken
Phase 3: Helper Consolidation (1 week)
- Create utility packages structure
- Move normalization functions to dedicated package
- Move retry/repeat logic to dedicated package
- Update import statements across codebase
Phase 4: Long-term Improvements (Future)
- Split large files (logs.go, mcp-config.go)
- Standardize helper function naming
- Consider generic ensureConfigFile() helper
Analysis Metadata
- Total Go Files Analyzed: 205
- Total Functions Cataloged: 4,112+
- Function Clusters Identified: 8 major clusters
- Outliers Found: 7+ significant outliers
- Duplicates Detected: 2 exact/near duplicates
- Detection Method: Serena semantic code analysis + regex pattern analysis
- Analysis Date: 2025-11-06
- Repository: githubnext/gh-aw
- Analysis Tool: Serena MCP Server + Claude Code
Conclusion
This codebase shows strong organization patterns in many areas (especially the create_* files and parser package), but has opportunities for improvement in:
- Eliminating duplicates (2 found - should be fixed immediately)
- Consolidating scattered validation logic (51 files → 8-12 files)
- Better helper function organization
The refactoring recommendations are prioritized by impact and effort, with Phase 1 quick wins taking only 1-2 hours but providing immediate value by eliminating bugs and duplication.
Overall assessment: Good foundation with clear improvement opportunities 🎯
AI generated by Semantic Function Refactoring