-
Notifications
You must be signed in to change notification settings - Fork 46
Description
Semantic analysis of 490 Go source files in repository: githubnext/gh-aw
Executive Summary
A comprehensive semantic function clustering analysis has identified significant code organization patterns, outlier functions in wrong files, and duplicate implementations across the codebase. The analysis covered 490 non-test Go files across key packages, with deep focus on:
- pkg/workflow (250 files) - Workflow compilation and processing
- pkg/cli (163 files) - Command-line interface
- pkg/parser (30 files) - Parsing utilities
- pkg/campaign (14 files) - Campaign orchestration
Key Findings:
- ✅ 60-70% well-organized - Many files follow single-responsibility principles
⚠️ ~25% mixed concerns - Files mixing 2-3 different responsibilities- ❌ ~10% major issues - Files with significant organizational problems
- 🔄 2 exact duplicates found - Functions with identical implementations
- 📦 Multiple scattered utilities - Helper functions spread across files
Critical Issues Identified
1. Exact Duplicate Functions
Issue #1: extractBaseRepo() - Identical Implementation in Two Files
Duplicate Locations:
pkg/workflow/action_resolver.go:93pkg/cli/update_actions.go:20
Code Comparison:
// pkg/workflow/action_resolver.go:93
func extractBaseRepo(repo string) string {
parts := strings.Split(repo, "/")
if len(parts) >= 2 {
// Take first two parts (owner/repo)
return parts[0] + "/" + parts[1]
}
return repo
}
// pkg/cli/update_actions.go:20
func extractBaseRepo(actionPath string) string {
parts := strings.Split(actionPath, "/")
if len(parts) >= 2 {
// Return owner/repo (first two segments)
return parts[0] + "/" + parts[1]
}
// If less than 2 parts, return as-is (shouldn't happen in practice)
return actionPath
}Similarity: 100% identical logic, only comments differ
Recommendation:
- Consolidate into
pkg/repoutil/repoutil.go(which already has related utilities) - Export as
ExtractBaseRepo(path string) string - Update both callers to use
repoutil.ExtractBaseRepo()
Estimated Impact: Reduced code duplication, single source of truth for repository path parsing
Issue #2: ParseGitHubURL() - Similar Functions with Different Implementations
Duplicate Locations:
pkg/repoutil/repoutil.go:28- Returns(owner, repo string, err error)pkg/parser/github_urls.go:56- Returns(*GitHubURLComponents, error)
Implementations:
- repoutil version: Handles SSH (
git@github.com:) and HTTPS formats, returns simple owner/repo tuple - parser version: Uses
url.Parse(), handles raw.githubusercontent.com, returns structuredGitHubURLComponentswith file paths, refs, etc.
Analysis:
These functions have different purposes:
repoutil.ParseGitHubURL- Simple owner/repo extraction for git operationsparser.ParseGitHubURL- Comprehensive URL parsing with file paths, refs, and content types
Recommendation:
- Rename for clarity:
repoutil.ParseGitHubURL→repoutil.ParseGitRepoURL(emphasizes git repo focus)- Keep
parser.ParseGitHubURLas-is (comprehensive parser)
- Add cross-reference comments explaining the distinction
- Consider consolidation: Could
repoutilcallparserversion and extract owner/repo?
Estimated Impact: Improved API clarity, reduced naming confusion
2. Outlier Functions (Functions in Wrong Files)
Priority 1: High-Impact Misplacements
Outlier #1: Git Attribute Configuration in Git Operations File
File: pkg/cli/git.go:157
Function: ensureGitAttributes()
Current Purpose: Configuring .gitattributes for compiled workflow files
Issue: This is compilation post-processing, not a core git operation
Used By:
compile_helpers.gocompile_orchestration.goadd_command.go
Why It's Misplaced:
The function sets up .gitattributes to handle .md and .yml files in the .github/workflows directory. This is a workflow compilation concern, not a generic git utility. The git.go file should contain reusable git operations (commits, branches, remotes), not workflow-specific configuration.
Recommendation:
- Move to
pkg/cli/compile_post_processing.go(or create it) - Rename to
configureWorkflowGitAttributes()for clarity - Keep git.go focused on reusable git operations
Estimated Impact: Clearer separation of concerns, easier to find compilation-related setup
Outlier #2: User Interaction in Git Operations File
File: pkg/cli/git.go:550
Function: confirmPushOperation()
Issue: User interaction logic mixed with git operations
Why It's Misplaced:
This function uses huh library to prompt users for confirmation. User interaction should be grouped with other interactive prompts, not embedded in git operations.
Recommendation:
- Move to
pkg/cli/interactive.go(if exists) or createpkg/cli/prompts.go - Group with other user confirmation functions
- Keep git.go focused on actual git commands
Estimated Impact: Improved testability (can mock prompts separately from git operations)
Outlier #3: Validation in Compilation Config File
File: pkg/cli/compile_validation.go
Function: validateCompileConfig()
Issue: Validates CLI flags (dependabot, purge, workflowDir), not workflow content
Why It's Misplaced:
The file name suggests it validates workflow compilation, but this function validates CLI configuration flags. These are different concerns:
- Workflow validation: Checking workflow YAML structure, frontmatter, safe outputs
- Config validation: Checking CLI flag combinations and values
Recommendation:
- Move to
pkg/cli/compile_config.gonear theCompileConfigstruct definition - Keep
compile_validation.gofor actual workflow content validation - Consider renaming to
validateCompileFlags()for clarity
Estimated Impact: Clearer distinction between config validation and workflow validation
Outlier #4: Generic YAML Validation in Command File
File: pkg/cli/actions_build_command.go:22
Function: validateActionYml()
Issue: Generic YAML validation in a command-specific file
Why It's Misplaced:
This appears to be a reusable validation utility, but it's buried in the actions_build_command.go file. Generic validators should be centralized for reuse across commands.
Recommendation:
- Move to
pkg/cli/validators.go(which already hasValidateWorkflowName()andValidateWorkflowIntent()) - Alternative: Create
pkg/cli/actions_validators.goif it's truly actions-specific - Export if other commands might need it
Estimated Impact: Improved discoverability and reusability
3. Files with Mixed Concerns
Priority 1: Largest Impact Files
Mixed Concern #1: pkg/cli/git.go (756 lines, 27 functions)
Concerns Mixed:
- Git repository detection (
findGitRoot,isGitRepo) - GitHub URL parsing (
parseGitHubRepoSlugFromURL) - Git attributes configuration (
ensureGitAttributes) - MISPLACED - Branch operations (
getCurrentBranch,checkOnDefaultBranch) - Remote operations (
hasRemote,pushToRemote) - Commit operations (
commitChanges,hasChangesToCommit) - User interaction (
confirmPushOperation) - MISPLACED
Refactoring Recommendation:
Split into focused files:
git_repository.go- Repository detection and navigation (findGitRoot, isGitRepo)git_operations.go- Commits, branches, remotes (core git commands)git_github.go- GitHub-specific URL parsing- Move
ensureGitAttributestocompile_post_processing.go - Move
confirmPushOperationtointeractive.goorprompts.go
Estimated Effort: 4-6 hours
Benefits: Improved maintainability, easier testing, clearer boundaries
Mixed Concern #2: pkg/workflow/compiler_safe_outputs.go (150+ lines)
Concerns Mixed:
- Parsing
on:section for triggers - Reaction handling configuration
- Lock-for-agent configuration
- Event filtering logic
Problem Function:
parseOnSection() handles 4+ different configuration aspects in a single 130-line function.
Refactoring Recommendation:
- Extract to separate parsers:
parseOnTriggers()- Trigger parsingparseReactionConfig()- Reaction handlingparseLockForAgent()- Lock configurationparseEventFilters()- Event filtering
- Keep
parseOnSection()as orchestrator calling these focused parsers - Consider moving to
safe_outputs_on_parser.go
Estimated Effort: 3-4 hours
Benefits: Easier to test individual parsing concerns, reduced complexity
Mixed Concern #3: pkg/workflow/domains.go (570 lines, 20+ functions)
Concerns Mixed:
- Ecosystem domain lookups (
getEcosystemDomains()) - Runtime domain extraction (
getDomainsFromRuntimes()) - Engine-specific domain builders (
GetCopilotAllowedDomains(),GetCodexAllowedDomains(),GetClaudeAllowedDomains()) - HTTP MCP domain extraction (
extractHTTPMCPDomains()) - Domain merging functions (multiple variants)
Refactoring Recommendation:
Split into focused files:
domains_ecosystem.go- Ecosystem domain lookups (npm, pip, docker)domains_engines.go- Engine-specific domain configuration (Copilot, Codex, Claude)domains_merging.go- Domain combination and deduplication logicdomains_mcp.go- HTTP MCP domain extraction
Estimated Effort: 4-5 hours
Benefits: Clearer organization, easier to find engine-specific logic
Mixed Concern #4: pkg/workflow/compiler_jobs.go
Concerns Mixed:
- Logic/decision functions (
isActivationJobNeeded(),shouldAddCheckoutStep()) - Analysis functions (
referencesCustomJobOutputs()) - Dependency checking (
jobDependsOnPreActivation()) - Building functions (
buildJobs()) - Parsing functions (
extractJobsFromFrontmatter()) - MISPLACED
Refactoring Recommendation:
- Move
extractJobsFromFrontmatter()tocompiler_frontmatter.goorjobs_parser.go - Keep analysis functions (
referencesCustomJobOutputs,jobDependsOnPreActivation) injobs_analysis.go - Keep building functions in
jobs_builder.go - Keep decision logic in
jobs_config.go
Estimated Effort: 3-4 hours
Benefits: Separation of parsing, analysis, and construction
Mixed Concern #5: pkg/cli/add_command.go (609 lines, 5 functions)
Issue:
Mixes command definition with complex workflow addition logic. The function addWorkflowWithTracking() is 260+ lines and handles the entire workflow addition flow.
Refactoring Recommendation:
- Keep command setup in
add_command.go(flag parsing, command registration) - Extract
addWorkflowWithTracking()logic toadd_workflow_operations.go - Split into smaller functions:
prepareWorkflowAddition()- Validation and preparationexecuteWorkflowAddition()- Core addition logicfinalizeWorkflowAddition()- Post-processing and cleanup
Estimated Effort: 4-6 hours
Benefits: Clearer command structure, easier to test business logic separately
4. Scattered Helper Functions
Utility Functions Spread Across Multiple Files
Scattered #1: YAML/Frontmatter Utilities
Current Locations:
pkg/cli/codemod_yaml_utils.go:reconstructContent(),parseFrontmatterLines(),getIndentation(),isTopLevelKey(),removeFieldFromBlock()pkg/cli/frontmatter_editor.go:UpdateFieldInFrontmatter(),RemoveFieldFromOnTrigger(),SetFieldInOnTrigger()pkg/cli/add_workflow_compilation.go:addSourceToWorkflow()
Issue:
Frontmatter manipulation utilities are split across 3 files with overlapping concerns.
Recommendation:
Consolidate into unified frontmatter_utils.go:
Section 1: Parsing
parseFrontmatterLines()extractFrontmatter()
Section 2: Field Manipulation
UpdateFieldInFrontmatter()RemoveFieldFromOnTrigger()SetFieldInOnTrigger()addSourceToWorkflow()
Section 3: Content Reconstruction
reconstructContent()getIndentation()isTopLevelKey()removeFieldFromBlock()
Estimated Effort: 2-3 hours
Benefits: Single source for frontmatter operations, easier discoverability
Scattered #2: String Processing Utilities
Current Locations:
pkg/workflow/strings.go- Main string utilitiespkg/workflow/workflow_name.go-SanitizeIdentifier()(creates safe identifiers)pkg/workflow/config_helpers.go- Various string parsing helpers
Issue:
The SanitizeIdentifier() function is separate from other string utilities, even though strings.go mentions it in comments.
Recommendation:
- Keep specialized functions in their current locations (they're fine)
- Add clear cross-references in comments:
- In
strings.go: "See also: SanitizeIdentifier() in workflow_name.go for identifier creation" - In
workflow_name.go: "For general string normalization, see strings.go"
- In
Estimated Effort: 30 minutes
Benefits: Improved discoverability through better documentation
Scattered #3: Config Parsing Helpers
Current Locations:
pkg/workflow/config_helpers.go- 12 functions (parse*, Parse*, extract*)pkg/workflow/map_helpers.go- 2 functions (parseIntValue,filterMapKeys)pkg/workflow/tools_parser.go- Tool-specific parsingpkg/workflow/safe_outputs_config.go- Safe outputs parsing
Issue:
Config parsing utilities spread across 5+ files, making it hard to find the right helper.
Recommendation:
- Keep domain-specific parsers in their domain files (tools_parser.go, safe_outputs_config.go)
- Consolidate generic helpers:
- Move
parseIntValuefrommap_helpers.gotoconfig_helpers.go - Rename
map_helpers.gotomap_utils.goif it only has map manipulation (no parsing) - Add header comments in
config_helpers.goexplaining what types of helpers belong there
- Move
Estimated Effort: 1-2 hours
Benefits: Centralized generic parsing utilities, clearer file purposes
Well-Organized Code (Models to Follow)
Excellent Examples of Code Organization
🏆 Best Practice #1: codemod_* Files (pkg/cli)
Why It's Excellent:
- 13 feature-specific files following identical patterns
- Shared utilities properly factored out (
codemod_yaml_utils.go) - Consistent structure:
func get{Feature}Codemod() Codemod { return Codemod{ ID: "feature-identifier", Name: "Human readable name", Description: "What it does", IntroducedIn: "0.x.0", Apply: func(content string, frontmatter map[string]any) (string, bool, error) { // Implementation }, } }
- Each file handles ONE migration concern
- Paired with test files
Files:
codemod_agent_session.go,codemod_discussion_flag.go,codemod_grep_tool.go,codemod_mcp_mode_to_type.go,codemod_mcp_network.go,codemod_network_firewall.go,codemod_permissions.go,codemod_safe_inputs.go,codemod_sandbox_agent.go,codemod_schedule.go,codemod_schema_file.go,codemod_slash_command.go,codemod_timeout_minutes.go,codemod_upload_assets.go
Key Takeaway: This is a model subsystem that demonstrates perfect feature-based organization.
🏆 Best Practice #2: Runtime Domain Organization (pkg/workflow)
Why It's Excellent:
- Clear separation by concern:
runtime_definitions.go- Type definitionsruntime_detection.go- Runtime detection logicruntime_deduplication.go- Deduplicationruntime_validation.go- Validation
- Each file has a single, clear purpose
- Easy to find related functionality
Key Takeaway: Split by functional concern (definitions, detection, validation) rather than mixing in one large file.
🏆 Best Practice #3: Validation Files (pkg/workflow)
Why It's Excellent:
- 17+ focused validation files, each handling one domain:
pip_validation.go- Python package validationnpm_validation.go- NPM package validationdocker_validation.go- Docker image validationfirewall_validation.go- Firewall configurationexpression_validation.go- Expression safetysandbox_validation.go- Sandbox configuration- And 11 more...
- Generic validators in
validation_helpers.go(ValidateRequired(),ValidateMaxLength()) - Easy to add new validators (just create new
{domain}_validation.go)
Key Takeaway: One validation file per domain prevents god files.
🏆 Best Practice #4: Expression Handling (pkg/workflow)
Why It's Excellent:
- Well-separated by concern:
expression_parser.go- Parsingexpression_validation.go- Validationexpression_extraction.go- Extractionexpression_builder.go- Buildingexpression_patterns.go- Pattern matching
- Each file focuses on one operation on expressions
- Easy to navigate (parser → validator → builder flow)
Key Takeaway: Organize by operation type when dealing with a core concept.
🏆 Best Practice #5: frontmatter_editor.go (pkg/cli)
Why It's Excellent:
- Clear header documentation explaining the file's purpose and scope
- Functions:
UpdateFieldInFrontmatter(),RemoveFieldFromOnTrigger(),SetFieldInOnTrigger() - Single responsibility: YAML frontmatter manipulation
- Well-tested with comprehensive test file
Key Takeaway: Header comments explaining organization rationale help maintainers understand boundaries.
Detailed Function Clusters
Semantic Clustering Analysis by Pattern
Cluster 1: Creation Functions (create*)
Pattern: Functions that create new entities
Files: 13+ files in pkg/workflow
Well-Organized Examples:
create_issue.go- GitHub issue creationcreate_pull_request.go- Pull request creationcreate_discussion.go- Discussion creation
Analysis: ✅ Excellent organization - each creation function has its own file
Cluster 2: Parsing Functions (parse*, Parse*)
Pattern: Functions that parse strings, YAML, or configurations
Files: 22+ files across pkg/workflow and pkg/cli
Well-Organized Examples:
trigger_parser.go- 16 functions for trigger parsingtools_parser.go- 13 functions for tool configuration parsingslash_command_parser.go- Slash command parsing
Scattered Examples:
config_helpers.go- 9 parsing functions mixed with other helperssafe_outputs_config.go- 1 massive 150-line parsing function (should be split)compiler_jobs.go-extractJobsFromFrontmatter()(should move to parser file)
Recommendation:
- Keep domain-specific parsers in dedicated
*_parser.gofiles - Consolidate generic parsing helpers into
config_helpers.goorparsing_utils.go
Cluster 3: Validation Functions (validate*, Validate*)
Pattern: Functions that validate configurations, inputs, or workflows
Files: 21+ validation files
Well-Organized Examples:
pip_validation.go,npm_validation.go,docker_validation.go- Domain-specific validatorsvalidation_helpers.go- Generic validators (ValidateRequired(),ValidateMaxLength())strict_mode_validation.go- 7 expression safety validators
Analysis: ✅ Excellent organization - validation is well-separated by domain
Cluster 4: Building/Generation Functions (build*, generate*, Render*)
Pattern: Functions that construct objects, generate output, or render templates
Files: 20+ files
Examples:
expression_builder.go- 26 functions for building expression treesmcp_renderer.go- 14 functions for rendering MCP configurationssafe_inputs_generator.go- Generating safe input configurations
Analysis: ✅ Well-organized, clear separation of building concerns
Cluster 5: Helper Functions (lowercase patterns)
Common Patterns: ensure*, get*, is*, has*, check*, find*, extract*
Occurrences: 574+ functions across 154 files
Well-Consolidated Examples:
strings.go- String normalization utilitiesgit_helpers.go- Git repository utilitieserror_helpers.go- Error construction helpersvalidation_helpers.go- Generic validators
Scattered Examples:
- String processing helpers in 3+ files
- Config parsing helpers in 5+ files
- YAML manipulation in 3+ files
Recommendation: Continue consolidating helpers into focused utility files
Implementation Priorities
Priority 1: High-Impact, Quick Wins (1-2 days)
-
Consolidate Exact Duplicates
- Merge
extractBaseRepo()intopkg/repoutil/repoutil.go - Rename
ParseGitHubURLvariants for clarity - Effort: 2-3 hours
- Impact: Immediate code deduplication
- Merge
-
Move Outlier Functions
- Move
ensureGitAttributes()tocompile_post_processing.go - Move
confirmPushOperation()tointeractive.go - Move
validateCompileConfig()tocompile_config.go - Move
validateActionYml()tovalidators.go - Effort: 3-4 hours
- Impact: Clearer file boundaries, improved discoverability
- Move
-
Split
pkg/cli/git.go- Create
git_repository.go,git_operations.go,git_github.go - Effort: 4-6 hours
- Impact: Major maintainability improvement
- Create
Priority 2: Medium Impact (3-5 days)
-
Split Large Mixed-Concern Files
pkg/workflow/domains.go→ 3-4 focused filespkg/workflow/compiler_safe_outputs.go→ Extract parsing functionspkg/workflow/compiler_jobs.go→ Separate parsing, analysis, building- Effort: 8-12 hours
- Impact: Reduced complexity, easier testing
-
Consolidate Scattered Utilities
- Centralize YAML/frontmatter utilities
- Group config parsing helpers
- Effort: 4-6 hours
- Impact: Improved discoverability, reduced scatter
Priority 3: Long-Term Improvements (1-2 weeks)
-
Extract Complex Functions
- Refactor
parseOnSection()incompiler_safe_outputs.go - Refactor
extractSafeOutputsConfig()insafe_outputs_config.go - Effort: 6-8 hours
- Impact: Reduced function complexity, better testability
- Refactor
-
Documentation Improvements
- Add header comments explaining file organization (like
frontmatter_editor.go) - Add cross-references for related utilities
- Effort: 2-3 hours
- Impact: Improved maintainability
- Add header comments explaining file organization (like
Implementation Guidelines
For All Refactorings:
- ✅ Preserve Behavior - Ensure existing functionality works identically
- ✅ Maintain Exports - Keep public API unchanged (unless renaming for clarity)
- ✅ Write Tests First - Add tests before refactoring (especially for untested code)
- ✅ Incremental Changes - Split one file or move one function at a time
- ✅ Run Tests Frequently - Verify tests pass after each change
- ✅ Update Imports - Ensure all import paths are updated
- ✅ Add Documentation - Explain boundaries with header comments
Testing Strategy:
- Use table-driven tests for validation/parsing logic
- Mock external dependencies (git commands, GitHub API)
- Aim for ≥80% coverage for refactored code
- Verify integration tests still pass
Metrics Summary
- Total Go Files Analyzed: 490 non-test files
- Major Packages:
- pkg/workflow: 250 files
- pkg/cli: 163 files
- pkg/parser: 30 files
- pkg/campaign: 14 files
- Function Clusters Identified: 5 major clusters (creation, parsing, validation, building, helpers)
- Exact Duplicates Detected: 2 functions (
extractBaseRepo,ParseGitHubURLvariants) - Outliers Found: 4 high-priority functions in wrong files
- Files with Mixed Concerns: 5 files requiring refactoring
- Well-Organized Subsystems: 5 exemplary patterns (codemod_, validation_, runtime_, expression_)
- Detection Method: Semantic code analysis + naming pattern analysis + manual code inspection
- Analysis Date: 2026-01-29
Acceptance Criteria
Refactoring is successful when:
- Exact duplicate functions are consolidated
- Outlier functions are moved to appropriate files
- Mixed-concern files are split by responsibility
- Scattered utilities are consolidated
- All tests pass (unit + integration)
- Code passes linting
- Build succeeds
- Public API remains unchanged (except intentional renames)
- Test coverage maintained or improved
References
- Analysis Run: §21483321988
- Codebase Guidelines: AGENTS.md, CONTRIBUTING.md
- Related Issues: [file-diet] Refactor add_interactive.go into focused modules (1025 lines) #12535 (file-diet analysis)
AI generated by Semantic Function Refactoring