🔧 Semantic Function Clustering Analysis
Analysis of repository: githubnext/gh-aw
Executive Summary
Analysis of 180 non-test Go files across the pkg/ directory revealed several refactoring opportunities through semantic function clustering and duplicate detection. The codebase is generally well-organized with clear package boundaries, but there are opportunities to improve code organization by consolidating validation functions, eliminating duplicate code, and centralizing scattered utilities.
Key Findings:
- 180 files analyzed across 7 packages (workflow: 108, cli: 60, parser: 6, console: 3, others: 3)
- 9 files with validation functions scattered outside validation.go
- 2+ exact duplicate JavaScript functions identified
- Multiple parsing functions distributed across 15+ files
- Scattered helper/utility files across packages
Full Report Details
Function Inventory
Package Distribution
| Package | File Count | Primary Purpose |
|---|---|---|
| pkg/workflow/ | 108 | Core workflow compilation, engines, safe outputs |
| pkg/cli/ | 60 | CLI commands, MCP management, tooling |
| pkg/parser/ | 6 | YAML/JSON parsing, GitHub API, frontmatter |
| pkg/console/ | 3 | Terminal UI rendering |
| pkg/constants/ | 1 | Shared constants |
| pkg/logger/ | 1 | Logging utilities |
| pkg/timeutil/ | 1 | Time formatting |
Key File Organization
The repository follows Go best practices with feature-based file organization:
- Engine files: claude_engine.go, copilot_engine.go, codex_engine.go, etc.
- Create operations: create_issue.go, create_pr.go, create_discussion.go
- Specialized functionality per file
Identified Issues
1. Validation Functions Scattered Across Multiple Files
Issue: Validation functions exist in 9+ files outside the dedicated validation.go file, violating the single responsibility principle for validation logic.
Files with Misplaced Validation Functions:
- pkg/workflow/compiler.go - Contains validation logic that should be in validation.go
- pkg/workflow/docker.go:86 - validateDockerImage() function
- pkg/workflow/engine.go:261,315 - validateEngine(), validateSingleEngineSpecification()
- pkg/workflow/expression_safety.go:66,141 - validateExpressionSafety(), validateSingleExpression()
- pkg/workflow/mcp-config.go:1034,1046 - validateStringProperty(), validateMCPRequirements()
- pkg/workflow/npm.go:45 - validateNpxPackages()
- pkg/workflow/pip.go:49,84,113,174 - Multiple Python package validation functions
- pkg/workflow/strict_mode.go:43,72,94,115,155 - Five strict mode validation functions
- pkg/workflow/template.go:54 - validateNoIncludesInTemplateRegions()
Current State:
- validation.go has 30+ validation functions (primary validation file)
- 9 other files contain 20+ additional validation functions
Recommendation:
- Move domain-specific validation to appropriate domain files (e.g., Docker validation can stay in docker.go)
- Move general validation functions to validation.go
- Consider creating validation sub-files if validation.go becomes too large (e.g., validation_packages.go, validation_strict_mode.go)
Estimated Impact: Medium - Improved code organization and easier testing of validation logic
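A list like the one above can go stale as files change. The following is a minimal sketch of how it might be regenerated, not the method used for this report: the name `findValidationOutliers`, the input shape, and the regular expression are all assumptions, and the regex only covers plain functions and methods (no generic functions).

```javascript
// Hypothetical helper: reports validate* functions declared outside validation.go.
// Input: [{ path: "pkg/workflow/docker.go", source: "<file contents>" }, ...]
function findValidationOutliers(files) {
  // Matches `func validateX(` and `func (r *T) validateX(` at the start of a line.
  const declaration = /^func\s+(?:\([^)]*\)\s*)?([Vv]alidate\w*)\s*\(/gm;
  const outliers = [];
  for (const { path, source } of files) {
    // The dedicated validation file is where these functions are expected.
    if (path.split("/").pop() === "validation.go") continue;
    for (const match of source.matchAll(declaration)) {
      // 1-based line number of the declaration.
      const line = source.slice(0, match.index).split("\n").length;
      outliers.push({ path, name: match[1], line });
    }
  }
  return outliers;
}
```

Because domain-specific validators may legitimately stay in their domain files, each hit is a candidate for review rather than an automatic move.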
2. Exact Duplicate JavaScript Function: sanitizeLabelContent
Issue: The sanitizeLabelContent function appears identically in two JavaScript files.
Duplicate Occurrences:
Occurrence 1: pkg/workflow/js/create_issue.cjs:4-17

    function sanitizeLabelContent(content) {
      if (!content || typeof content !== "string") {
        return "";
      }
      let sanitized = content.trim();
      sanitized = sanitized.replace(/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/g, "");
      sanitized = sanitized.replace(/\x1b\[[0-9;]*[mGKH]/g, "");
      sanitized = sanitized.replace(
        /(^|[^\w`])@([A-Za-z0-9](?:[A-Za-z0-9-]{0,37}[A-Za-z0-9])?(?:\/[A-Za-z0-9._-]+)?)/g,
        (_m, p1, p2) => `${p1}\`@${p2}\``
      );
      sanitized = sanitized.replace(/[<>&'"]/g, "");
      return sanitized.trim();
    }

Occurrence 2: pkg/workflow/js/add_labels.cjs:4-17 (identical implementation)
Code Similarity: 100% identical (14 lines)
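A 100% match of this kind could be confirmed mechanically by hashing each function's source after collapsing whitespace. The sketch below is illustrative only: the report's detection relied on Serena, grep, and manual review, and the names `fingerprint` and `findExactDuplicates` and the entry shape are assumptions.

```javascript
const { createHash } = require("node:crypto");

// Collapses runs of whitespace so indentation differences do not hide a duplicate.
function fingerprint(source) {
  const normalized = source.replace(/\s+/g, " ").trim();
  return createHash("sha256").update(normalized).digest("hex");
}

// entries: [{ file, name, source }]; returns groups of entries with identical fingerprints.
function findExactDuplicates(entries) {
  const groups = new Map();
  for (const entry of entries) {
    const key = fingerprint(entry.source);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(entry);
  }
  return [...groups.values()].filter((group) => group.length > 1);
}
```

This only catches textual duplicates. Functions that differ in variable names or statement order would need a syntax-aware comparison.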
Recommendation:
- Extract to shared utility module (e.g., pkg/workflow/js/shared_utils.cjs)
- Import in both files instead of duplicating
- Benefits: Single source of truth, easier maintenance, reduced code size
Estimated Impact: Low effort, high maintainability benefit
3. Duplicate Sanitization Functions (Go and JavaScript)
Issue: Multiple sanitization functions exist across different languages and files with similar purposes.
Sanitization Functions Found:
Go Functions:
- pkg/workflow/strings.go:75 - SanitizeName()
- pkg/workflow/strings.go:157 - SanitizeWorkflowName()
- pkg/workflow/workflow_name.go:12 - SanitizeIdentifier()
JavaScript Functions:
- pkg/workflow/js/sanitize.cjs:14 - sanitizeContent()
- pkg/workflow/js/parse_firewall_logs.cjs:242 - sanitizeWorkflowName()
- pkg/workflow/js/create_issue.cjs:4 - sanitizeLabelContent()
- pkg/workflow/js/add_labels.cjs:4 - sanitizeLabelContent() (duplicate)
Analysis:
- Go sanitization is reasonably consolidated in strings.go and workflow_name.go
- JavaScript sanitization is scattered and duplicated
- Some naming inconsistency (SanitizeIdentifier vs SanitizeWorkflowName)
Recommendation:
- ✅ Go side is acceptable - Keep as is
- ❌ JavaScript side needs consolidation - Create shared sanitization utilities module
Estimated Impact: Medium - Reduces JavaScript code duplication
4. Parsing Functions Distributed Across 15+ Files
Issue: Parsing functions are distributed across many files instead of being centralized in parser-related files or following a clear pattern.
Files with Parse Functions:
- pkg/workflow/time_delta.go - Multiple date/time parsing functions
- pkg/workflow/comment.go - ParseCommandEvents()
- pkg/workflow/dependabot.go - parseNpmPackage(), parsePipPackage(), parseGoPackage()
- pkg/workflow/create_discussion.go - parseDiscussionsConfig()
- pkg/workflow/create_pr_review_comment.go - parsePullRequestReviewCommentsConfig()
- pkg/workflow/threat_detection.go - parseThreatDetectionConfig()
- pkg/workflow/expressions.go - Expression parsing
- pkg/workflow/frontmatter_extraction.go - Frontmatter parsing
- And 7+ more files...
Analysis:
- Domain-specific parsing (e.g., time parsing in time_delta.go) ✅ Good organization
- Config parsing (e.g., parseDiscussionsConfig()) ✅ Acceptable in feature files
- Generic parsing utilities scattered across multiple files ⚠️ Could be improved
Recommendation:
- ✅ Keep domain-specific parsers in their respective files (time, expressions, etc.)
- ✅ Keep config parsers in feature-specific files (create_discussion.go, etc.)
- ⚠️ Consider extracting common parsing patterns if code duplication is found
Estimated Impact: Low priority - Current organization is mostly acceptable
5. Helper/Utility File Organization
Issue: Helper and utility files exist in both pkg/cli/ and pkg/workflow/ with varying naming conventions.
Current Helper Files:
CLI Package:
- frontmatter_utils.go - Frontmatter manipulation utilities
- repeat_utils.go - Retry logic utilities
- shared_utils.go - General shared utilities
Workflow Package:
- engine_helpers.go - Engine-specific helper functions
- prompt_step_helper.go - Prompt step utilities
- strings.go - String manipulation utilities
- safe_outputs_env_test_helpers.go - Test helper (appropriately named)
Analysis:
- ✅ Good: Naming convention with _utils and _helpers suffixes
- ✅ Good: Test helpers clearly identified
- ⚠️ Mixed: Some utilities specific to domain (good), others generic (could be consolidated)
Recommendation:
- ✅ Keep current organization - It's reasonable and follows Go conventions
- Consider documenting the distinction between "utils" and "helpers" in contribution guidelines
- Monitor for utility function sprawl in future
Estimated Impact: Very low - Current state is acceptable
Semantic Function Clustering Results
Cluster 1: Validation Functions ⚠️ (Scattered)
Pattern: validate* functions
Total Found: 50+ validation functions
Primary File: validation.go (30+ functions)
Scattered Across: 9 additional files
Analysis: Having a primary validation file is good, but 20+ of the 50+ validation functions sit in 9 other files. This makes validation logic harder to locate, understand, and maintain.
Cluster 2: Sanitization Functions ⚠️ (Partially Consolidated)
Pattern: sanitize* or Sanitize* functions
Total Found: 10+ functions (Go + JavaScript)
Files:
- Go: strings.go, workflow_name.go (good consolidation)
- JavaScript: 4+ files with duplicates (needs improvement)
Analysis: Go side is well-organized, JavaScript side has duplicates.
Cluster 3: Parsing Functions ✅ (Acceptable)
Pattern: parse* or Parse* functions
Total Found: 40+ parsing functions
Distribution: Spread across 15+ files based on domain
Analysis: Most parsing functions are appropriately placed in domain-specific files. This is good organization.
Cluster 4: Rendering Functions ✅ (Well Organized)
Pattern: render* or Render* functions
Total Found: 30+ rendering functions
Organization: Test files + engine_helpers.go + specific engine files
Analysis: Rendering logic is appropriately distributed. No consolidation needed.
Cluster 5: Formatting Functions ✅ (Well Organized)
Pattern: format* or Format* functions
Total Found: 20+ formatting functions
Key Files: engine_helpers.go, js.go, permissions_validator.go
Analysis: Formatting functions are reasonably organized by purpose.
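The naming-pattern part of this clustering can be approximated with a simple prefix grouping. The sketch below is an assumption-laden illustration and covers only name matching, not the semantic analysis performed with Serena. `clusterByPrefix`, `filesPerCluster`, and the inventory shape are hypothetical.

```javascript
// The five name patterns reported above, matched case-insensitively.
const CLUSTER_PREFIXES = ["validate", "sanitize", "parse", "render", "format"];

// functions: [{ name: "validateEngine", file: "engine.go" }, ...]
function clusterByPrefix(functions) {
  const clusters = new Map(CLUSTER_PREFIXES.map((prefix) => [prefix, []]));
  for (const fn of functions) {
    const lower = fn.name.toLowerCase();
    const prefix = CLUSTER_PREFIXES.find((p) => lower.startsWith(p));
    if (prefix) clusters.get(prefix).push(fn);
  }
  return clusters;
}

// Counts distinct files per cluster; a high count hints at scattering.
function filesPerCluster(clusters) {
  const counts = {};
  for (const [prefix, fns] of clusters) {
    counts[prefix] = new Set(fns.map((fn) => fn.file)).size;
  }
  return counts;
}
```

A high file count alone does not mean a cluster needs consolidation. As the parsing cluster shows, domain-based distribution can be the right layout.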
Refactoring Recommendations
Priority 1: High Impact, Low Effort
1.1 Consolidate Duplicate JavaScript Function
Task: Extract sanitizeLabelContent to shared utility module
Steps:
- Create pkg/workflow/js/label_utils.cjs with the sanitization function
- Update create_issue.cjs to import from shared module
- Update add_labels.cjs to import from shared module
- Add tests for the shared function
Estimated Effort: 1-2 hours
Benefits:
- Eliminates 14 lines of duplicate code
- Single source of truth for label sanitization
- Easier to test and maintain
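For the "add tests" step, a table-driven check along these lines could serve as a starting point. It is a sketch using Node's built-in assert module rather than whatever test runner the repository actually uses. The expected values are derived from the function body quoted in this report, and `checkLabelSanitizer` is a hypothetical name.

```javascript
const { strictEqual } = require("node:assert");

// [input, expected] pairs derived from the quoted sanitizeLabelContent body.
const LABEL_CASES = [
  [undefined, ""],                      // non-string input
  [42, ""],                             // non-string input
  ["  enhancement  ", "enhancement"],   // surrounding whitespace trimmed
  ["a\u0007b", "ab"],                   // control character removed
  ["<script>", "script"],               // angle brackets removed
  ["R&D", "RD"],                        // ampersand removed
  ["cc @user", "cc `@user`"],           // @mention wrapped in backticks
];

// Throws an AssertionError on the first case the implementation gets wrong.
function checkLabelSanitizer(sanitizeLabelContent) {
  for (const [input, expected] of LABEL_CASES) {
    strictEqual(sanitizeLabelContent(input), expected, `input: ${JSON.stringify(input)}`);
  }
}

// Usage (module path is hypothetical):
//   checkLabelSanitizer(require("./label_utils.cjs").sanitizeLabelContent);
```

Passing the implementation in as a parameter lets the same cases run against both existing copies before the extraction, confirming they behave identically.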
1.2 Review and Document Validation Function Organization
Task: Create guidelines for where validation functions should live
Steps:
- Document validation function placement rules in CONTRIBUTING.md:
  - Domain-specific validations → domain files (e.g., Docker validation in docker.go)
  - General workflow validations → validation.go
  - Complex validation logic → consider sub-files
- Review the 9 files with scattered validation functions
- Move or document exceptions
Estimated Effort: 2-3 hours
Benefits:
- Clear guidelines for contributors
- Prevents future validation sprawl
- Improves code discoverability
Priority 2: Medium Impact, Medium Effort
2.1 Consolidate JavaScript Sanitization Utilities
Task: Create shared JavaScript sanitization module
Steps:
- Create pkg/workflow/js/sanitize_shared.cjs
- Move sanitizeLabelContent (from Priority 1)
- Consider consolidating other JS sanitization functions
- Update imports in dependent files
- Add comprehensive tests
Estimated Effort: 3-4 hours
Benefits:
- Centralized JavaScript sanitization logic
- Reduced duplication
- Easier to apply consistent sanitization rules
2.2 Consider Validation Sub-Files
Task: Split validation.go if it becomes too large
Approach: Only if validation.go exceeds 1000 lines or has distinct validation domains
Suggested Split (if needed):
- validation.go - Core workflow validations
- validation_packages.go - Package validation (npm, pip, etc.)
- validation_strict_mode.go - Strict mode validations
- validation_features.go - Repository feature validations
Estimated Effort: 4-6 hours (only if needed)
Benefits:
- Easier navigation of validation logic
- Logical grouping of related validations
Priority 3: Long-term Improvements
3.1 Monitor for Utility Function Sprawl
Task: Establish guidelines for when to create new utility files
Guidelines:
- Functions used in 3+ files → move to utility file
- Domain-specific utilities → keep in domain file
- Test helpers → suffix with _test_helpers.go
Estimated Effort: Ongoing code review discipline
Benefits: Prevents future utility sprawl
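The "3+ files" guideline lends itself to an automated check during review. The following sketch assumes a list of call sites is already available (for example from a grep pass). `utilityCandidates` and the usage record shape are hypothetical.

```javascript
// Guideline from this report: functions used in 3+ files belong in a utility file.
const UTILITY_THRESHOLD = 3;

// usages: [{ name: "formatDuration", file: "pkg/cli/logs.go" }, ...], one per call site.
function utilityCandidates(usages, threshold = UTILITY_THRESHOLD) {
  const filesByFunction = new Map();
  for (const { name, file } of usages) {
    if (!filesByFunction.has(name)) filesByFunction.set(name, new Set());
    filesByFunction.get(name).add(file);
  }
  // Keep only functions called from at least `threshold` distinct files.
  return [...filesByFunction]
    .filter(([, files]) => files.size >= threshold)
    .map(([name, files]) => ({ name, fileCount: files.size }));
}
```

Repeated calls within one file count once, so the result reflects how widely a function is shared rather than how often it is called.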
File Organization Assessment
Well-Organized Areas ✅
- Engine Architecture: Each engine has its own file (claude, copilot, codex)
- Create Operations: Separate files for each creation type (issue, PR, discussion)
- String Utilities: Consolidated in strings.go
- Test Organization: Clear _test.go suffix convention
Areas for Improvement ⚠️
- Validation Functions: Too scattered (9+ files)
- JavaScript Duplicates: Exact duplicates exist
- Sanitization (JS): Could be more consolidated
Implementation Checklist
- P1.1: Extract sanitizeLabelContent to shared JS utility
- P1.2: Document validation function placement guidelines
- P2.1: Create JavaScript sanitization shared module
- P2.2: Evaluate if validation.go needs splitting
- P3.1: Establish utility function placement guidelines
- Verify no functionality broken after changes
- Update tests to reflect refactoring
- Update CONTRIBUTING.md with new guidelines
Analysis Metadata
- Total Go Files Analyzed: 180 (excluding test files)
- Total Functions Cataloged: 500+ functions across all files
- Function Clusters Identified: 5 major clusters (validation, sanitization, parsing, rendering, formatting)
- Outliers Found: 20+ validation functions in wrong files
- Exact Duplicates Detected: 2+ JavaScript functions (100% match)
- Near-Duplicates Detected: Multiple sanitization functions with similar purpose
- Detection Method: Serena semantic code analysis + grep pattern analysis + manual review
- Analysis Date: 2025-11-04
- Packages Analyzed: cli (60 files), workflow (108 files), parser (6 files), console (3 files), others (3 files)
Conclusion
The gh-aw codebase demonstrates generally good organization with clear package boundaries and feature-based file structure. The primary opportunities for improvement are:
- ⚠️ High Priority: Eliminate JavaScript code duplication (quick win)
- ⚠️ Medium Priority: Consolidate scattered validation functions
- ✅ Low Priority: Current helper organization is acceptable
The refactoring recommendations focus on high-impact, low-effort improvements that will enhance maintainability without requiring extensive restructuring. Most of the codebase follows Go best practices effectively.
Note: This analysis focused on non-test Go files (.go excluding *_test.go) and associated JavaScript files in the pkg/ directory. The findings represent refactoring opportunities discovered through semantic function clustering, naming pattern analysis, and duplicate detection using Serena's code analysis tools.
AI generated by Semantic Function Refactoring