[refactor] 🔧 Semantic Function Clustering Analysis: Refactoring Opportunities #3119

@github-actions

🔧 Semantic Function Clustering Analysis

Analysis of repository: githubnext/gh-aw

Executive Summary

Semantic function clustering and duplicate detection across 180 non-test Go files in the pkg/ directory revealed several refactoring opportunities. The codebase is generally well organized with clear package boundaries, but validation functions could be consolidated, duplicate code eliminated, and scattered utilities centralized.

Key Findings:

  • 180 files analyzed across 7 packages (workflow: 108, cli: 60, parser: 6, console: 3, others: 3)
  • 9 files with validation functions scattered outside validation.go
  • 2+ exact duplicate JavaScript functions identified (e.g., sanitizeLabelContent, identical in two files)
  • Multiple parsing functions distributed across 15+ files
  • Scattered helper/utility files across packages

Full Report Details

Function Inventory

Package Distribution

| Package | File Count | Primary Purpose |
|---|---|---|
| pkg/workflow/ | 108 | Core workflow compilation, engines, safe outputs |
| pkg/cli/ | 60 | CLI commands, MCP management, tooling |
| pkg/parser/ | 6 | YAML/JSON parsing, GitHub API, frontmatter |
| pkg/console/ | 3 | Terminal UI rendering |
| pkg/constants/ | 1 | Shared constants |
| pkg/logger/ | 1 | Logging utilities |
| pkg/timeutil/ | 1 | Time formatting |

Key File Organization

The repository follows Go best practices with feature-based file organization:

  • Engine files: claude_engine.go, copilot_engine.go, codex_engine.go, etc.
  • Create operations: create_issue.go, create_pr.go, create_discussion.go
  • Specialized functionality per file

Identified Issues

1. Validation Functions Scattered Across Multiple Files

Issue: Validation functions exist in at least 9 files outside the dedicated validation.go file, so validation logic cannot be found, reviewed, or tested in one place.

Files with Misplaced Validation Functions:

  1. pkg/workflow/compiler.go - Contains validation logic that should be in validation.go
  2. pkg/workflow/docker.go:86 - validateDockerImage() function
  3. pkg/workflow/engine.go:261,315 - validateEngine(), validateSingleEngineSpecification()
  4. pkg/workflow/expression_safety.go:66,141 - validateExpressionSafety(), validateSingleExpression()
  5. pkg/workflow/mcp-config.go:1034,1046 - validateStringProperty(), validateMCPRequirements()
  6. pkg/workflow/npm.go:45 - validateNpxPackages()
  7. pkg/workflow/pip.go:49,84,113,174 - Multiple Python package validation functions
  8. pkg/workflow/strict_mode.go:43,72,94,115,155 - Five strict mode validation functions
  9. pkg/workflow/template.go:54 - validateNoIncludesInTemplateRegions()

Current State:

  • validation.go has 30+ validation functions (primary validation file)
  • 9 other files contain 20+ additional validation functions

Recommendation:

  • Keep domain-specific validation in its domain file (e.g., Docker validation can stay in docker.go)
  • Move general validation functions to validation.go
  • Consider creating validation sub-files if validation.go becomes too large (e.g., validation_packages.go, validation_strict_mode.go)

Estimated Impact: Medium - Improved code organization and easier testing of validation logic


2. Exact Duplicate JavaScript Function: sanitizeLabelContent

Issue: The sanitizeLabelContent function appears identically in two JavaScript files.

Duplicate Occurrences:

Occurrence 1: pkg/workflow/js/create_issue.cjs:4-17

function sanitizeLabelContent(content) {
  if (!content || typeof content !== "string") {
    return "";
  }
  let sanitized = content.trim();
  sanitized = sanitized.replace(/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/g, "");
  sanitized = sanitized.replace(/\x1b\[[0-9;]*[mGKH]/g, "");
  sanitized = sanitized.replace(
    /(^|[^\w`])@([A-Za-z0-9](?:[A-Za-z0-9-]{0,37}[A-Za-z0-9])?(?:\/[A-Za-z0-9._-]+)?)/g,
    (_m, p1, p2) => `${p1}\`@${p2}\``
  );
  sanitized = sanitized.replace(/[<>&'"]/g, "");
  return sanitized.trim();
}

Occurrence 2: pkg/workflow/js/add_labels.cjs:4-17 (identical implementation)

Code Similarity: 100% identical (14 lines)
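
One way to confirm an exact-duplicate finding like this is to fingerprint each function's source text and group matching fingerprints. The sketch below is only an illustration and is not the method the analysis tooling used; the `fingerprint` and `findExactDuplicates` names and the `{ file, name, source }` entry shape are assumptions made for the example.

```javascript
const { createHash } = require("node:crypto");

// Collapse runs of whitespace so formatting-only differences do not hide a
// duplicate. This is a heuristic: it also collapses whitespace inside string
// literals, which is acceptable for flagging candidates for manual review.
function fingerprint(source) {
  const normalized = source.replace(/\s+/g, " ").trim();
  return createHash("sha256").update(normalized).digest("hex");
}

// entries: array of { file, name, source } (hypothetical shape).
// Returns only the groups in which two or more entries share a fingerprint.
function findExactDuplicates(entries) {
  const groups = new Map();
  for (const entry of entries) {
    const key = fingerprint(entry.source);
    if (!groups.has(key)) {
      groups.set(key, []);
    }
    groups.get(key).push(entry);
  }
  return [...groups.values()].filter((group) => group.length > 1);
}
```

Fed the bodies of sanitizeLabelContent from create_issue.cjs and add_labels.cjs, a check like this would be expected to return a single group containing both entries.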

Recommendation:

  • Extract to shared utility module (e.g., pkg/workflow/js/label_utils.cjs, the name used in recommendation 1.1)
  • Import in both files instead of duplicating
  • Benefits: Single source of truth, easier maintenance, reduced code size

Estimated Impact: Low effort, high maintainability benefit


3. Duplicate Sanitization Functions (Go and JavaScript)

Issue: Multiple sanitization functions exist across different languages and files with similar purposes.

Sanitization Functions Found:

Go Functions:

  • pkg/workflow/strings.go:75 - SanitizeName()
  • pkg/workflow/strings.go:157 - SanitizeWorkflowName()
  • pkg/workflow/workflow_name.go:12 - SanitizeIdentifier()

JavaScript Functions:

  • pkg/workflow/js/sanitize.cjs:14 - sanitizeContent()
  • pkg/workflow/js/parse_firewall_logs.cjs:242 - sanitizeWorkflowName()
  • pkg/workflow/js/create_issue.cjs:4 - sanitizeLabelContent()
  • pkg/workflow/js/add_labels.cjs:4 - sanitizeLabelContent() (duplicate)

Analysis:

  • Go sanitization is reasonably consolidated in strings.go and workflow_name.go
  • JavaScript sanitization is scattered and duplicated
  • Some naming inconsistency (SanitizeIdentifier vs SanitizeWorkflowName)

Recommendation:

  • Go side is acceptable - Keep as is
  • JavaScript side needs consolidation - Create shared sanitization utilities module

Estimated Impact: Medium - Reduces JavaScript code duplication


4. Parsing Functions Distributed Across 15+ Files

Issue: Parsing functions are distributed across many files instead of being centralized in parser-related files or following a clear pattern.

Files with Parse Functions:

  1. pkg/workflow/time_delta.go - Multiple date/time parsing functions
  2. pkg/workflow/comment.go - ParseCommandEvents()
  3. pkg/workflow/dependabot.go - parseNpmPackage(), parsePipPackage(), parseGoPackage()
  4. pkg/workflow/create_discussion.go - parseDiscussionsConfig()
  5. pkg/workflow/create_pr_review_comment.go - parsePullRequestReviewCommentsConfig()
  6. pkg/workflow/threat_detection.go - parseThreatDetectionConfig()
  7. pkg/workflow/expressions.go - Expression parsing
  8. pkg/workflow/frontmatter_extraction.go - Frontmatter parsing
  9. And 7+ more files...

Analysis:

  • Domain-specific parsing (e.g., time parsing in time_delta.go) ✅ Good organization
  • Config parsing (e.g., parseDiscussionsConfig()) ✅ Acceptable in feature files
  • Generic parsing utilities scattered across multiple files ⚠️ Could be improved

Recommendation:

  • Keep domain-specific parsers in their respective files (time, expressions, etc.)
  • Keep config parsers in feature-specific files (create_discussion.go, etc.)
  • ⚠️ Consider extracting common parsing patterns if code duplication is found

Estimated Impact: Low priority - Current organization is mostly acceptable


5. Helper/Utility File Organization

Issue: Helper and utility files exist in both pkg/cli/ and pkg/workflow/ with varying naming conventions.

Current Helper Files:

CLI Package:

  • frontmatter_utils.go - Frontmatter manipulation utilities
  • repeat_utils.go - Retry logic utilities
  • shared_utils.go - General shared utilities

Workflow Package:

  • engine_helpers.go - Engine-specific helper functions
  • prompt_step_helper.go - Prompt step utilities
  • strings.go - String manipulation utilities
  • safe_outputs_env_test_helpers.go - Test helper (appropriately named)

Analysis:

  • Good: Naming convention with _utils and _helpers suffixes
  • Good: Test helpers clearly identified
  • ⚠️ Mixed: Some utilities specific to domain (good), others generic (could be consolidated)

Recommendation:

  • Keep current organization - It's reasonable and follows Go conventions
  • Consider documenting the distinction between "utils" and "helpers" in contribution guidelines
  • Monitor for utility function sprawl in future

Estimated Impact: Very low - Current state is acceptable


Semantic Function Clustering Results

Cluster 1: Validation Functions ⚠️ (Scattered)

Pattern: validate* functions
Total Found: 50+ validation functions
Primary File: validation.go (30+ functions)
Scattered Across: 9 additional files

Analysis: Having a primary validation file is good, but 20+ of the 50+ validation functions sit in 9 other files. This makes validation logic harder to find and maintain.


Cluster 2: Sanitization Functions ⚠️ (Partially Consolidated)

Pattern: sanitize* or Sanitize* functions
Total Found: 10+ functions (Go + JavaScript)
Files:

  • Go: strings.go, workflow_name.go (good consolidation)
  • JavaScript: 4+ files, with sanitizeLabelContent duplicated across two of them (needs improvement)

Analysis: Go side is well-organized, JavaScript side has duplicates.


Cluster 3: Parsing Functions ✅ (Acceptable)

Pattern: parse* or Parse* functions
Total Found: 40+ parsing functions
Distribution: Spread across 15+ files based on domain

Analysis: Most parsing functions are appropriately placed in domain-specific files. This is good organization.


Cluster 4: Rendering Functions ✅ (Well Organized)

Pattern: render* or Render* functions
Total Found: 30+ rendering functions
Organization: Test files + engine_helpers.go + specific engine files

Analysis: Rendering logic is appropriately distributed. No consolidation needed.


Cluster 5: Formatting Functions ✅ (Well Organized)

Pattern: format* or Format* functions
Total Found: 20+ formatting functions
Key Files: engine_helpers.go, js.go, permissions_validator.go

Analysis: Formatting functions are reasonably organized by purpose.
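
The name-pattern part of the five clusters above can be approximated with a simple prefix grouping. This sketch covers only the naming-pattern step, not the semantic analysis; the function names, the `{ name, file }` input shape, and the `maxFiles` threshold are assumptions chosen for illustration.

```javascript
// Prefixes taken from the five cluster patterns in this report.
const CLUSTER_PREFIXES = ["validate", "sanitize", "parse", "render", "format"];

// functions: array of { name, file } (hypothetical shape).
// Returns a Map from prefix to the Set of files containing a matching function.
function clusterFilesByPrefix(functions) {
  const clusters = new Map(CLUSTER_PREFIXES.map((prefix) => [prefix, new Set()]));
  for (const { name, file } of functions) {
    const lower = name.toLowerCase(); // matches both validateX and ValidateX
    const prefix = CLUSTER_PREFIXES.find((p) => lower.startsWith(p));
    if (prefix) {
      clusters.get(prefix).add(file);
    }
  }
  return clusters;
}

// A cluster counts as "scattered" when it spans more files than maxFiles.
function scatteredClusters(clusters, maxFiles) {
  return [...clusters]
    .filter(([, files]) => files.size > maxFiles)
    .map(([prefix]) => prefix);
}

// Sample drawn from functions named in this report.
const sample = [
  { name: "validateDockerImage", file: "docker.go" },
  { name: "validateEngine", file: "engine.go" },
  { name: "validateNpxPackages", file: "npm.go" },
  { name: "SanitizeName", file: "strings.go" },
  { name: "SanitizeIdentifier", file: "workflow_name.go" },
  { name: "parseDiscussionsConfig", file: "create_discussion.go" },
];
console.log(scatteredClusters(clusterFilesByPrefix(sample), 2)); // [ 'validate' ]
```

A pure prefix match cannot tell a well-placed domain validator from a misplaced one, which is why the per-cluster judgments above still rest on manual review.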


Refactoring Recommendations

Priority 1: High Impact, Low Effort

1.1 Consolidate Duplicate JavaScript Function

Task: Extract sanitizeLabelContent to shared utility module

Steps:

  1. Create pkg/workflow/js/label_utils.cjs with the sanitization function
  2. Update create_issue.cjs to import from shared module
  3. Update add_labels.cjs to import from shared module
  4. Add tests for the shared function

Estimated Effort: 1-2 hours
Benefits:

  • Eliminates 14 lines of duplicate code
  • Single source of truth for label sanitization
  • Easier to test and maintain
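
For step 4, a framework-agnostic sketch of characterization tests that pin down what the function quoted in this report currently returns. The `LABEL_CASES` table and `checkLabelSanitizer` helper are hypothetical names; the expected values were derived by tracing the quoted implementation, not taken from the repository's test suite.

```javascript
// Inputs paired with the output the quoted implementation produces for them.
const LABEL_CASES = [
  { input: "  bug  ", expected: "bug" }, // surrounding whitespace trimmed
  { input: null, expected: "" }, // non-string input yields an empty string
  { input: "ping @octocat", expected: "ping `@octocat`" }, // mention wrapped in backticks
  { input: '<b>"x"</b>', expected: "bx/b" }, // < > and " removed
  // ESC is stripped with the other control characters before the ANSI
  // pattern runs, so the "[31m" and "[0m" remnants are left behind.
  { input: "\x1b[31mred\x1b[0m", expected: "[31mred[0m" },
];

// Returns one message per failing case; an empty array means all cases passed.
function checkLabelSanitizer(sanitize) {
  return LABEL_CASES
    .filter(({ input, expected }) => sanitize(input) !== expected)
    .map(({ input, expected }) =>
      `${JSON.stringify(input)}: expected ${JSON.stringify(expected)}, got ${JSON.stringify(sanitize(input))}`
    );
}
```

Running the same table against the function before and after the extraction gives a quick check that the move did not change behavior. The last case records existing behavior rather than endorsing it; if leftover ANSI remnants are unwanted, that would be a separate fix with its own test update.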

1.2 Review and Document Validation Function Organization

Task: Create guidelines for where validation functions should live

Steps:

  1. Document validation function placement rules in CONTRIBUTING.md:
    • Domain-specific validations → domain files (e.g., Docker validation in docker.go)
    • General workflow validations → validation.go
    • Complex validation logic → consider sub-files
  2. Review the 9 files with scattered validation functions
  3. Move or document exceptions

Estimated Effort: 2-3 hours
Benefits:

  • Clear guidelines for contributors
  • Prevents future validation sprawl
  • Improves code discoverability

Priority 2: Medium Impact, Medium Effort

2.1 Consolidate JavaScript Sanitization Utilities

Task: Create shared JavaScript sanitization module

Steps:

  1. Create pkg/workflow/js/sanitize_shared.cjs
  2. Move sanitizeLabelContent into it (extracted in Priority 1.1)
  3. Consider consolidating other JS sanitization functions
  4. Update imports in dependent files
  5. Add comprehensive tests

Estimated Effort: 3-4 hours
Benefits:

  • Centralized JavaScript sanitization logic
  • Reduced duplication
  • Easier to apply consistent sanitization rules

2.2 Consider Validation Sub-Files

Task: Split validation.go if it becomes too large

Approach: Only if validation.go exceeds 1000 lines or has distinct validation domains

Suggested Split (if needed):

  • validation.go - Core workflow validations
  • validation_packages.go - Package validation (npm, pip, etc.)
  • validation_strict_mode.go - Strict mode validations
  • validation_features.go - Repository feature validations

Estimated Effort: 4-6 hours (only if needed)
Benefits:

  • Easier navigation of validation logic
  • Logical grouping of related validations

Priority 3: Long-term Improvements

3.1 Monitor for Utility Function Sprawl

Task: Establish guidelines for when to create new utility files

Guidelines:

  • Functions used in 3+ files → move to utility file
  • Domain-specific utilities → keep in domain file
  • Test helpers → suffix with _test_helpers.go

Estimated Effort: Ongoing code review discipline
Benefits: Prevents future utility sprawl


File Organization Assessment

Well-Organized Areas ✅

  1. Engine Architecture: Each engine has its own file (claude, copilot, codex)
  2. Create Operations: Separate files for each creation type (issue, PR, discussion)
  3. String Utilities: Consolidated in strings.go
  4. Test Organization: Clear _test.go suffix convention

Areas for Improvement ⚠️

  1. Validation Functions: Too scattered (9+ files)
  2. JavaScript Duplicates: Exact duplicates exist
  3. Sanitization (JS): Could be more consolidated

Implementation Checklist

  • P1.1: Extract sanitizeLabelContent to shared JS utility
  • P1.2: Document validation function placement guidelines
  • P2.1: Create JavaScript sanitization shared module
  • P2.2: Evaluate if validation.go needs splitting
  • P3.1: Establish utility function placement guidelines
  • Verify no functionality broken after changes
  • Update tests to reflect refactoring
  • Update CONTRIBUTING.md with new guidelines

Analysis Metadata

  • Total Go Files Analyzed: 180 (excluding test files)
  • Total Functions Cataloged: 500+ functions across all files
  • Function Clusters Identified: 5 major clusters (validation, sanitization, parsing, rendering, formatting)
  • Outliers Found: 20+ validation functions outside validation.go
  • Exact Duplicates Detected: 2+ JavaScript functions (100% match)
  • Near-Duplicates Detected: Multiple sanitization functions with similar purpose
  • Detection Method: Serena semantic code analysis + grep pattern analysis + manual review
  • Analysis Date: 2025-11-04
  • Packages Analyzed: cli (60 files), workflow (108 files), parser (6 files), console (3 files), others (3 files)

Conclusion

The gh-aw codebase demonstrates generally good organization with clear package boundaries and feature-based file structure. The primary opportunities for improvement are:

  1. ⚠️ High Priority: Eliminate JavaScript code duplication (quick win)
  2. ⚠️ Medium Priority: Consolidate scattered validation functions
  3. Low Priority: Helper and parsing organization is acceptable; monitor for sprawl

The refactoring recommendations focus on high-impact, low-effort improvements that will enhance maintainability without requiring extensive restructuring. Most of the codebase follows Go best practices effectively.


Note: This analysis focused on non-test Go files (.go excluding *_test.go) and associated JavaScript files in the pkg/ directory. The findings represent refactoring opportunities discovered through semantic function clustering, naming pattern analysis, and duplicate detection using Serena's code analysis tools.

AI generated by Semantic Function Refactoring
