[refactor] Semantic Function Clustering Analysis - Code Organization and Refactoring Opportunities #11675

@github-actions

A comprehensive semantic analysis of 462 non-test Go files across the pkg/ directory identified significant opportunities for improving code organization through function clustering, consolidation, and refactoring.

Executive Summary

Analysis Scope:

  • 462 Go files analyzed (454 initially discovered + 8 additional)
  • 3,500+ total functions cataloged (2,000+ exported, 1,500+ unexported)
  • Primary packages:
    • pkg/workflow/: 241 files (52% of codebase)
    • pkg/cli/: 151 files (33%)
    • pkg/parser/: 29 files (6%)
    • pkg/campaign/: 11 files (2%)
    • Utility packages: 30 files (7%)

Key Findings:

  • ✅ Excellent patterns: Parser, console, and campaign packages show strong organization
  • ⚠️ Fragmentation issues: Workflow and CLI packages have 100+ small files that could consolidate
  • 🔄 Duplicate code: Multiple implementations of parsing and validation logic detected
  • 📁 Helper scatter: Format/parse/validate helpers spread across 40+ files

Critical Issues Identified

#### Issue #1: Compiler Safe Outputs Fragmentation (14 files → 4 files)

**Current structure:** 14 separate files for safe outputs compilation
- `compiler_safe_outputs.go`, `compiler_safe_outputs_core.go`
- `compiler_safe_outputs_config.go`, `compiler_safe_outputs_env.go`
- `compiler_safe_outputs_job.go`, `compiler_safe_outputs_jobs.go`, `compiler_safe_outputs_steps.go`
- `compiler_safe_outputs_shared.go`, `compiler_safe_outputs_specialized.go`
- `compiler_safe_outputs_discussions.go`
- Plus 4 additional config files

**Issue:** Related functions split across too many files (6-12 functions per file), making code navigation difficult.

**Recommendation:**
``````
Consolidate to 4 focused files:
1. compiler_safe_outputs.go - Main orchestration
2. compiler_safe_outputs_config.go - Config generation (merge 5 config files)
3. compiler_safe_outputs_env.go - Environment setup
4. compiler_safe_outputs_jobs.go - Job/step generation (merge 6 job-related files)
``````

**Estimated Impact:** Reduced file count, improved discoverability, easier maintenance

---

#### Issue #2: Safe Outputs Config Generation Split (8 files → 2 files)

**Current structure:** Configuration generation is fragmented across the `safe_outputs_config*.go` files (8 in total; six are listed here):
- `safe_outputs_config.go` (6 functions)
- `safe_outputs_config_generation.go` (8 functions)
- `safe_outputs_config_generation_helpers.go` (10 functions)
- `safe_outputs_config_helpers.go` (9 functions)
- `safe_outputs_config_helpers_reflection.go` (6 functions)
- `safe_outputs_config_messages.go` (4 functions)

**Issue:** Semantically related config generation logic is scattered across these files.

**Recommendation:**
``````
Consolidate to 2 files:
1. safe_outputs_config.go - Main types and core functions
2. safe_outputs_config_generation.go - All generation logic + helpers
``````

**Files to merge:** Generation, generation_helpers, config_helpers, helpers_reflection, messages

---

#### Issue #3: permissions.go Is Overly Large (36 functions, 1 file → 3 files)

**File:** `pkg/workflow/permissions.go` (36 functions - 800+ lines)

**Current mix:**
- **Parsing:** `NewPermissionsParser()`, `parse()`, `IsShorthand()`, `GetPermissions()`
- **Building:** `NewPermissions()`, `NewPermissionsReadAll()`, 10+ builder constructors
- **Utilities:** `ContainsCheckout()`, `GetAllPermissionScopes()`, conversion functions

**Issue:** A single file handles parsing, building, and utility functions, which violates the Single Responsibility Principle.

**Recommendation:**
``````
Split into focused files:
1. permissions_parser.go - Parsing logic (NewPermissionsParser, parse, getters)
2. permissions_builder.go - Builder/constructor functions (New* methods)
3. permissions_utilities.go - Utility functions (scope getters, converters)

Keep: permissions_validator.go (already exists)
``````

**Location:** pkg/workflow/permissions.go:1

---

#### Issue #4: Duplicate GitHub URL/Repo Parsing Functions

**Duplicate implementations detected:**

<details>
<summary><b>Locations with similar parsing logic</b></summary>

1. **extractBaseRepo()** - appears in multiple locations:
   - pkg/workflow/action_pins.go (~line 150)
   - pkg/workflow/action_resolver.go (~line 80)
   - Both parse GitHub action repo strings identically

2. **GitHub URL parsing** scattered across:
   - pkg/parser/github.go - `ParseGitHubURL()`
   - pkg/parser/github_urls.go - `ParseGitHubURL()` variants, `ParsePRURL()`, `ParseRunURL()`
   - pkg/workflow/github_tool_to_toolset.go - Custom GitHub parsing
   - pkg/repoutil/repoutil.go - Repository slug parsing

3. **Target repo parsing** duplicated:
   - `parseTargetRepo()` functions in 3+ different files
   - Similar logic, different contexts

</details>

**Recommendation:**
``````
1. Consolidate extractBaseRepo() into single location (action_resolver.go)
2. Create unified GitHub URL parsing in pkg/parser/github_urls.go
3. Eliminate duplicate parseTargetRepo() implementations
``````

**Estimated Impact:** Reduced duplication, single source of truth for parsing logic
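
As a rough illustration of recommendation 1, a single shared `extractBaseRepo()` might look like the sketch below. The behavior shown (drop the `@ref` suffix, keep only `owner/repo`) is an assumption made for this example; the two existing implementations in `action_pins.go` and `action_resolver.go` should be compared before one of them is removed.

```go
package main

import (
	"fmt"
	"strings"
)

// extractBaseRepo returns the "owner/repo" part of an action reference.
// Assumed behavior, for illustration only:
//
//	"actions/checkout@v4"          -> "actions/checkout"
//	"github/codeql-action/init@v3" -> "github/codeql-action"
func extractBaseRepo(uses string) string {
	// Drop the "@ref" suffix, if any.
	if i := strings.Index(uses, "@"); i >= 0 {
		uses = uses[:i]
	}
	// Keep only the first two path segments (owner and repo).
	parts := strings.SplitN(uses, "/", 3)
	if len(parts) < 2 {
		return uses
	}
	return parts[0] + "/" + parts[1]
}

func main() {
	for _, ref := range []string{"actions/checkout@v4", "github/codeql-action/init@v3"} {
		fmt.Println(extractBaseRepo(ref))
	}
}
```

With one definition in place, both callers would use the same function and the second copy could be deleted.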

---

#### Issue #5: Validation Functions Scattered Across 40+ Files

**Pattern:** The validation architecture is well documented in `pkg/workflow/validation.go`, but the implementation only partly follows it.

**Properly consolidated (✅ good examples):**
- `runtime_validation.go` (6 functions)
- `repository_features_validation.go` (5 functions)
- `agent_validation.go` (5 functions)
- `expression_validation.go` (13 functions)
- `strict_mode_validation.go` (7 functions)

**Improperly scattered (⚠️ needs consolidation):**
- Validation in entity files: `create_issue.go`, `create_discussion.go`, `add_comment.go` (1-2 validate functions each)
- 3 separate bundler validation files (safety, script, runtime)
- Small validation files with 2-3 functions: `firewall_validation.go`, `sandbox_validation.go`

**Recommendation:**
``````
1. Create entity_validation.go for entity-specific validations currently in create*/update* files
2. Merge bundler validations: bundler_safety_validation.go + bundler_script_validation.go + bundler_runtime_validation.go → bundler_validation.go
3. Merge small files:
   - firewall_validation.go (2-3 functions) → engine_validation.go
   - sandbox_validation.go (2-3 functions) → strict_mode_validation.go
   - dangerous_permissions_validation.go → permissions_validator.go
``````

**Impact:** Reduce 40+ validation files to ~30, improve consistency with documented architecture

---

#### Issue #6: Helper Functions Fragmentation

**Problem:** Format, parse, and utility helpers scattered across 40+ files.

<details>
<summary><b>Examples of scattered helpers</b></summary>

**String/Format helpers:**
- `formatYAMLValue()` in one location
- `formatValidationOutput()` in validation_helpers.go
- `formatTemplateInjectionError()` in template_injection_validation.go
- `formatDangerousPermissionsError()` in dangerous_permissions_validation.go
- `formatBlockedDomains()` in safe_outputs_domains_validation.go
- `formatNetworkAccess()` in an isolated location
- `formatMissingPermissionsMessage()` in an isolated location

**Parse/Extract helpers:**
- `extractBaseRepo()` in action_resolver.go and action_pins.go
- `parseTargetRepo()` in 3 different files
- `parseTimeoutTool()` in an isolated location
- `parseTimeDelta()` in multiple locations

**Validation helpers:**
- `validateDomainPattern()` in multiple validation files
- `validateTargetRepoSlug()` in multiple places
- `validateNoTemplateInjection()` pattern repeated
- `validateNoExecSync()`, `validateNoModuleReferences()` - similar patterns

</details>

**Recommendation:**
``````
Create unified helper modules:
1. pkg/workflow/helpers_format.go - All format* functions (~20 functions)
2. pkg/workflow/helpers_parse.go - All parse/extract helpers (~15 functions)
3. pkg/workflow/helpers_build.go - All build* generators (~10 functions)
4. pkg/workflow/helpers_validate.go - Standalone validators (~10 functions)
``````

**Benefit:** Centralized, discoverable utilities; eliminate duplication
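
The "similar patterns" noted for `validateNoExecSync()` and `validateNoModuleReferences()` could share one helper once they live in the same file. The sketch below is a minimal illustration under assumptions: the `forbiddenPattern` type, the `validateNoPattern()` helper, and both regular expressions are hypothetical, and the real validators may check different things.

```go
package main

import (
	"fmt"
	"regexp"
)

// forbiddenPattern pairs a human-readable name with the expression to reject.
// Hypothetical type, introduced for this sketch only.
type forbiddenPattern struct {
	name string
	re   *regexp.Regexp
}

// The expressions below are assumptions, not the project's actual rules.
var (
	execSyncPattern  = forbiddenPattern{"execSync call", regexp.MustCompile(`\bexecSync\s*\(`)}
	moduleRefPattern = forbiddenPattern{"module reference", regexp.MustCompile(`\brequire\s*\(|\bmodule\.exports\b`)}
)

// validateNoPattern returns an error if the script matches the pattern.
func validateNoPattern(script string, p forbiddenPattern) error {
	if loc := p.re.FindStringIndex(script); loc != nil {
		return fmt.Errorf("script contains forbidden %s at offset %d", p.name, loc[0])
	}
	return nil
}

// The two validators become thin wrappers around the shared helper.
func validateNoExecSync(script string) error {
	return validateNoPattern(script, execSyncPattern)
}

func validateNoModuleReferences(script string) error {
	return validateNoPattern(script, moduleRefPattern)
}

func main() {
	fmt.Println(validateNoExecSync("const out = execSync('ls');"))
	fmt.Println(validateNoModuleReferences("const x = 1;"))
}
```

Keeping the wrappers preserves the existing call sites while the matching logic lives in one place.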

---

#### Issue #7: CLI Compile Subsystem Fragmentation (14 files → 7 files)

**Current structure:** 14 compile_* files in pkg/cli/
- `compile_command.go`, `compile_config.go`, `compile_compiler_setup.go`
- `compile_batch_operations.go`, `compile_campaign.go`
- `compile_helpers.go`, `compile_orchestration.go`, `compile_orchestrator.go`
- `compile_output_formatter.go`, `compile_post_processing.go`, `compile_stats.go`
- `compile_validation.go`, `compile_watch.go`, `compile_workflow_processor.go`

**Issue:** Overlapping concerns - multiple files handle orchestration, multiple handle output.

**Recommendation:**
``````
Consolidate to 7 files:
1. compile_command.go - Main entry point (keep as-is)
2. compile_config.go - Configuration (merge compile_compiler_setup.go)
3. compile_processor.go - Processing (merge orchestration + orchestrator + workflow_processor)
4. compile_campaign.go - Campaigns (keep as-is)
5. compile_validation.go - Validation (merge validation + relevant helpers)
6. compile_output.go - Output (merge output_formatter + stats + post_processing)
7. compile_watch.go - Watch mode (keep as-is)
``````

**Impact:** 14 files → 7 files, clearer concern separation

---

#### Issue #8: CLI Codemod Subsystem Over-Fragmentation (15 files → 6 files)

**Current structure:** 15 separate codemod files, mostly one per operation (14 are listed here)
- `codemod_agent_session.go`, `codemod_discussion_flag.go`, `codemod_grep_tool.go`
- `codemod_mcp_network.go`, `codemod_network_firewall.go`, `codemod_permissions.go`
- `codemod_safe_inputs.go`, `codemod_sandbox_agent.go`, `codemod_schedule.go`
- `codemod_schema_file.go`, `codemod_slash_command.go`, `codemod_timeout_minutes.go`
- `codemod_upload_assets.go`, `codemod_yaml_utils.go`

**Issue:** Each codemod lives in its own file, even when several codemods are semantically related.

**Recommendation:**
``````
Group by functional area (15 → 6 files):
1. codemod_core.go - Core modifications (permissions, schedule, timeout)
2. codemod_infrastructure.go - Infrastructure (firewall, network, sandbox)
3. codemod_features.go - Features (safe_inputs, upload_assets)
4. codemod_tools.go - Tooling (grep_tool, yaml_utils)
5. codemod_integration.go - Integration (agent_session, discussion_flag)
6. codemod_schema.go - Schema operations (keep focused)
``````

**Impact:** Better semantic grouping, reduced file proliferation


Detailed Recommendations by Priority

High Priority (Immediate Impact)

1. Extract Duplicate Functions

Action: Eliminate duplicate GitHub URL and repository parsing functions (extractBaseRepo(), parseTargetRepo(), ParseGitHubURL() variants)

  • Create pkg/parser/github_url_utils.go or consolidate in existing parser files
  • Remove duplicates in workflow and repoutil packages
  • Files affected: action_pins.go, action_resolver.go, github_tool_to_toolset.go, repoutil.go

Benefit: Single source of truth for URL parsing, reduced maintenance burden

2. Consolidate Bundler Validation (3 → 1 file)

Action: Merge bundler validation files

  • Combine: bundler_safety_validation.go, bundler_script_validation.go, bundler_runtime_validation.go
  • Create: pkg/workflow/bundler_validation.go with organized sections

Files to consolidate:

  • pkg/workflow/bundler_safety_validation.go
  • pkg/workflow/bundler_script_validation.go
  • pkg/workflow/bundler_runtime_validation.go

3. Split permissions.go (1 → 3 files)

Action: Refactor oversized permissions.go (36 functions)

  • Extract: permissions_parser.go (parsing logic)
  • Extract: permissions_builder.go (constructor functions)
  • Extract: permissions_utilities.go (scope getters, converters)

Location: pkg/workflow/permissions.go:1

4. Create Unified Helpers Module

Action: Extract scattered helper functions to dedicated files

  • Create: pkg/workflow/helpers_format.go (~20 format functions)
  • Create: pkg/workflow/helpers_parse.go (~15 parse functions)
  • Create: pkg/workflow/helpers_build.go (~10 build functions)
  • Create: pkg/workflow/helpers_validate.go (~10 validation utilities)

Benefit: Centralized, discoverable utilities


Medium Priority (Next Phase)

5. Consolidate Safe Outputs Config (8 → 2 files)

Action: Merge safe_outputs_config_* files

  • Keep: safe_outputs_config.go (types + core)
  • Merge into safe_outputs_config_generation.go: generation, helpers, reflection, messages files

Files affected: 8 files in pkg/workflow/safe_outputs_config*.go

6. Reorganize CLI Codemod (15 → 6 files)

Action: Group codemods by functional area

Impact: Better semantic organization, easier to find related operations

7. Reduce Compiler Safe Outputs Files (14 → 4)

Action: Consolidate compiler_safe_outputs_* files

Impact: Clearer structure, easier navigation

8. Move Entity Helpers to Proper Locations

Action: Consolidate or distribute helper files

  • Option A: Merge close_entity_helpers.go and update_entity_helpers.go into entity_operations_helpers.go
  • Option B: Distribute functions into respective entity files as unexported helpers

Files affected:

  • pkg/workflow/close_entity_helpers.go
  • pkg/workflow/update_entity_helpers.go

Low Priority (Future Improvements)

9. Split Agentic Engine (1 → 2-3 files)

File: pkg/workflow/agentic_engine.go (29 functions)
Recommendation: Separate core logic from tools/execution

  • agentic_engine_core.go - Main engine logic
  • agentic_engine_tools.go - Tool handling
  • agentic_engine_execution.go - Execution logic (if needed)

10. Consolidate Prompt Files (3 → 2)

Files:

  • pkg/workflow/prompt_step.go
  • pkg/workflow/prompt_step_helper.go
  • pkg/workflow/unified_prompt_step.go

Recommendation: Merge the helper and unified files into the main prompt_step files

11. Reduce Frontmatter Fragmentation (10 → 6 files)

Action: Consolidate frontmatter_extraction_* files

  • Merge metadata, security, YAML extraction into 2 focused files

Summary Statistics

File Metrics:

  • Total analyzed: 462 non-test Go files
  • Exported functions: 2,000+
  • Unexported functions: 1,500+
  • Files with 20+ functions: 15 (potential split candidates)
  • Files with 1-5 functions: 200+ (consolidation candidates)
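
Per-file function counts like these can be reproduced with the standard `go/parser` and `go/ast` packages. The sketch below is one possible way to do it, not the method the analysis used; `countFuncs()` and `classify()` are hypothetical names, and the thresholds are taken from the two bullets above.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// countFuncs parses Go source text and returns the number of top-level
// function and method declarations it contains.
func countFuncs(filename, src string) (int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return 0, err
	}
	n := 0
	for _, decl := range file.Decls {
		if _, ok := decl.(*ast.FuncDecl); ok {
			n++
		}
	}
	return n, nil
}

// classify applies the thresholds from the file metrics:
// 20+ functions suggests a split, 1-5 suggests consolidation.
func classify(n int) string {
	switch {
	case n >= 20:
		return "split candidate"
	case n >= 1 && n <= 5:
		return "consolidation candidate"
	default:
		return "ok"
	}
}

func main() {
	src := `package demo

func A() {}
func b() {}

type T struct{}

func (T) M() {}
`
	n, err := countFuncs("demo.go", src)
	if err != nil {
		panic(err)
	}
	fmt.Println(n, classify(n)) // prints "3 consolidation candidate"
}
```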

Largest files by function count:

  1. js.go - 46 functions
  2. permissions.go - 36 functions
  3. agentic_engine.go - 29 functions
  4. compiler_types.go - 28 functions
  5. expression_builder.go - 27 functions

Most fragmented areas:

  1. Safe outputs subsystem (37 files)
  2. Compiler subsystem (60 files)
  3. CLI utilities (60+ scattered files)
  4. Validation functions (40+ files)
  5. Create/Update entity operations (41 files)

Best organized areas (✅ good patterns to follow):

  1. Parser package (29 files) - Clear semantic groups (frontmatter, imports, schedule, schema)
  2. Console package (10 files) - Focused concerns (format, render, layout, progress, spinner)
  3. Campaign package (11 files) - Clear module separation
  4. Engine subsystems - Consistent naming patterns (engine.go, _logs.go, _mcp.go, _tools.go)

Implementation Checklist

  • Phase 1: High-Priority Refactoring

    • Extract duplicate GitHub URL parsing functions
    • Consolidate bundler validation files (3 → 1)
    • Split permissions.go (1 → 3)
    • Create unified helpers module (4 new files)
  • Phase 2: Medium-Priority Consolidation

    • Consolidate safe_outputs_config files (8 → 2)
    • Reorganize CLI codemod subsystem (15 → 6)
    • Reduce compiler_safe_outputs files (14 → 4)
    • Move entity helpers to proper locations
  • Phase 3: Long-term Improvements

    • Split oversized files (agentic_engine.go)
    • Consolidate prompt files (3 → 2)
    • Reduce frontmatter fragmentation (10 → 6)
    • Systematic CLI package cleanup (150 → 100 files)
  • Phase 4: Testing & Validation

    • Verify no functionality broken after refactoring
    • Update tests to reflect new file structure
    • Update documentation with new organization
    • Validate all imports and references updated
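
One lightweight check for Phase 4 is to compare the set of function names before and after a merge, so that nothing is dropped by accident. The following is a minimal sketch under assumptions: `funcNames()` and `missing()` are hypothetical helpers, the file contents are placeholders, and methods are ignored to keep the example short.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
)

// funcNames returns the sorted names of all top-level functions declared in
// the given sources (file name -> source text). Methods are skipped.
func funcNames(files map[string]string) ([]string, error) {
	fset := token.NewFileSet()
	var names []string
	for name, src := range files {
		f, err := parser.ParseFile(fset, name, src, 0)
		if err != nil {
			return nil, err
		}
		for _, d := range f.Decls {
			if fd, ok := d.(*ast.FuncDecl); ok && fd.Recv == nil {
				names = append(names, fd.Name.Name)
			}
		}
	}
	sort.Strings(names)
	return names, nil
}

// missing returns the names present in before but absent from after.
func missing(before, after []string) []string {
	seen := make(map[string]bool, len(after))
	for _, n := range after {
		seen[n] = true
	}
	var out []string
	for _, n := range before {
		if !seen[n] {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	// Placeholder sources; validateA and validateB are made-up names.
	before := map[string]string{
		"bundler_safety_validation.go": "package workflow\nfunc validateA() {}\n",
		"bundler_script_validation.go": "package workflow\nfunc validateB() {}\n",
	}
	after := map[string]string{
		"bundler_validation.go": "package workflow\nfunc validateA() {}\n",
	}
	b, err := funcNames(before)
	if err != nil {
		panic(err)
	}
	a, err := funcNames(after)
	if err != nil {
		panic(err)
	}
	fmt.Println(missing(b, a)) // prints "[validateB]"
}
```

An empty result means every function that existed before the consolidation still exists afterwards; it does not replace running the test suite.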

Analysis Metadata

  • Analysis Date: 2026-01-24
  • Repository: githubnext/gh-aw
  • Total Files Analyzed: 462 Go files (excluding tests)
  • Detection Method: Semantic code analysis using naming patterns, function clustering, and code organization assessment
  • Primary Focus: pkg/workflow (52% of files), pkg/cli (33% of files)
  • Workflow Run: §21316929471

AI generated by Semantic Function Refactoring
