[refactor] Semantic Function Clustering Analysis - Code Organization Opportunities #13297

@github-actions

Executive Summary

Comprehensive analysis of 462 non-test Go source files across the repository revealed a well-organized codebase with clear feature-based clustering. The analysis identified specific areas for improvement in naming consistency and potential modularization opportunities.

Key Findings:

  • ✅ Strong feature-based organization with consistent prefixes
  • ✅ Appropriate function distribution across responsibility domains
  • ⚠️ Minor naming inconsistencies in configuration files
  • ⚠️ Several large modules (60+ files) that could benefit from sub-packaging
  • ✅ No significant code duplication - sanitize/parse functions serve different purposes
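The prefix-based clustering described in these findings can be approximated with a simple grouping pass. The following is a minimal sketch, not the analysis tool's actual method: it assumes the cluster key is the text before the first underscore in a file name, and the helper names `prefixOf` and `clusterByPrefix` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// prefixOf returns the text before the first underscore in a Go file name,
// or the bare name (without ".go") when there is no underscore.
func prefixOf(file string) string {
	name := strings.TrimSuffix(file, ".go")
	if i := strings.Index(name, "_"); i >= 0 {
		return name[:i]
	}
	return name
}

// clusterByPrefix counts non-test files per prefix.
func clusterByPrefix(files []string) map[string]int {
	counts := make(map[string]int)
	for _, f := range files {
		if strings.HasSuffix(f, "_test.go") {
			continue // the report covers non-test files only
		}
		counts[prefixOf(f)]++
	}
	return counts
}

func main() {
	// A small illustrative subset of file names, not the full package listing.
	files := []string{
		"compiler.go", "compiler_jobs.go", "compiler_types.go",
		"mcp_renderer.go", "mcp_detection.go",
		"create_issue.go", "create_issue_test.go",
	}
	counts := clusterByPrefix(files)

	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s: %d\n", k, counts[k])
	}
}
```

A single-token key is too coarse for families such as `safe_outputs_*`, which would be grouped under `safe`; a real pass would likely need a list of known multi-token prefixes.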

Function Inventory by Package

| Package | Files | Primary Purpose |
|---------|-------|-----------------|
| workflow | 248 | Workflow compilation, execution, safe outputs, MCP integration |
| cli | 173 | CLI commands, compilation orchestration, interactive flows |
| parser | 32 | Frontmatter parsing, imports, schema validation, schedule parsing |
| console | 11 | Terminal UI (banners, progress, lists, formatting) |
| stringutil | 5 | String utilities (identifiers, paths, sanitization, URLs) |
| logger | 4 | Logging infrastructure and error formatting |
| types | 2 | Shared type definitions |
| utilities | 8 | Helper packages (tty, timeutil, testutil, etc.) |

Total Functions Cataloged: ~3,500+ across all packages


Clustering Results

pkg/workflow (248 files)

Major Feature Clusters:

<details>
<summary><b>Compiler Family (66 files)</b></summary>

``````
compiler_*.go files handle workflow compilation:
- compiler.go (main compiler logic)
- compiler_orchestrator_*.go (4 files) - Orchestration subsystem
- compiler_yaml_*.go (5 files) - YAML generation
- compiler_safe_outputs_*.go (11 files) - Safe outputs integration
- compiler_jobs.go, compiler_activation_jobs.go, compiler_filters_validation.go
- compiler_types.go, compiler_test_helpers.go

Organization: Excellent hierarchical structure with clear sub-concerns
``````
</details>

<details>
<summary><b>Safe Outputs Family (60 files)</b></summary>

``````
safe_outputs_*.go files handle safe outputs configuration:
- safe_outputs_config.go (runtime config - 388 lines)
- safe_output_config.go (base config - 26 lines) ⚠️ Naming inconsistency
- safe_outputs_jobs.go, safe_outputs_steps.go, safe_outputs_env.go
- safe_outputs_config_generation*.go (3 files)
- safe_outputs_config_helpers*.go (3 files)
- safe_outputs_domains_validation.go, safe_outputs_target_validation.go
- safe_outputs_app.go, safe_outputs.go (documentation marker)

Issues:
1. Inconsistent naming: safe_outputs_* (plural) vs safe_output_* (singular)
2. 60 files in single package - consider sub-packaging
3. safe_outputs.go is a 435-byte documentation file
``````
</details>

<details>
<summary><b>MCP Family (41 files)</b></summary>

``````
mcp_*.go files handle Model Context Protocol:
- mcp_config_*.go (15 files) - Config variants
- mcp_gateway_*.go (2 files) - Gateway configuration
- mcp_playwright_config.go, mcp_serena_config.go - Specific integrations
- mcp_setup_generator.go, mcp_renderer.go
- mcp_detection.go, mcp_environment.go

Organization: Well-structured with clear sub-concerns
``````
</details>

<details>
<summary><b>Create Entity Family (25 files)</b></summary>

``````
create_*.go files handle GitHub entity creation:
- create_issue.go, create_pull_request.go, create_discussion.go
- create_pr_review_comment.go, create_project.go
- create_project_status_update.go, create_code_scanning_alert.go
- create_agent_session.go

Organization: Excellent - one file per entity type ✓
``````
</details>

<details>
<summary><b>Engine Families</b></summary>

``````
Separate engine implementations:
- copilot_*.go (18 files) - GitHub Copilot engine
- claude_*.go (4 files) - Claude engine
- codex_*.go (3 files) - Codex engine
- custom_engine.go - Custom engine support
- engine_*.go (19 files) - Generic engine utilities

Well-separated concerns ✓
``````
</details>

**Common Function Patterns:**

| Pattern | Count | Distribution |
|---------|-------|--------------|
| `validate*` | 50+ | Spread across *_validation.go files ✓ |
| `parse*` | 177 | Concentrated in parser functions ✓ |
| `generate*` | 117 | YAML generation, prompts, configs ✓ |
| `extract*` | 110 | Frontmatter, metadata extraction ✓ |

---
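The function-name patterns tabulated above for pkg/workflow could be counted with the standard `go/ast` and `go/parser` packages. This is a sketch under stated assumptions rather than the tool that produced the table: it matches prefixes case-insensitively, counts each declaration at most once, and the helper `countByPrefix` and the sample function names are hypothetical.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// countByPrefix parses Go source and counts function and method declarations
// whose names start with one of the given lowercase prefixes.
func countByPrefix(src string, prefixes []string) (map[string]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok {
			continue // skip imports, types, vars, and consts
		}
		lower := strings.ToLower(fn.Name.Name)
		for _, p := range prefixes {
			if strings.HasPrefix(lower, p) {
				counts[p]++
				break
			}
		}
	}
	return counts, nil
}

// sample is illustrative source; the function names are made up.
const sample = `package demo

func validateTarget() {}
func parseSchedule() {}
func ParseFrontmatter() {}
func helper() {}
`

func main() {
	prefixes := []string{"validate", "parse", "generate", "extract"}
	counts, err := countByPrefix(sample, prefixes)
	if err != nil {
		panic(err)
	}
	for _, p := range prefixes {
		fmt.Printf("%s*: %d\n", p, counts[p])
	}
}
```

Running the same function over every non-test file in a package and summing the maps would give per-package totals comparable to the table.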

#### pkg/cli (173 files)

**Major Feature Clusters:**

<details>
<summary><b>Command Families</b></summary>

``````
Each major command has dedicated file group:

Compile Command (11 files):
- compile_command.go, compile_helpers.go, compile_config.go
- compile_validation.go, compile_watch.go, compile_orchestration.go
- compile_batch_operations.go, compile_post_processing.go
- compile_output_formatter.go, compile_stats.go

Logs Command (13 files):
- logs_command.go, logs_orchestrator.go, logs_download.go
- logs_parsing_*.go (4 files) - Engine-specific parsing
- logs_github_api.go, logs_display.go, logs_report.go
- logs_metrics.go, logs_cache.go, logs_utils.go, logs_models.go

Add Command (8 files):
- add_command.go, add_workflow_*.go (3 files)
- add_interactive_*.go (6 files) - Interactive flow steps

Run Command (6 files):
- run_command.go, run_workflow_execution.go
- run_workflow_tracking.go, run_workflow_validation.go
- run_interactive.go, run_push.go

Update Command (9 files):
- update_command.go, update_check.go, update_display.go
- update_workflows.go, update_actions.go, update_git.go
- update_merge.go, update_types.go, update_extension_check.go

MCP Commands (17 files):
- mcp.go, mcp_add.go, mcp_list.go, mcp_list_tools.go
- mcp_inspect*.go (5 files), mcp_registry*.go (3 files)
- mcp_config_file.go, mcp_schema.go, mcp_secrets.go
- mcp_server.go, mcp_tool_table.go, mcp_validation.go
- mcp_workflow_*.go (2 files)
``````
</details>

<details>
<summary><b>Codemod Family (15 files)</b></summary>

``````
codemod_*.go files for YAML transformations:
- codemod_agent_session.go, codemod_discussion_flag.go
- codemod_grep_tool.go, codemod_mcp_mode_to_type.go
- codemod_mcp_network.go, codemod_network_firewall.go
- codemod_permissions.go, codemod_safe_inputs.go
- codemod_sandbox_agent.go, codemod_schedule.go
- codemod_schema_file.go, codemod_slash_command.go
- codemod_timeout_minutes.go, codemod_upload_assets.go
- codemod_yaml_utils.go - Shared utilities

Organization: Good - each migration in separate file
``````
</details>

<details>
<summary><b>Consolidated Utilities</b></summary>

``````
Well-organized helper files:
- git.go (746 LOC) - All git operations ✓ Excellent consolidation
- validators.go - Validation functions ✓
- interactive.go (510 LOC) - Interactive prompts ✓
- jq.go, repo.go, resolver.go, spec.go - Single-purpose utilities ✓
``````
</details>

---

#### pkg/parser (32 files)

**Well-Organized Clusters:**

<details>
<summary><b>Parser Clusters</b></summary>

``````
frontmatter_*.go (4 files):
- frontmatter.go (113 bytes - marker file)
- frontmatter_content.go (9.3 KB)
- frontmatter_hash.go (18 KB)

import_*.go (5 files):
- import_cache.go, import_directive.go, import_error.go
- import_processor.go (30 KB), include_processor.go

schedule_*.go (4 files):
- schedule_parser.go, schedule_cron_detection.go
- schedule_fuzzy_scatter.go, schedule_time_utils.go

schema_*.go (7 files):
- schema_compiler.go, schema_validation.go
- schema_deprecation.go, schema_errors.go
- schema_suggestions.go, schema_triggers.go
- schema_utilities.go

yaml_*.go (2 files):
- yaml_error.go, yaml_import.go

Other:
- github.go, github_urls.go
- mcp.go (28 KB)
- remote_fetch.go, tools_merger.go
- content_extractor.go, json_path_locator.go

Organization: Excellent feature clustering ✓
``````
</details>

Identified Issues

1. Naming Inconsistency - Safe Outputs Config Files

Issue: Dual naming conventions for configuration files

Files Affected:

  • pkg/workflow/safe_outputs_config.go (388 lines) - plural
  • pkg/workflow/safe_output_config.go (26 lines) - singular
  • pkg/workflow/safe_output_validation_config.go (148 lines) - singular
  • pkg/workflow/compiler_safe_outputs_config.go (513 lines) - plural

Analysis:

``````
// Plural form - Runtime configuration
safe_outputs_config.go:
  func extractSafeOutputsConfig() [Compiler receiver]

// Singular form - Base configuration  
safe_output_config.go:
  func parseBaseSafeOutputConfig() [Compiler receiver]

// Singular form - Validation configuration
safe_output_validation_config.go:
  func validateSafeOutputsTarget()
``````

**Recommendation:** Standardize on **plural form** (`safe_outputs_*`) for consistency:
- Rename `safe_output_config.go` → `safe_outputs_base_config.go`
- Rename `safe_output_validation_config.go` → `safe_outputs_validation_config.go`
- Update all references

**Impact:** Low - affects 3 files, minimal breaking changes
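Before renaming, the singular-prefixed files can be listed mechanically. The following minimal sketch assumes the only distinction that matters is the `safe_output_` versus `safe_outputs_` prefix; the helper `singularPrefixed` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// singularPrefixed returns the file names that use the singular
// "safe_output_" prefix rather than the plural "safe_outputs_" prefix.
func singularPrefixed(files []string) []string {
	var out []string
	for _, f := range files {
		// "safe_outputs_..." does not match because its 12th character
		// is 's', not '_'.
		if strings.HasPrefix(f, "safe_output_") {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	// The four file names listed under "Files Affected".
	files := []string{
		"safe_outputs_config.go",
		"safe_output_config.go",
		"safe_output_validation_config.go",
		"compiler_safe_outputs_config.go",
	}
	for _, f := range singularPrefixed(files) {
		fmt.Println("rename candidate:", f)
	}
}
```

The target names still need a manual decision: `safe_output_config.go` cannot simply gain an `s`, because `safe_outputs_config.go` already exists, which is why the recommendation uses `safe_outputs_base_config.go`.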

---

### 2. Test File Naming Inconsistency - Compile vs Compiler

**Issue:** Test files use the `compile_*` prefix (mostly `compile_outputs_*`) instead of the `compiler_*` prefix used by the implementation files

**Files Affected:**
- `pkg/workflow/compile_config_test.go`
- `pkg/workflow/compile_outputs_allowed_labels_test.go`
- `pkg/workflow/compile_outputs_comment_test.go`
- `pkg/workflow/compile_outputs_issue_test.go`
- `pkg/workflow/compile_outputs_label_test.go`
- `pkg/workflow/compile_outputs_pr_test.go`

**Pattern Inconsistency:**
``````
✓ compiler_*.go (66 implementation files)
✗ compile_outputs_*_test.go (6 test files)
  
Should be: compiler_outputs_*_test.go
``````

**Recommendation:** Rename for consistency with compiler family:
- `compile_outputs_*_test.go` → `compiler_outputs_*_test.go`

**Impact:** Very low - test files only, no runtime changes
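The rename rule can be expressed as a small pure function so the mapping is reviewable before any files move. This is a sketch only; `renameTarget` is a hypothetical helper, and it covers just the `compile_outputs_*_test.go` pattern named in the recommendation.

```go
package main

import (
	"fmt"
	"strings"
)

// renameTarget maps compile_outputs_*_test.go to compiler_outputs_*_test.go.
// The boolean is false when the name does not match the pattern.
func renameTarget(name string) (string, bool) {
	const oldPrefix, newPrefix = "compile_outputs_", "compiler_outputs_"
	if !strings.HasPrefix(name, oldPrefix) || !strings.HasSuffix(name, "_test.go") {
		return "", false
	}
	return newPrefix + strings.TrimPrefix(name, oldPrefix), true
}

func main() {
	// The six test files listed under "Files Affected".
	files := []string{
		"compile_config_test.go",
		"compile_outputs_allowed_labels_test.go",
		"compile_outputs_comment_test.go",
		"compile_outputs_issue_test.go",
		"compile_outputs_label_test.go",
		"compile_outputs_pr_test.go",
	}
	for _, f := range files {
		if target, ok := renameTarget(f); ok {
			fmt.Printf("%s -> %s\n", f, target)
		} else {
			fmt.Printf("%s (no match, decide manually)\n", f)
		}
	}
}
```

Note that `compile_config_test.go` does not match the `compile_outputs_*` pattern, so this sketch leaves it for a manual decision.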

---

### 3. Large Module - Safe Outputs (60 files)

**Issue:** The safe_outputs module contains 60 files in a flat structure

**Breakdown:**
``````
safe_outputs_config*.go (7 files) - Configuration
safe_outputs_steps.go, safe_outputs_jobs.go - Builders
safe_outputs_env.go - Environment handling
safe_outputs_domains_validation.go, safe_outputs_target_validation.go - Validation
safe_outputs_app.go - Application integration
+ 50 more related files
``````

**Recommendation:** Consider creating sub-package structure:
``````
pkg/workflow/safe_outputs/
  ├── config/          (configuration files)
  ├── builders/        (steps, jobs builders)
  ├── validation/      (domain, target validation)
  └── integration/     (app, env integration)
``````

Benefits:

  • Clearer organization for large module
  • Easier navigation and discovery
  • Better encapsulation of sub-concerns

Impact: Medium - requires significant refactoring but improves long-term maintainability
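The proposed layout can be prototyped as a classification function before any files are moved. The sketch below is an assumption-laden illustration: the rules are derived only from the breakdown and directory comments above, `suggestSubpackage` is a hypothetical helper, and the roughly 50 files not named in the breakdown fall through as unclassified.

```go
package main

import (
	"fmt"
	"strings"
)

// suggestSubpackage proposes a sub-package for a safe_outputs file based on
// its name. It returns "unclassified" when no rule applies.
func suggestSubpackage(file string) string {
	name := strings.TrimSuffix(file, ".go")
	switch {
	case strings.HasSuffix(name, "_validation"):
		return "validation"
	case strings.HasPrefix(name, "safe_outputs_config"):
		return "config"
	case name == "safe_outputs_steps" || name == "safe_outputs_jobs":
		return "builders"
	case name == "safe_outputs_app" || name == "safe_outputs_env":
		return "integration"
	default:
		return "unclassified"
	}
}

func main() {
	files := []string{
		"safe_outputs_config.go",
		"safe_outputs_steps.go",
		"safe_outputs_jobs.go",
		"safe_outputs_env.go",
		"safe_outputs_domains_validation.go",
		"safe_outputs_target_validation.go",
		"safe_outputs_app.go",
	}
	for _, f := range files {
		fmt.Printf("%-40s -> %s/\n", f, suggestSubpackage(f))
	}
}
```

Counting the unclassified files from a dry run like this gives a rough measure of how much of the module the four proposed directories actually cover.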


4. Documentation Marker Files

Issue: Some files are documentation-only and might confuse developers

Files:

  • pkg/workflow/safe_outputs.go (435 bytes, 8 lines)
  • pkg/parser/frontmatter.go (113 bytes)

Example - safe_outputs.go:

``````
package workflow

// This file serves as documentation for the safe_outputs_* module organization.
// - safe_outputs_config.go: Configuration parsing and validation
// - safe_outputs_steps.go: Step builders
// - safe_outputs_env.go: Environment variable helpers
// - safe_outputs_jobs.go: Job assembly and orchestration
``````

**Recommendation:** 
1. Add prominent `// DOCUMENTATION ONLY - See files listed below` comment at top
2. Consider renaming to `safe_outputs_doc.go` to make purpose explicit
3. Or remove and move documentation to package doc.go

**Impact:** Very low - documentation clarity improvement
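Documentation-only files like these can be detected automatically, which helps when deciding whether to rename or fold them into a package `doc.go`. A minimal sketch using the standard `go/parser` package follows; it assumes "documentation-only" means a package clause with no declarations at all, and `isDocOnly` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
)

// isDocOnly reports whether Go source has a package clause but no
// declarations (no imports, types, vars, consts, or funcs).
func isDocOnly(src string) (bool, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, parser.ParseComments)
	if err != nil {
		return false, err
	}
	return len(file.Decls) == 0, nil
}

func main() {
	// Shortened stand-in for a marker file such as safe_outputs.go.
	marker := "package workflow\n\n// This file serves as documentation.\n"
	// A file with one declaration, for contrast.
	code := "package workflow\n\nfunc build() {}\n"

	for _, src := range []string{marker, code} {
		docOnly, err := isDocOnly(src)
		if err != nil {
			panic(err)
		}
		fmt.Println("documentation only:", docOnly)
	}
}
```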

---

### 5. Missing Handler Implementation Files

**Issue:** Handler config tests exist without obvious implementation files

**Test Files:**
- `pkg/workflow/create_issue_handler_config_test.go`
- `pkg/workflow/create_project_status_update_handler_config_test.go`
- `pkg/workflow/update_project_handler_config_test.go`
- `pkg/workflow/safe_outputs_handler_manager_token_test.go`

**Expected Pattern:**
``````
✓ create_issue.go + create_issue_test.go (exists)
✗ create_issue_handler_config.go (missing)
  + create_issue_handler_config_test.go (exists)
``````

Analysis: Handler config logic may be embedded in main implementation files or generated. Verify if dedicated handler files should exist.

Recommendation:

  • If handler config is substantial, extract to dedicated *_handler_config.go files
  • If trivial, document that handler tests cover embedded config logic

Impact: Low - primarily affects code discoverability
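The check behind this finding can be reproduced by pairing each test file with the implementation file of the same base name. This is a sketch of that convention check, not a statement of how the analysis was run; `testsWithoutImpl` is a hypothetical helper. Go itself does not require test files to have a same-named implementation file, so a hit is a discoverability hint rather than an error.

```go
package main

import (
	"fmt"
	"strings"
)

// testsWithoutImpl returns the test files whose matching implementation file
// (the same name without the "_test" suffix) is absent from the list.
func testsWithoutImpl(files []string) []string {
	present := make(map[string]bool, len(files))
	for _, f := range files {
		present[f] = true
	}
	var orphans []string
	for _, f := range files {
		if !strings.HasSuffix(f, "_test.go") {
			continue
		}
		impl := strings.TrimSuffix(f, "_test.go") + ".go"
		if !present[impl] {
			orphans = append(orphans, f)
		}
	}
	return orphans
}

func main() {
	files := []string{
		"create_issue.go",
		"create_issue_test.go",
		"create_issue_handler_config_test.go",
	}
	for _, f := range testsWithoutImpl(files) {
		fmt.Println("no matching implementation file:", f)
	}
}
```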


No Significant Duplication Detected

Sanitize Functions Analysis:

Reviewed similar function names across codebase:

// Different purposes - not duplicates ✓
pkg/parser/import_cache.go:sanitizePath()
  Purpose: Path sanitization for cache keys

pkg/cli/compile_config.go:sanitizeValidationResults()
  Purpose: Remove secrets from validation messages

pkg/cli/mcp_server.go:sanitizeForLog()
  Purpose: Prevent log injection attacks

pkg/workflow/compiler_yaml_main_job.go:sanitizeRefForPath()
  Purpose: Convert git refs to safe path components

Parse/Extract Functions Analysis:

// Reuse common parser functions - good architecture ✓
pkg/cli/codemod_yaml_utils.go:parseFrontmatterLines()
  → Calls parser.ExtractFrontmatterFromContent()

pkg/workflow/compiler_orchestrator_frontmatter.go:parseFrontmatterSection()
  → Calls parser.ExtractFrontmatterFromContent()

Common implementation in pkg/parser - no duplication

Conclusion: Function naming similarities reflect domain language, not code duplication. Shared logic properly abstracted to pkg/parser package.


Refactoring Recommendations

Priority 1: Quick Wins (2-4 hours)

1.1 Standardize Safe Outputs Naming

  • Rename safe_output_config.go → safe_outputs_base_config.go
  • Rename safe_output_validation_config.go → safe_outputs_validation_config.go
  • Update imports and references
  • Run tests to verify no breakage

Files: 3 renames, ~10 import updates
Risk: Low
Benefit: Consistent naming across module

1.2 Rename Compile Test Files

  • Rename compile_outputs_*_test.go → compiler_outputs_*_test.go (6 files)
  • Update any references in build scripts

Files: 6 test files
Risk: Very low (tests only)
Benefit: Consistency with compiler family naming

1.3 Clarify Documentation Files

  • Add // DOCUMENTATION ONLY comment to safe_outputs.go
  • Add // DOCUMENTATION ONLY comment to frontmatter.go
  • Or rename to *_doc.go pattern

Files: 2 documentation files
Risk: None
Benefit: Reduce developer confusion


Priority 2: Medium-Term Improvements (1-2 weeks)

2.1 Investigate Handler Config Pattern

  • Review handler config test files
  • Determine if dedicated *_handler_config.go files should exist
  • Extract handler config if logic is substantial (>100 lines)
  • Document pattern in architecture docs

Files: 4 test files, potentially 4 new implementation files
Risk: Low
Benefit: Clearer handler configuration separation

2.2 Consider MCP Sub-Packaging

  • Evaluate if 41 mcp_* files warrant pkg/workflow/mcp/ sub-package
  • Group into: config/, gateway/, integrations/
  • Update imports across codebase

Files: 41 MCP files
Risk: Medium (many import updates)
Benefit: Better organization for large module


Priority 3: Long-Term Architectural (1-2 months)

3.1 Modularize Safe Outputs

  • Create pkg/workflow/safe_outputs/ sub-package
  • Organize into: config/, builders/, validation/, integration/
  • Move 60 files to appropriate sub-packages
  • Update all imports (affects ~100+ files)
  • Comprehensive testing

Files: 60 safe_outputs files + 100+ import updates
Risk: High (major refactoring)
Benefit: Scalable architecture for large module

3.2 Consider Compiler Sub-Packaging

  • Evaluate if 66 compiler_* files warrant sub-package
  • Potential structure: pkg/workflow/compiler/ with orchestrator/, yaml/, validation/
  • Major undertaking - plan carefully

Files: 66 compiler files
Risk: High
Benefit: Clearer boundaries for compilation subsystem


Implementation Checklist

Phase 1: Quick Wins

  • Review and approve Priority 1 recommendations
  • Create feature branch: refactor/naming-consistency
  • Implement safe_outputs naming standardization
  • Implement compile test file renaming
  • Add documentation-only comments
  • Run full test suite
  • Create PR with detailed changelist

Phase 2: Medium-Term

  • Research handler config pattern requirements
  • Design handler config extraction approach (if needed)
  • Evaluate MCP sub-packaging benefits vs. costs
  • Prototype MCP sub-package structure
  • Gather team feedback on proposals

Phase 3: Long-Term

  • Create RFC for safe_outputs modularization
  • Design package structure with clear interfaces
  • Implement in stages with comprehensive testing
  • Monitor for performance regressions
  • Document architectural decisions

Analysis Metadata

Scope:

  • Total Go Files Analyzed: 462 (non-test files in pkg/)
  • Total Functions Cataloged: ~3,500+
  • Function Clusters Identified: 25+ major feature groups
  • Outliers Found: 5 naming inconsistencies
  • Duplicates Detected: 0 (all similar functions serve different purposes)

Methodology:

  • Detection Method: Explore agent semantic analysis + manual pattern review
  • Analysis Date: 2026-02-02
  • Repository: githubnext/gh-aw
  • Workflow Run: §21595357786

Code Quality Assessment:

  • Organization: 8/10 - Excellent feature clustering with minor inconsistencies
  • Naming Consistency: 7/10 - Good patterns with 3-5 exceptions
  • Modularity: 7/10 - Well-separated concerns, opportunity for sub-packaging
  • Documentation: 8/10 - Clear file organization, minimal marker file confusion

Overall: The codebase demonstrates strong architectural discipline with clear feature-based organization. Identified issues are minor and represent opportunities for incremental improvement rather than systemic problems.

AI generated by Semantic Function Refactoring

  • expires on Feb 4, 2026, 3:15 PM UTC
