[refactor] Semantic Function Clustering Analysis - Code Organization Opportunities

*Semantic analysis of Go codebase to identify refactoring opportunities through function clustering, outlier detection, and code organization patterns*

### Executive Summary

Analyzed **483 Go source files** across the `pkg/` directory (247 in workflow, 160 in CLI, 76 in supporting packages). The codebase demonstrates strong semantic organization with consistent naming conventions, but several high-impact refactoring opportunities exist:

- **Large monolithic files**: 10+ files exceed 750 lines with mixed responsibilities
- **Helper file accumulation**: 13 helper files with varying levels of cohesion
- **Factory method concentration**: Files like `permissions.go` (928 lines, 36+ factory methods)
- **Validation pattern duplication**: 76+ similar validation patterns across 32 files
- **Configuration parsing patterns**: Multiple parse*Config functions with similar logic

**Impact**: Improved maintainability, testability, and reduced cognitive overhead for developers.

---

### Function Inventory by Package

#### Primary Packages Analysis

| Package | Files | Lines (Est.) | Primary Purpose | Organization Quality |
|---------|-------|--------------|-----------------|---------------------|
| **workflow** | 247 | 117K+ | Core compilation/execution engine | ⚠️ Some large files need splitting |
| **cli** | 160 | 38K+ | Command-line interface | ⚠️ 4 files exceed 1000 lines |
| **parser** | 30 | Unknown | Parsing utilities | ✅ Well organized |
| **campaign** | 14 | Unknown | Campaign management | ✅ Focused scope |
| **console** | 11 | Unknown | UI/Output rendering | ✅ Clean separation |

#### Utility Packages (Well Organized ✅)

Small, focused packages with clear responsibilities:
- **stringutil** (5 files), **logger** (4 files), **types** (2 files)
- Single-file utilities: tty, timeutil, testutil, styles, sliceutil, repoutil, mathutil, gitutil, envutil, constants

---

### Clustering Results

The codebase follows **semantic underscore-separated naming** with clear verb-prefix patterns:

#### File Naming Patterns (Semantic Grouping)

| Pattern | Count | Purpose | Organization |
|---------|-------|---------|--------------|
| `compiler_*` | 26 | Compilation orchestration | ✅ Clear grouping |
| `safe_outputs*` | 20 | Safety boundary outputs | ✅ Well organized |
| `mcp_*` | 16 | Model Context Protocol | ✅ Cohesive cluster |
| `create_*` | 8 | GitHub entity creation | ✅ One per entity type |
| `update_*` | 7 | GitHub entity updates | ✅ Parallel to create_* |
| `*_validation.go` | 31 | Validation logic | ✅ Separated from main logic |
| `*_helpers.go` | 13 | Helper functions | ⚠️ Varying cohesion |
| `*_config*.go` | Multiple | Configuration parsing | ⚠️ Some overlap |

#### Function Naming Patterns

**Workflow Package - Top Patterns:**
````
59 parse*        (parseWorkflow, parseExpression, parseIssuesConfig)
56 get*          (getPermissions, getRequiredSecrets, getAllowedDomains)
50 generate*     (generateWorkflowHeader, generateSafeOutputsConfig)
37 extract*      (extractExpressions, extractSecrets, extractMetadata)
34 validate*     (validateExpression, validateSandbox, validateRuntimeMode)
32 is*           (isAllowed, isEnabled, isValid)
24 build*        (buildJob, buildStep, buildCopilotAssignmentStep)
14 render*       (renderYAML, renderSharedMCPConfig, renderGitHubMCP)
```

**CLI Package - Top Patterns:**
```
64 get*          (getConfig, getWorkflows, getStatus)
29 parse*        (parseFlags, parseWorkflow)
27 extract*      (extractLogs, extractMetrics)
25 ensure*       (ensureDirectory, ensureConfig)
20 run*          (runWorkflow, runCommand)
19 create*       (createCommand, createPR, createIssue)
````

**Observation**: Consistent camelCase for unexported functions, PascalCase for exported (idiomatic Go). No mixed patterns.

---

### Identified Issues

### 1. Large Monolithic Files (Refactoring Priority: High)

Files with multiple semantic concerns that should be split:

<details>
<summary><b>Top 10 Largest Files</b></summary>

| Rank | File | Lines | Package | Functions | Issues |
|------|------|-------|---------|-----------|--------|
| 1 | `add_command.go` | 1,218 | cli | 18 | Combines workflow resolution, UI, addition, compilation, PR creation, file tracking |
| 2 | `add_interactive.go` | 1,025 | cli | 23 | Interactive UI + workflow operations + file operations mixed |
| 3 | `trial_command.go` | 1,000 | cli | Unknown | Trial execution command |
| 4 | `mcp_server.go` | 1,000 | cli | Unknown | MCP server management |
| 5 | `safe_outputs_config_generation.go` | 978 | workflow | 5 | Configuration generation from workflow data |
| 6 | `permissions.go` | 928 | workflow | 36 | **36 factory methods** + parsing + operators |
| 7 | `init.go` | 868 | cli | Unknown | Initialization logic |
| 8 | `audit.go` | 864 | cli | Unknown | Audit trail generation |
| 9 | `logs_orchestrator.go` | 855 | cli | 5 | Downloading + normalization + filtering + parsing |
| 10 | `mcp_renderer.go` | 844 | workflow | 4 | Renders GitHub/Playwright/custom MCPs |

</details>

#### Detailed Examples

**Example 1: `pkg/cli/add_command.go` (1,218 lines)**

**Issue**: Single file handles 6 distinct responsibilities

Functional areas:
1. Command definition & flag configuration (203 lines)
2. Workflow resolution & parsing (130 lines)
3. Repository listing & interactive selection (150 lines)
4. Normal workflow addition (101 lines)
5. PR creation flow (105 lines)
6. Single workflow addition with tracking (262 lines - **complexity hotspot**)
7. Compilation helpers (190 lines)

**Recommendation**: Split into 6 focused modules:
- `add_command.go` - Command definition (~250 lines)
- `add_workflow_resolution.go` - Workflow resolution (~200 lines)
- `add_workflow_repository.go` - Repository operations (~150 lines)
- `add_workflow_operations.go` - Core addition logic (~400 lines)
- `add_workflow_pr.go` - PR flow (~110 lines)
- `add_workflow_compilation.go` - Compilation helpers (~150 lines)

**Impact**: Easier testing, clearer responsibilities, reduced cognitive load

**Note**: Issue [#12263](https://github.com/githubnext/gh-aw/issues/12263) already tracks this refactoring with a detailed plan.

---

**Example 2: `pkg/workflow/permissions.go` (928 lines, 36+ functions)**

**Issue**: Factory method accumulation - 36+ `New*Permission()` functions

Functions:
- 5 parsing methods
- 36+ factory methods (`NewContentsReadPermission`, `NewIssuesWritePermission`, etc.)
- 8+ operator methods (merge, render, validate)

**Recommendation**: Split into 3 semantic modules:
- `permissions_parser.go` - Parsing logic (5 methods)
- `permissions_factory.go` - Factory methods (36+ New* functions)
- `permissions_operations.go` - Operators (merge, render, validate)

**Estimated Impact**: Improved discoverability, easier to locate specific permission types

---

**Example 3: `pkg/workflow/mcp_renderer.go` (844 lines)**

**Issue**: Multi-target renderer in single file

Renders:
- GitHub MCP config
- Playwright MCP config
- Custom tool MCP config
- Shared MCP infrastructure

**Recommendation**: Split by target:
- `mcp_renderer.go` - Base renderer & shared logic
- `mcp_renderer_github.go` - GitHub-specific rendering
- `mcp_renderer_playwright.go` - Playwright-specific rendering
- `mcp_renderer_custom.go` - Custom tool rendering

**Estimated Impact**: Parallel development, clearer boundaries, easier testing

---

### 2. Helper File Organization (Refactoring Priority: Medium)

**13 helper files** identified with varying levels of cohesion:

<details>
<summary><b>Helper Files Analysis</b></summary>

#### Well-Organized Helper Files ✅

| File | Functions | Cohesion | Assessment |
|------|-----------|----------|------------|
| `validation_helpers.go` | 7 | High | Generic validation patterns |
| `config_helpers.go` | 10 | High | Configuration parsing utilities |
| `error_helpers.go` | 5 | High | Error construction helpers |
| `git_helpers.go` | 1 | High | Git tag extraction |
| `map_helpers.go` | 2 | High | Map manipulation utilities |

#### Candidates for Reorganization ⚠️

| File | Functions | Issue | Recommendation |
|------|-----------|-------|----------------|
| `engine_helpers.go` | 10 | Mixed: installation + env filtering + path setup | Split: `engine_installation.go`, `engine_environment.go` |
| `compiler_yaml_helpers.go` | 4+ | Mixed: path extraction + code generation + placeholders | Consider moving to main compiler file |
| `safe_outputs_config_generation_helpers.go` | 10 | All related to config generation | ✅ Could stay together or merge with parent |

</details>

**Pattern**: Helper files work well when they contain **generic, reusable utilities** (validation, error handling, parsing). They become problematic when they accumulate **domain-specific logic** that belongs in main modules.

---

### 3. Validation Pattern Duplication (Refactoring Priority: Medium)

**76+ similar validation patterns** across 32 validation files

#### Common Patterns

| Pattern | Occurrences (Est.) | Files | Example Functions |
|---------|-------------------|-------|-------------------|
| Integer range validation | 15+ | 12 files | `validateIntRange(value, min, max, fieldName)` |
| Required field validation | 20+ | 18 files | String empty checks |
| List membership validation | 12+ | 9 files | `value in allowedValues` |
| String length validation | 10+ | 8 files | `len(str) < minLength` |
| File existence checks | 8+ | 6 files | `os.Stat(path)` patterns |

#### Existing Solution (Partial) ✅

`validation_helpers.go` already consolidates some patterns:
- `validateIntRange()` - Integer range validation
- `ValidateRequired()` - Required field validation
- `ValidateMaxLength()` / `ValidateMinLength()` - Length validation
- `ValidateInList()` - List membership
- `ValidatePositiveInt()` / `ValidateNonNegativeInt()` - Integer validation
- `fileExists()` - File existence check

**Note**: The file includes TODO comments indicating Phase 2 refactoring to add:
- `isEmptyOrNil()` - Empty/nil checks
- `getMapFieldAsString()` / `getMapFieldAsMap()` / `getMapFieldAsBool()` - Map field extraction
- `dirExists()` - Directory existence check

**Recommendation**: Continue consolidation effort. Migrate remaining validation patterns to use these helpers across the 32 validation files.

**Estimated Impact**: Reduced duplication, consistent validation behavior, easier testing

---

### 4. Configuration Parsing Patterns (Refactoring Priority: Low-Medium)

**Similar parse*Config functions** with overlapping logic

#### Identified Patterns

| Function Pattern | Occurrences | Files | Example |
|------------------|-------------|-------|---------|
| `parse*Config(configMap map[string]any)` | 14+ | 8 files | `parseLabelsFromConfig`, `parseMessagesConfig`, `parseAppConfig` |
| `parse*FromConfig(configMap, key)` | 10+ | 5 files | `parseExpiresFromConfig`, `parseParticipantsFromConfig` |
| Array parsing | 8+ | 4 files | String array extraction with type checking |
| String extraction | 12+ | 6 files | String value extraction with validation |

#### Existing Solution (Partial) ✅

`config_helpers.go` provides generic parsing helpers:
- `ParseStringArrayFromConfig(m, key, log)` - Generic string array extraction
- `extractStringFromMap(m, key, log)` - Generic string extraction
- `ParseIntFromConfig(m, key, log)` - Generic integer extraction
- `ParseBoolFromConfig(m, key, log)` - Generic boolean extraction
- `unmarshalConfig(m, key, target, log)` - Type-safe YAML unmarshaling

**Observation**: Many specific parse functions are thin wrappers around these generic helpers:

```go
// Thin wrapper pattern (appropriate)
func parseLabelsFromConfig(configMap map[string]any) []string {
    return ParseStringArrayFromConfig(configMap, "labels", configHelpersLog)
}

func parseTitlePrefixFromConfig(configMap map[string]any) string {
    return extractStringFromMap(configMap, "title-prefix", configHelpersLog)
}
```

**Assessment**: Current organization is reasonable. Thin wrappers provide semantic clarity while avoiding duplication. No major refactoring needed.

---

### 5. Error Message Patterns (Refactoring Priority: Low)

**Consistent error wrapping patterns** across files

| Error Pattern | Occurrences | Files |
|---------------|-------------|-------|
| `fmt.Errorf("failed to %s: %w", ...)` | 114 | 37 files |
| `fmt.Errorf("could not %s: %w", ...)` | 0 | - |
| `fmt.Errorf("unable to %s: %w", ...)` | 1 | 1 file |

**Observation**: Codebase consistently uses `"failed to"` pattern (114 occurrences). This is **good consistency**.

#### Existing Solution ✅

`error_helpers.go` provides structured error constructors:
- `NewValidationError(field, value, reason, suggestion)`
- `NewOperationError(operation, entityType, entityID, cause, suggestion)`
- `NewConfigurationError(configKey, value, reason, suggestion)`
- `EnhanceError(err, context, suggestion)`
- `WrapErrorWithContext(err, context, suggestion)`

**Assessment**: Error handling is well-structured with helper utilities. No major refactoring needed.

---

### Detailed Function Clusters

#### Cluster 1: Creation Functions (`create_*` pattern)

**Pattern**: GitHub entity creation handlers - one file per entity type

| File | Functions | Organization |
|------|-----------|--------------|
| `create_issue.go` | CreateIssue(...) | ✅ Focused |
| `create_pull_request.go` | CreatePR(...) | ✅ Focused |
| `create_discussion.go` | CreateDiscussion(...) | ✅ Focused |
| `create_project.go` | CreateProject(...) | ✅ Focused |
| `create_project_status_update.go` | CreateProjectStatusUpdate(...) | ✅ Focused |

**Analysis**: ✅ Well-organized - Each creation function has its own file with clear responsibility. This pattern should be maintained.

---

#### Cluster 2: Update Functions (`update_*` pattern)

**Pattern**: GitHub entity modification handlers - parallel to create_* pattern

| File | Functions | Organization |
|------|-----------|--------------|
| `update_issue.go` | UpdateIssue(...) | ✅ Focused |
| `update_pull_request.go` | UpdatePR(...) | ✅ Focused |
| `update_discussion.go` | UpdateDiscussion(...) | ✅ Focused |
| `update_project.go` | UpdateProject(...) | ✅ Focused |

**Analysis**: ✅ Well-organized - Mirrors create_* pattern consistently.

---

#### Cluster 3: Validation Functions (`validate*` pattern)

**Pattern**: 31 validation files with `*_validation.go` suffix

**Organization**: ✅ Validation logic consistently separated from main logic

Examples:
- `expression_validation.go` - Expression safety validation
- `sandbox_validation.go` - Sandbox configuration validation
- `template_validation.go` - Template injection validation
- `runtime_validation.go` - Runtime mode validation
- `permissions_validation.go` - Permission validation
- `features_validation.go` - Feature flag validation

**Analysis**: ✅ Excellent pattern - Validation logic is isolated for easier testing and maintenance. Continue this pattern.

---

#### Cluster 4: Compilation Functions (`compiler_*` pattern)

**Pattern**: 26 compiler-related files

**Organization**: ⚠️ Some overlap between files

Examples:
- `compiler.go` - Main orchestration
- `compiler_yaml.go` - YAML generation
- `compiler_jobs.go` - Job compilation
- `compiler_safe_outputs.go` - Safe outputs compilation
- `compiler_activation_jobs.go` - Activation job generation
- `compiler_orchestrator_*.go` - Multiple orchestrator files

**Observation**: 26 compiler files suggest **deep hierarchy**. Some boundaries may be unclear.

**Recommendation**: Document the compilation pipeline architecture to clarify file responsibilities. Consider whether some `compiler_orchestrator_*` files could be consolidated.

---

#### Cluster 5: MCP Configuration (`mcp_*` pattern)

**Pattern**: 16 MCP-related files

| File | Purpose | Organization |
|------|---------|--------------|
| `mcp_config.go` | Main configuration | ✅ |
| `mcp_renderer.go` | Rendering (844 lines) | ⚠️ Multi-target |
| `mcp_github_config.go` | GitHub-specific config | ✅ |
| `mcp_playwright_config.go` | Playwright-specific config | ✅ |
| `mcp_config_custom.go` | Custom tool config | ✅ |
| `mcp_config_validation.go` | Validation | ✅ |
| Others | Setup, environment, gateway | ✅ |

**Analysis**: Generally well-organized. Main issue is `mcp_renderer.go` (844 lines) rendering multiple targets - candidate for splitting (see Issue #1 above).

---

### Refactoring Recommendations

#### Priority 1: High Impact (Immediate Action)

##### 1.1 Split Large Monolithic Files (Effort: Large, Impact: High)

**Target files**: `add_command.go` (1,218 lines), `add_interactive.go` (1,025 lines), `trial_command.go` (1,000 lines), `mcp_server.go` (1,000 lines)

**Approach**:
1. Identify semantic boundaries (already documented for `add_command.go` in [#12263](https://github.com/githubnext/gh-aw/issues/12263))
2. Split into focused modules (6-8 files per large file)
3. Maintain public API exports
4. Add tests for each new module

**Benefits**:
- Easier testing (smaller test files focused on specific functionality)
- Reduced cognitive load (developers work on smaller, focused files)
- Parallel development (multiple developers can work on different modules)
- Clearer responsibilities (each file has single, clear purpose)

**Estimated Effort**: 2-3 days per file (including tests)

##### 1.2 Split Factory Method Files (Effort: Medium, Impact: High)

**Target file**: `permissions.go` (928 lines, 36+ factory methods)

**Approach**:
1. Split into 3 files:
   - `permissions_parser.go` - Parsing logic (5 methods)
   - `permissions_factory.go` - Factory methods (36+ New* functions)
   - `permissions_operations.go` - Operators (merge, render, validate)
2. Keep main `permissions.go` for types and core logic

**Benefits**:
- Improved discoverability (easier to find specific permission types)
- Faster file navigation
- Clearer file purposes

**Estimated Effort**: 4-6 hours

---

#### Priority 2: Medium Impact (Planned Work)

##### 2.1 Continue Validation Helper Consolidation (Effort: Medium, Impact: Medium)

**Current state**: `validation_helpers.go` exists with 7+ helper functions. TODO comments indicate Phase 2 work.

**Approach**:
1. Implement Phase 2 helpers:
   - `isEmptyOrNil()` - Empty/nil/zero checks
   - `getMapFieldAsString()` / `getMapFieldAsMap()` / `getMapFieldAsBool()` / `getMapFieldAsInt()` - Safe map field extraction
   - `dirExists()` - Directory existence check
2. Migrate existing validation patterns to use these helpers across 32 validation files
3. Update tests to cover new helpers

**Benefits**:
- Reduced duplication (76+ patterns → centralized helpers)
- Consistent validation behavior
- Easier testing (test helpers once, use everywhere)

**Estimated Effort**: 1-2 days

##### 2.2 Split Multi-Target Renderers (Effort: Medium, Impact: Medium)

**Target file**: `mcp_renderer.go` (844 lines)

**Approach**:
1. Split by target:
   - `mcp_renderer.go` - Base renderer & shared logic
   - `mcp_renderer_github.go` - GitHub-specific rendering
   - `mcp_renderer_playwright.go` - Playwright-specific rendering
   - `mcp_renderer_custom.go` - Custom tool rendering
2. Use composition/interfaces for shared functionality

**Benefits**:
- Parallel development (different developers can work on different renderers)
- Clearer testing boundaries
- Easier to add new MCP targets

**Estimated Effort**: 6-8 hours

---

#### Priority 3: Long-term Improvements (Future Work)

##### 3.1 Document Compiler Architecture (Effort: Low, Impact: Medium)

**Issue**: 26 compiler files suggest deep hierarchy; some boundaries may be unclear

**Approach**:
1. Create `pkg/workflow/COMPILER_ARCHITECTURE.md` documenting:
   - Compilation pipeline stages
   - File responsibilities
   - Data flow between files
   - Extension points
2. Add package-level documentation to key compiler files

**Benefits**:
- Onboarding (new developers understand compilation pipeline)
- Maintenance (clearer boundaries prevent responsibility drift)

**Estimated Effort**: 4-6 hours

##### 3.2 Helper File Reorganization (Effort: Medium, Impact: Low-Medium)

**Target files**: `engine_helpers.go` (mixed concerns), `compiler_yaml_helpers.go` (mixed concerns)

**Approach**:
1. Analyze helper functions for cohesion
2. Move domain-specific logic to appropriate main modules
3. Keep generic utilities in helper files

**Benefits**:
- Clearer helper file purposes
- Reduced cognitive load when choosing where to add new functions

**Estimated Effort**: 4-8 hours per file

---

### Implementation Guidelines

#### General Principles

1. **Preserve Behavior**: Ensure all existing functionality works identically
2. **Maintain Exports**: Keep public API unchanged (avoid breaking changes)
3. **Add Tests First**: Write tests for each new module before refactoring (test-driven refactoring)
4. **Incremental Changes**: Split one module at a time, verify tests pass after each split
5. **Run Tests Frequently**: Verify `make test-unit` passes after each split
6. **Update Imports**: Ensure all import paths are correct
7. **Document Changes**: Add comments explaining module boundaries and responsibilities

#### File Splitting Strategy

When splitting large files:

1. **Identify semantic boundaries** (functional areas with clear responsibilities)
2. **Create new files** with descriptive names following existing naming conventions
3. **Move functions** to new files, keeping related functions together
4. **Update imports** in all affected files
5. **Run tests** to verify no breakage
6. **Add new tests** for newly created modules
7. **Document** file purposes in package comments

#### Success Metrics

- ✅ All tests pass (`make test-unit`, `make test-integration`)
- ✅ No breaking changes to public API
- ✅ Code passes linting (`make lint`)
- ✅ Build succeeds (`make build`)
- ✅ New files are under 500 lines each
- ✅ Test coverage is maintained or improved (≥80% for new modules)

---

### Organizational Structure Summary

#### Strengths ✅

1. **Clear semantic grouping** - Files named by concept, not by type (good Go practice)
2. **Consistent naming conventions** - Verb-prefix patterns uniform across packages
3. **Specialized utility packages** - Focused packages for common concerns (stringutil, logger, etc.)
4. **Validation separation** - Validation logic consistently in `*_validation.go` files
5. **Configuration clarity** - Configuration parsing clearly marked with `*_config*.go`
6. **Create/Update parallelism** - Parallel `create_*` and `update_*` patterns for GitHub entities

#### Weaknesses & Opportunities ⚠️

1. **Large monolithic files** - 10+ files exceed 750 lines with mixed responsibilities
2. **Factory method accumulation** - Files like `permissions.go` with 36+ factory methods
3. **Multi-concern files** - Files combining UI, operations, and compilation logic
4. **Helper file over-use** - Some helper files contain domain-specific logic
5. **Deep compiler hierarchy** - 26 compiler files; unclear boundaries between some

---

### Analysis Metadata

- **Total Go Files Analyzed**: 483 (non-test files in pkg/)
- **Total Functions Cataloged**: 1,500+ (estimated)
- **Function Clusters Identified**: 8 major clusters (compiler, validation, safe_outputs, mcp, create, update, parse, build)
- **Outliers Found**: 10 large files (>750 lines) with multiple responsibilities
- **Large Files Detected**: 10 files >750 lines (top file: 1,218 lines)
- **Helper Files Analyzed**: 13 files
- **Validation Files**: 31 files with consistent `*_validation.go` pattern
- **Detection Method**: Pattern matching + semantic analysis + file reading
- **Analysis Date**: 2026-01-28
- **Workflow Run**: [§21443298878](https://github.com/githubnext/gh-aw/actions/runs/21443298878)

---

### Acceptance Criteria

Implementation is complete when:

- [ ] Top 5 large files (>900 lines) are split into focused modules
- [ ] `permissions.go` is split into parser/factory/operations files
- [ ] `mcp_renderer.go` is split by target (github/playwright/custom)
- [ ] Phase 2 validation helpers are implemented and adopted
- [ ] All tests pass (`make test-unit` and `make test-integration`)
- [ ] Test coverage maintained or improved (≥80% for new modules)
- [ ] No breaking changes to public APIs
- [ ] Code passes linting and builds successfully
- [ ] Compiler architecture documentation added

---

### Related Issues

- [#12263](https://github.com/githubnext/gh-aw/issues/12263) - Detailed plan for refactoring `add_command.go` (already exists)

---

**Priority**: Medium  
**Effort**: Large (multi-week effort across multiple files)  
**Expected Impact**: Significantly improved maintainability, easier testing, reduced complexity, better separation of concerns, faster onboarding for new developers




> AI generated by [Semantic Function Refactoring](https://github.com/githubnext/gh-aw/actions/runs/21443298878)

Package	Files	Lines (Est.)	Primary Purpose	Organization Quality
workflow	247	117K+	Core compilation/execution engine	⚠️ Some large files need splitting
cli	160	38K+	Command-line interface	⚠️ 4 files exceed 1000 lines
parser	30	Unknown	Parsing utilities	✅ Well organized
campaign	14	Unknown	Campaign management	✅ Focused scope
console	11	Unknown	UI/Output rendering	✅ Clean separation

Pattern	Count	Purpose	Organization
`compiler_*`	26	Compilation orchestration	✅ Clear grouping
`safe_outputs*`	20	Safety boundary outputs	✅ Well organized
`mcp_*`	16	Model Context Protocol	✅ Cohesive cluster
`create_*`	8	GitHub entity creation	✅ One per entity type
`update_*`	7	GitHub entity updates	✅ Parallel to create_*
`*_validation.go`	31	Validation logic	✅ Separated from main logic
`*_helpers.go`	13	Helper functions	⚠️ Varying cohesion
`_config.go`	Multiple	Configuration parsing	⚠️ Some overlap

Rank	File	Lines	Package	Functions	Issues
1	`add_command.go`	1,218	cli	18	Combines workflow resolution, UI, addition, compilation, PR creation, file tracking
2	`add_interactive.go`	1,025	cli	23	Interactive UI + workflow operations + file operations mixed
3	`trial_command.go`	1,000	cli	Unknown	Trial execution command
4	`mcp_server.go`	1,000	cli	Unknown	MCP server management
5	`safe_outputs_config_generation.go`	978	workflow	5	Configuration generation from workflow data
6	`permissions.go`	928	workflow	36	36 factory methods + parsing + operators
7	`init.go`	868	cli	Unknown	Initialization logic
8	`audit.go`	864	cli	Unknown	Audit trail generation
9	`logs_orchestrator.go`	855	cli	5	Downloading + normalization + filtering + parsing
10	`mcp_renderer.go`	844	workflow	4	Renders GitHub/Playwright/custom MCPs

File	Functions	Cohesion	Assessment
`validation_helpers.go`	7	High	Generic validation patterns
`config_helpers.go`	10	High	Configuration parsing utilities
`error_helpers.go`	5	High	Error construction helpers
`git_helpers.go`	1	High	Git tag extraction
`map_helpers.go`	2	High	Map manipulation utilities

File	Functions	Issue	Recommendation
`engine_helpers.go`	10	Mixed: installation + env filtering + path setup	Split: `engine_installation.go`, `engine_environment.go`
`compiler_yaml_helpers.go`	4+	Mixed: path extraction + code generation + placeholders	Consider moving to main compiler file
`safe_outputs_config_generation_helpers.go`	10	All related to config generation	✅ Could stay together or merge with parent

Pattern	Occurrences (Est.)	Files	Example Functions
Integer range validation	15+	12 files	`validateIntRange(value, min, max, fieldName)`
Required field validation	20+	18 files	String empty checks
List membership validation	12+	9 files	`value in allowedValues`
String length validation	10+	8 files	`len(str) < minLength`
File existence checks	8+	6 files	`os.Stat(path)` patterns

Function Pattern	Occurrences	Files	Example
`parse*Config(configMap map[string]any)`	14+	8 files	`parseLabelsFromConfig`, `parseMessagesConfig`, `parseAppConfig`
`parse*FromConfig(configMap, key)`	10+	5 files	`parseExpiresFromConfig`, `parseParticipantsFromConfig`
Array parsing	8+	4 files	String array extraction with type checking
String extraction	12+	6 files	String value extraction with validation

Error Pattern	Occurrences	Files
`fmt.Errorf("failed to %s: %w", ...)`	114	37 files
`fmt.Errorf("could not %s: %w", ...)`	0	-
`fmt.Errorf("unable to %s: %w", ...)`	1	1 file

File	Functions	Organization
`create_issue.go`	CreateIssue(...)	✅ Focused
`create_pull_request.go`	CreatePR(...)	✅ Focused
`create_discussion.go`	CreateDiscussion(...)	✅ Focused
`create_project.go`	CreateProject(...)	✅ Focused
`create_project_status_update.go`	CreateProjectStatusUpdate(...)	✅ Focused

File	Functions	Organization
`update_issue.go`	UpdateIssue(...)	✅ Focused
`update_pull_request.go`	UpdatePR(...)	✅ Focused
`update_discussion.go`	UpdateDiscussion(...)	✅ Focused
`update_project.go`	UpdateProject(...)	✅ Focused

File	Purpose	Organization
`mcp_config.go`	Main configuration	✅
`mcp_renderer.go`	Rendering (844 lines)	⚠️ Multi-target
`mcp_github_config.go`	GitHub-specific config	✅
`mcp_playwright_config.go`	Playwright-specific config	✅
`mcp_config_custom.go`	Custom tool config	✅
`mcp_config_validation.go`	Validation	✅
Others	Setup, environment, gateway	✅

[refactor] Semantic Function Clustering Analysis - Code Organization Opportunities #12291

Description

Executive Summary

Function Inventory by Package

Primary Packages Analysis

Utility Packages (Well Organized ✅)

Clustering Results

File Naming Patterns (Semantic Grouping)

Function Naming Patterns

Identified Issues

1. Large Monolithic Files (Refactoring Priority: High)

Detailed Examples

2. Helper File Organization (Refactoring Priority: Medium)

Well-Organized Helper Files ✅

Candidates for Reorganization ⚠️

3. Validation Pattern Duplication (Refactoring Priority: Medium)

Common Patterns

Existing Solution (Partial) ✅

4. Configuration Parsing Patterns (Refactoring Priority: Low-Medium)

Identified Patterns

Existing Solution (Partial) ✅

5. Error Message Patterns (Refactoring Priority: Low)

Existing Solution ✅

Detailed Function Clusters

Cluster 1: Creation Functions (create_* pattern)

Cluster 2: Update Functions (update_* pattern)

Cluster 3: Validation Functions (validate* pattern)

Cluster 4: Compilation Functions (compiler_* pattern)

Cluster 5: MCP Configuration (mcp_* pattern)

Refactoring Recommendations

Priority 1: High Impact (Immediate Action)

1.1 Split Large Monolithic Files (Effort: Large, Impact: High)

1.2 Split Factory Method Files (Effort: Medium, Impact: High)

Priority 2: Medium Impact (Planned Work)

2.1 Continue Validation Helper Consolidation (Effort: Medium, Impact: Medium)

2.2 Split Multi-Target Renderers (Effort: Medium, Impact: Medium)

Priority 3: Long-term Improvements (Future Work)

3.1 Document Compiler Architecture (Effort: Low, Impact: Medium)

3.2 Helper File Reorganization (Effort: Medium, Impact: Low-Medium)

Implementation Guidelines

General Principles

File Splitting Strategy

Success Metrics

Organizational Structure Summary

Strengths ✅

Weaknesses & Opportunities ⚠️

Analysis Metadata

Acceptance Criteria

Related Issues

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Cluster 1: Creation Functions (`create_*` pattern)

Cluster 2: Update Functions (`update_*` pattern)

Cluster 3: Validation Functions (`validate*` pattern)

Cluster 4: Compilation Functions (`compiler_*` pattern)

Cluster 5: MCP Configuration (`mcp_*` pattern)