[refactor] Semantic Function Clustering Analysis - Code Organization and Duplication Opportunities #12951

@github-actions

Analysis Overview

This semantic function clustering analysis examined the Go codebase to identify code organization and refactoring opportunities. The analysis focused on the pkg/workflow package, which contains 251 non-test Go files with 379 exported functions.

Key Statistics:

  • Total Go files analyzed: 490 files across all packages
  • Workflow package files: 251 files (largest package)
  • Exported functions in workflow: 379 functions
  • Helper files identified: 12 distinct helper files
  • Common function verb prefixes: Get (66), New (47), Build (34), Generate (20), Parse (17), Validate (14), Extract (14)

Executive Summary

The analysis revealed several high-impact refactoring opportunities:

  1. Multiple helper files with overlapping purposes - 12 helper files with potential consolidation opportunities
  2. Repetitive parsing patterns - 40+ parse*Config functions following identical patterns
  3. Scattered validation functions - 45+ validation functions across 20+ files
  4. File naming inconsistencies - Related functionality split across files with inconsistent naming

Identified Issues

1. Helper File Proliferation

Issue: The workflow package contains 12 separate helper files with overlapping concerns.

Files Identified:

  • close_entity_helpers.go (3 functions)
  • config_helpers.go (12 functions)
  • engine_helpers.go (11 functions)
  • error_helpers.go (6 functions)
  • git_helpers.go (1 function)
  • map_helpers.go (2 functions)
  • prompt_step_helper.go (1 function)
  • safe_outputs_config_generation_helpers.go (10 functions)
  • safe_outputs_config_helpers.go (2 functions)
  • update_entity_helpers.go (2 functions)
  • validation_helpers.go (7 functions)
  • compiler_yaml_helpers.go (4 functions)

Analysis:

Some helper files serve distinct purposes (e.g., engine_helpers.go for engine-specific utilities), but others show potential for consolidation:

Potential Consolidation #1: Entity Operation Helpers

  • close_entity_helpers.go
  • update_entity_helpers.go

These files contain generic parsing functions for entity operations (issues, PRs, discussions). They share similar patterns:

  • Both parse entity configurations from maps
  • Both handle similar fields (body, labels, assignees)
  • Could be unified into entity_operation_helpers.go

Potential Consolidation #2: Safe Outputs Configuration Helpers

  • safe_outputs_config_generation_helpers.go
  • safe_outputs_config_helpers.go

Both files handle safe outputs configuration with overlapping concerns:

  • Configuration generation functions
  • Configuration parsing functions
  • Could be merged into single safe_outputs_config_helpers.go

Potential Consolidation #3: Small Helper Files (1-2 Functions)

  • git_helpers.go (1 function: GetCurrentGitTag)
  • map_helpers.go (2 functions: parseIntValue, filterMapKeys)
  • prompt_step_helper.go (1 function: generateStaticPromptStep)

These small helper files could be:

  • Moved to more specific domain files
  • Consolidated into a general utilities file
  • Example: GetCurrentGitTag could move to a git operations file

Recommendation: Review helper files for consolidation opportunities, focusing on:

  1. Entity operation helpers → unified entity operations
  2. Safe outputs config helpers → single config utilities file
  3. Single-function helpers → move to appropriate domain files

Estimated Impact: Improved code discoverability, reduced file count by 3-5 files


2. Repetitive Config Parsing Pattern

Issue: The codebase contains 40+ parse*Config functions that follow nearly identical patterns.

Pattern Example:

All these functions follow the same structure:

  1. Check if key exists in map
  2. Return nil if not exists
  3. Parse config fields using helper functions
  4. Return config struct

Files with parse*Config functions:

  • add_comment.go: parseCommentsConfig
  • add_labels.go: parseAddLabelsConfig
  • add_reviewer.go: parseAddReviewerConfig
  • assign_milestone.go: parseAssignMilestoneConfig
  • assign_to_agent.go: parseAssignToAgentConfig
  • assign_to_user.go: parseAssignToUserConfig
  • autofix_code_scanning_alert.go: parseAutofixCodeScanningAlertConfig
  • close_entity_helpers.go: parseCloseEntityConfig, parseCloseIssuesConfig, parseClosePullRequestsConfig, parseCloseDiscussionsConfig
  • copy_project.go: parseCopyProjectsConfig
  • create_agent_session.go: parseAgentSessionConfig
  • create_code_scanning_alert.go: parseCodeScanningAlertsConfig
  • create_discussion.go: parseDiscussionsConfig
  • create_issue.go: parseIssuesConfig
  • create_pr_review_comment.go: parsePullRequestReviewCommentsConfig
  • create_project.go: parseCreateProjectsConfig
  • ... and 25+ more

Code Pattern (Identical Across Files):

func (c *Compiler) parseXConfig(outputMap map[string]any) *XConfig {
    // Check if the key exists
    if _, exists := outputMap["x-key"]; !exists {
        return nil
    }
    
    // Extract config map
    configMap := outputMap["x-key"].(map[string]any)
    
    // Parse fields using helpers
    field1 := ParseStringArrayFromConfig(configMap, "field1", log)
    field2 := extractStringFromMap(configMap, "field2", log)
    field3 := ParseBoolFromConfig(configMap, "field3", log)
    
    // Return config
    return &XConfig{
        Field1: field1,
        Field2: field2,
        Field3: field3,
    }
}

Analysis:

While these functions serve different safe outputs, the parsing logic is nearly identical. This pattern could benefit from:

  1. A generic config parser with type parameters (Go generics)
  2. Reflection-based config mapping
  3. Code generation from schema definitions

Recommendation: Consider introducing a generic config parser or code generation approach for safe output config parsing. This would:

  • Reduce code duplication by ~800-1000 lines
  • Ensure consistent parsing behavior across all configs
  • Simplify addition of new safe outputs

Estimated Impact: High - could reduce repetitive code by 20-30%
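
As a minimal sketch of the first option (type parameters, no reflection), the shared steps of the pattern can be pulled into one generic helper while each config keeps its own field parsing. The names parseSection, LabelsConfig, stringSlice, and the "add-labels" key with its fields are hypothetical and chosen only for illustration; they are not taken from the codebase.

```go
package main

import "fmt"

// parseSection is a hypothetical generic helper (not from the codebase).
// It performs the shared steps: look up the key, return nil when the key is
// absent or its value is not a map, and otherwise delegate to build.
func parseSection[T any](outputMap map[string]any, key string, build func(map[string]any) T) *T {
	raw, exists := outputMap[key]
	if !exists {
		return nil
	}
	configMap, ok := raw.(map[string]any)
	if !ok {
		return nil
	}
	cfg := build(configMap)
	return &cfg
}

// LabelsConfig is an illustrative config type; its fields are assumptions.
type LabelsConfig struct {
	Allowed []string
	Limit   int
}

// stringSlice collects the string elements stored under key, if any.
func stringSlice(m map[string]any, key string) []string {
	items, _ := m[key].([]any)
	var out []string
	for _, item := range items {
		if s, ok := item.(string); ok {
			out = append(out, s)
		}
	}
	return out
}

// parseLabelsConfig shows how one parse*Config function shrinks to the
// field mapping only. The "add-labels" key is a made-up example.
func parseLabelsConfig(outputMap map[string]any) *LabelsConfig {
	return parseSection(outputMap, "add-labels", func(m map[string]any) LabelsConfig {
		limit, _ := m["max"].(int)
		return LabelsConfig{Allowed: stringSlice(m, "allowed"), Limit: limit}
	})
}

func main() {
	out := map[string]any{
		"add-labels": map[string]any{"allowed": []any{"bug", "docs"}, "max": 3},
	}
	fmt.Printf("%+v\n", parseLabelsConfig(out))
	fmt.Println(parseLabelsConfig(map[string]any{}) == nil)
}
```

This variant keeps field parsing explicit and type-checked at compile time, at the cost of still writing one small closure per config.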


3. Scattered Validation Functions

Issue: Validation functions are spread across 20+ files with inconsistent organization.

Validation Function Distribution:

Files with validation functions (45+ total):

  • action_sha_checker.go: ValidateActionSHAsInLockFile
  • agent_validation.go: validateAgentFile, validateHTTPTransportSupport, validateMaxTurnsSupport, validateWebSearchSupport, validateWorkflowRunBranches
  • agentic_engine.go: GenerateSecretValidationStep, GenerateMultiSecretValidationStep
  • artifact_manager.go: ValidateDownload, ValidateAllDownloads
  • bundler_runtime_validation.go: validateNoRuntimeMixing, validateRuntimeModeRecursive
  • bundler_safety_validation.go: validateNoLocalRequires, validateNoModuleReferences, ValidateEmbeddedResourceRequires
  • bundler_script_validation.go: validateNoExecSync, validateNoGitHubScriptGlobals
  • compiler_filters_validation.go: ValidateEventFilters, validateFilterExclusivity
  • dangerous_permissions_validation.go: validateDangerousPermissions
  • dispatch_workflow_validation.go: validateDispatchWorkflow
  • docker_validation.go: validateDockerImage
  • engine_validation.go: validateEngine, validateSingleEngineSpecification
  • expression_validation.go: validateExpressionSafety, validateSingleExpression, ValidateExpressionSafetyPublic, validateRuntimeImportFiles
  • features_validation.go: validateFeatures, validateActionTag
  • firewall_validation.go: ValidateLogLevel
  • github_tool_to_toolset.go: ValidateGitHubToolsAgainstToolsets
  • npm_validation.go: (validation functions)
  • pip_validation.go: (validation functions)
  • permissions_validation.go: (validation functions)
  • runtime_validation.go: (validation functions)
  • sandbox_validation.go: (validation functions)
  • schema_validation.go: (validation functions)
  • secrets_validation.go: (validation functions)
  • step_order_validation.go: (validation functions)
  • strict_mode_validation.go: (validation functions)
  • template_injection_validation.go: (validation functions)
  • template_validation.go: (validation functions)
  • validation_helpers.go: ValidateRequired, ValidateMaxLength, ValidateMinLength, ValidateInList, ValidatePositiveInt, ValidateNonNegativeInt

Analysis:

While specialized validation files (e.g., docker_validation.go, firewall_validation.go) make sense, there are issues:

  1. Inconsistent naming: Some files use *_validation.go pattern, others embed validation in feature files
  2. Validation helper usage: The validation_helpers.go file was added to consolidate patterns but is underutilized
  3. Validation scattered in non-validation files: Files like artifact_manager.go mix validation with business logic

File Organization Issues:

File                | Primary Purpose     | Contains Validation?         | Assessment
artifact_manager.go | Artifact management | Yes (2 validation methods)   | ✗ Validation mixed with business logic
agent_validation.go | Agent validation    | Yes (5 validation functions) | ✓ Correctly organized
bundler.go          | JavaScript bundling | No                           | ✓ Validation separated to bundler_*_validation.go
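
The naming rule behind this table can be checked mechanically. The sketch below flags validate*/Validate* functions declared outside *_validation.go files. It assumes the file-to-function listing has already been collected (for example by grep); misplacedValidators is a hypothetical name, not an existing tool in the repository.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// misplacedValidators reports validate*/Validate* functions that live in
// files not matching the *_validation.go naming pattern. The input maps
// file names to the function names they declare.
func misplacedValidators(funcsByFile map[string][]string) []string {
	var findings []string
	for file, funcs := range funcsByFile {
		if strings.HasSuffix(file, "_validation.go") {
			continue
		}
		for _, fn := range funcs {
			if strings.HasPrefix(strings.ToLower(fn), "validate") {
				findings = append(findings, file+": "+fn)
			}
		}
	}
	sort.Strings(findings) // map iteration order is random, so sort for stable output
	return findings
}

func main() {
	// File and function names are taken from the listing above.
	input := map[string][]string{
		"artifact_manager.go":    {"ValidateDownload", "ValidateAllDownloads"},
		"agent_validation.go":    {"validateAgentFile"},
		"firewall_validation.go": {"ValidateLogLevel"},
	}
	for _, finding := range misplacedValidators(input) {
		fmt.Println(finding)
	}
}
```

A real check would probably need an allowlist, since a strict suffix rule would also flag validation_helpers.go.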

Recommendation:

  1. Adopt consistent validation file pattern: All validation should be in *_validation.go files
  2. Extract validation from business logic files: Move validation methods from files like artifact_manager.go to artifact_validation.go
  3. Increase usage of validation helpers: Refactor validation functions to use helpers from validation_helpers.go

Estimated Impact: Medium - improved consistency and discoverability
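
To illustrate the third recommendation, the sketch below composes small shared helpers instead of hand-rolling each check. The helper signatures are assumptions: the real functions in validation_helpers.go may take different parameters or return different error types. validateLevel and its allowed values are likewise made up for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// validateRequired and validateInList are hypothetical stand-ins for shared
// helpers; the actual signatures in validation_helpers.go may differ.
func validateRequired(field, value string) error {
	if value == "" {
		return fmt.Errorf("%s is required", field)
	}
	return nil
}

func validateInList(field, value string, allowed []string) error {
	for _, candidate := range allowed {
		if value == candidate {
			return nil
		}
	}
	return fmt.Errorf("%s must be one of %v, got %q", field, allowed, value)
}

// validateLevel is an illustrative feature-level validator built only from
// the shared helpers. errors.Join returns nil when every check passes.
func validateLevel(level string) error {
	return errors.Join(
		validateRequired("level", level),
		validateInList("level", level, []string{"debug", "info", "warn", "error"}),
	)
}

func main() {
	fmt.Println(validateLevel("info"))
	fmt.Println(validateLevel("loud"))
}
```

Collecting all failures with errors.Join (Go 1.20+) is one design choice; returning on the first failed check would be equally reasonable if the existing validators do that.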


4. MCP Configuration Files - Good Organization Example

Positive Finding: The MCP configuration files demonstrate excellent organization:

MCP Files (16 files with mcp prefix):

  • mcp_config_builtin.go - Built-in MCP configurations
  • mcp_config_custom.go - Custom MCP server handling
  • mcp_config_playwright_renderer.go - Playwright-specific rendering
  • mcp_config_serena_renderer.go - Serena-specific rendering
  • mcp_config_types.go - Type definitions
  • mcp_config_utils.go - Utility functions
  • mcp_config_validation.go - Validation logic
  • mcp_detection.go - MCP server detection
  • mcp_environment.go - Environment configuration
  • mcp_gateway_config.go - Gateway configuration
  • mcp_gateway_constants.go - Gateway constants
  • mcp_github_config.go - GitHub MCP configuration
  • mcp_playwright_config.go - Playwright configuration
  • mcp_renderer.go - Main rendering logic
  • mcp_serena_config.go - Serena configuration
  • mcp_setup_generator.go - Setup generation

Why This Organization Works:

  1. Clear file naming: Each file has a specific, descriptive purpose
  2. Logical separation: Types, utils, validation, renderers are separate
  3. Consistent prefix: All MCP files use mcp_ prefix for discoverability
  4. Feature-based organization: Files organized by MCP server type (github, playwright, serena)

Recommendation: Use the MCP file organization as a model for other subsystems.


5. Compiler Files - Organizational Inconsistency

Issue: The compiler subsystem has 26 files with inconsistent organization patterns.

Compiler Files Analysis:

Main Files:

  • compiler.go - Main compiler logic
  • compiler_types.go - Type definitions
  • compiler_orchestrator.go - Orchestration logic

Job-Related Files:

  • compiler_activation_jobs.go - Activation job building
  • compiler_jobs.go - Job management
  • compiler_safe_output_jobs.go - Safe output job building
  • compiler_safe_outputs.go - Safe outputs processing
  • compiler_safe_outputs_config.go - Safe outputs config
  • compiler_safe_outputs_core.go - Core safe outputs
  • compiler_safe_outputs_discussions.go - Discussion safe outputs
  • compiler_safe_outputs_env.go - Environment safe outputs
  • compiler_safe_outputs_job.go - Job-level safe outputs
  • compiler_safe_outputs_shared.go - Shared safe outputs
  • compiler_safe_outputs_specialized.go - Specialized safe outputs
  • compiler_safe_outputs_steps.go - Step-level safe outputs

Orchestrator Files:

  • compiler_orchestrator.go - Main orchestration
  • compiler_orchestrator_engine.go - Engine setup
  • compiler_orchestrator_frontmatter.go - Frontmatter parsing
  • compiler_orchestrator_tools.go - Tools processing
  • compiler_orchestrator_workflow.go - Workflow processing

YAML Generation Files:

  • compiler_yaml.go - Main YAML generation
  • compiler_yaml_ai_execution.go - AI execution YAML
  • compiler_yaml_artifacts.go - Artifacts YAML
  • compiler_yaml_helpers.go - YAML helpers
  • compiler_yaml_main_job.go - Main job YAML

Other Files:

  • compiler_filters_validation.go - Filter validation
  • compiler_test_helpers.go - Test utilities

Analysis:

The compiler files show two competing organizational patterns:

Pattern 1: Flat organization with long filenames

  • Example: compiler_safe_outputs_discussions.go
  • Pro: All files in one directory
  • Con: Long filenames, harder to navigate

Pattern 2: Feature grouping

  • Example: Orchestrator files (compiler_orchestrator_*.go)
  • Pro: Related functionality grouped by prefix
  • Con: Inconsistently applied

Issues Identified:

  1. Safe outputs files are fragmented: 9 files share the compiler_safe_outputs prefix (compiler_safe_outputs.go plus 8 compiler_safe_outputs_*.go files)
  2. Unclear separation: Some files have overlapping concerns (e.g., compiler_safe_outputs.go vs compiler_safe_outputs_core.go)
  3. Missing hierarchy: Related files don't have clear parent-child relationships

Recommendation:

Consider one of two approaches:

Option A: Maintain flat structure with clearer naming

  • Rename files to follow consistent pattern: compiler_(subsystem)_(feature).go
  • Example: compiler_safeoutputs_discussions.go, compiler_safeoutputs_env.go
  • Consolidate overlapping files (e.g., merge compiler_safe_outputs.go and compiler_safe_outputs_core.go)

Option B: Introduce subdirectories for major subsystems

  • Create compiler/safeoutputs/ subdirectory
  • Move safe outputs files into subdirectory
  • Keep main compiler files at top level
  • Similar to how many Go projects organize large packages

Note: Option B would be a larger refactoring but could significantly improve navigation in this 251-file package.

Estimated Impact: Medium - primarily affects developer navigation and onboarding


6. Create/Update Entity Pattern - Good Consistency

Positive Finding: The create and update entity files follow a consistent pattern:

Create Files (8 files):

  • create_agent_session.go
  • create_code_scanning_alert.go
  • create_discussion.go
  • create_issue.go
  • create_pr_review_comment.go
  • create_project.go
  • create_project_status_update.go
  • create_pull_request.go

Update Files (7 files):

  • update_discussion.go
  • update_entity_helpers.go
  • update_issue.go
  • update_project.go
  • update_project_job.go
  • update_pull_request.go
  • update_release.go

Pattern Strengths:

  • Consistent naming: create_(entity).go and update_(entity).go
  • One entity per file: Each file handles a single entity type
  • Predictable structure: All follow parse config → build job → generate steps pattern

Recommendation: Maintain this pattern and use it as a reference for other feature areas.


Function Clustering Analysis

Common Function Prefixes

Analysis of function naming revealed clear semantic clusters:

Prefix    | Count | Purpose               | Example Functions
Get*      | 66    | Retrieval operations  | GetActionPin, GetCopilotAgentPlaywrightTools
New*      | 47    | Constructor functions | NewActionCache, NewEngineRegistry
Build*    | 34    | Step/job building     | BuildPreActivationJob, BuildActivationJob
Generate* | 20    | Code generation       | GenerateSecretValidationStep, GenerateWriteScriptsStep
Parse*    | 17    | Config parsing        | ParseWorkflowFile, ParseCommandEvents
Validate* | 14    | Validation logic      | ValidateRequired, ValidateEventFilters
Extract*  | 14    | Data extraction       | ExtractActionsFromLockFile, ExtractAgentIdentifier

Analysis:

The function naming shows good semantic organization. Functions are consistently named by their primary action.

Potential Improvements:

  1. Parse functions: While well-named, the 40+ parse*Config functions follow nearly identical patterns (see Issue 2, Repetitive Config Parsing Pattern)
  2. Build functions: Could benefit from interface-based design to reduce boilerplate
  3. Validate functions: Should use validation_helpers.go more consistently

Detailed Recommendations

Priority 1: High Impact, Low Risk

1.1 Consolidate Entity Operation Helpers

Action: Merge close_entity_helpers.go and update_entity_helpers.go into entity_operation_helpers.go

Rationale: Both files contain generic entity parsing functions that share similar patterns.

Files to modify:

  • pkg/workflow/close_entity_helpers.go → merge into new file
  • pkg/workflow/update_entity_helpers.go → merge into new file
  • Create pkg/workflow/entity_operation_helpers.go

Estimated Effort: 2-3 hours
Risk: Low - functions are well-tested and have clear interfaces


1.2 Consolidate Safe Outputs Config Helpers

Action: Merge safe_outputs_config_generation_helpers.go and safe_outputs_config_helpers.go

Rationale: Both handle safe outputs configuration with overlapping concerns.

Files to modify:

  • pkg/workflow/safe_outputs_config_generation_helpers.go → merge
  • pkg/workflow/safe_outputs_config_helpers.go → merge
  • Result: Single pkg/workflow/safe_outputs_config_helpers.go

Estimated Effort: 1-2 hours
Risk: Low - clear separation of concerns


1.3 Relocate Single-Function Helper Files

Action: Move functions from single-function helper files to appropriate domain files.

Relocations:

  • git_helpers.go::GetCurrentGitTag → Move to git operations file or git.go in pkg/cli
  • map_helpers.go functions → Move to more specific files or pkg/sliceutil if truly generic
  • prompt_step_helper.go::generateStaticPromptStep → Move to prompt_step.go

Estimated Effort: 1 hour
Risk: Very low - simple function moves


Priority 2: Medium Impact, Medium Effort

2.1 Introduce Generic Config Parser

Action: Create a generic config parser to reduce the 40+ repetitive parse*Config functions.

Approach Options:

Option A: Reflection-based parser

func ParseConfig[T any](outputMap map[string]any, key string) (*T, error) {
    if _, exists := outputMap[key]; !exists {
        return nil, nil
    }
    var cfg T
    // Use reflection to map fields from outputMap[key] onto cfg
    return &cfg, nil
}

Option B: Code generation

  • Define config schemas in YAML or struct tags
  • Generate parsing functions at build time
  • Similar to protobuf or OpenAPI code generation

Recommendation: Start with Option A (reflection) for immediate benefits, consider Option B for long-term maintenance.

Estimated Effort: 8-12 hours
Risk: Medium - requires careful testing to ensure backward compatibility
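
For reference, a fuller sketch of Option A might look like the following. It is one possible shape, not a proposed final implementation. The `cfg` struct tag, the exampleConfig type, and the rule that values must be directly assignable to the field type are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"reflect"
)

// ParseConfig is a sketch of the reflection-based option. The `cfg` struct
// tag and the strict assignability rule are assumptions for illustration.
func ParseConfig[T any](outputMap map[string]any, key string) (*T, error) {
	raw, exists := outputMap[key]
	if !exists {
		return nil, nil
	}
	configMap, ok := raw.(map[string]any)
	if !ok {
		return nil, fmt.Errorf("%q: expected a map, got %T", key, raw)
	}
	var cfg T
	v := reflect.ValueOf(&cfg).Elem()
	if v.Kind() != reflect.Struct {
		return nil, fmt.Errorf("%q: target type must be a struct", key)
	}
	for i := 0; i < v.NumField(); i++ {
		name := v.Type().Field(i).Tag.Get("cfg")
		value, present := configMap[name]
		field := v.Field(i)
		if name == "" || !present || !field.CanSet() {
			continue // untagged, absent, or unexported field
		}
		val := reflect.ValueOf(value)
		if !val.IsValid() || !val.Type().AssignableTo(field.Type()) {
			return nil, fmt.Errorf("%q.%s: cannot use %T as %s", key, name, value, field.Type())
		}
		field.Set(val)
	}
	return &cfg, nil
}

// exampleConfig is a hypothetical config type used only for this sketch.
type exampleConfig struct {
	Title string `cfg:"title"`
	Draft bool   `cfg:"draft"`
	Max   int    `cfg:"max"`
}

func main() {
	out := map[string]any{
		"example": map[string]any{"title": "hello", "draft": true, "max": 2},
	}
	cfg, err := ParseConfig[exampleConfig](out, "example")
	fmt.Println(cfg, err)
}
```

The sketch deliberately rejects values that are not directly assignable. A production version would likely need conversions, for instance from []any to []string, or between numeric types depending on how the YAML decoder represents numbers.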

2.2 Standardize Validation File Organization

Action: Extract validation logic from business logic files into dedicated *_validation.go files.

Files to refactor:

  • artifact_manager.go → Create artifact_validation.go for validation methods
  • Review all 45+ validation functions for consistent placement

Estimated Effort: 6-8 hours
Risk: Medium - requires careful code moves and test updates


Priority 3: Long-term Improvements

3.1 Consider Package Subdivision

Action: Evaluate subdividing the pkg/workflow package (251 files) into subpackages.

Potential Structure:

pkg/workflow/
  compiler/          # Compiler-related files
    orchestrator/    # Orchestrator files
    safeoutputs/     # Safe outputs files
    yaml/            # YAML generation
  engines/           # Engine implementations
  validation/        # Validation functions
  entities/          # Entity operations
  mcp/               # MCP configuration

Estimated Effort: 40-60 hours (large refactoring)
Risk: High - requires updating all imports across codebase

Recommendation: Defer until the pkg/workflow package grows beyond 300 files or clear pain points emerge.


Implementation Checklist

Phase 1: Quick Wins (Week 1)

  • Consolidate entity operation helpers
  • Consolidate safe outputs config helpers
  • Relocate single-function helper files
  • Update tests for moved functions
  • Run full test suite to verify no regressions

Phase 2: Medium Refactoring (Weeks 2-3)

  • Design generic config parser approach
  • Implement generic config parser
  • Migrate 5-10 config parsers as pilot
  • Evaluate pilot results
  • Standardize validation file organization
  • Extract validation from artifact_manager.go
  • Update validation patterns to use helpers

Phase 3: Long-term Considerations (Future)

  • Review package size and organization
  • Evaluate need for subpackages
  • Consider code generation for repetitive patterns
  • Document organization guidelines

Analysis Methodology

Tools and Techniques Used

  1. File pattern analysis: Examined 490 Go files across all packages
  2. Function name clustering: Analyzed 379 exported functions for semantic patterns
  3. Grep-based code search: Identified validation, parsing, and helper function patterns
  4. Manual code review: Sampled files to verify patterns and assess duplication
  5. Comparative analysis: Compared organization across different subsystems (MCP, compiler, entities)
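
The function name clustering step can be approximated with Go's standard parser. The analysis itself reports grep-based search, so the sketch below is an alternative technique, not the method that was used. verbPrefix and countPrefixes are hypothetical names, and the sketch counts exported methods as well as plain functions.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"unicode"
)

// verbPrefix returns the leading capitalized word of an exported name,
// e.g. "Get" for "GetActionPin". It returns "" for unexported names.
func verbPrefix(name string) string {
	if !ast.IsExported(name) {
		return ""
	}
	runes := []rune(name)
	for i := 1; i < len(runes); i++ {
		if unicode.IsUpper(runes[i]) {
			return string(runes[:i])
		}
	}
	return name // single-word name such as "Validate"
}

// countPrefixes parses one Go source file and tallies the verb prefixes of
// its exported function and method declarations.
func countPrefixes(src string) (map[string]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, parser.SkipObjectResolution)
	if err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok {
			continue
		}
		if prefix := verbPrefix(fn.Name.Name); prefix != "" {
			counts[prefix]++
		}
	}
	return counts, nil
}

func main() {
	src := `package sample

func GetActionPin() {}
func GetTools() {}
func NewActionCache() {}
func parseIntValue() {}
`
	counts, err := countPrefixes(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(counts)
}
```

Running this over every file in a package and summing the maps would give per-prefix totals comparable to the table of common prefixes.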

Limitations

  • Scope: Focused primarily on pkg/workflow due to its size (251 files)
  • Depth: Did not perform line-by-line semantic similarity analysis (would require AST parsing)
  • Other packages: pkg/cli (172 files) and pkg/parser (32 files) were not deeply analyzed

Future Analysis Recommendations

  1. AST-based duplicate detection: Use Go's ast package to find semantic duplicates beyond naming patterns
  2. Cyclomatic complexity analysis: Identify overly complex functions for refactoring
  3. Dependency graph analysis: Visualize dependencies between files to identify tight coupling
  4. Import analysis: Find circular dependencies or inappropriate coupling
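
As a rough starting point for the AST-based duplicate detection idea, the sketch below fingerprints each function body by its sequence of AST node types, ignoring identifier names and literal values, and groups functions that share a fingerprint. shape and groupByShape are hypothetical names. The fingerprint is deliberately coarse (it does not distinguish operators, for example), so matches are only candidates for manual review.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// shape returns a structural fingerprint of a function body built from AST
// node types only, so identifier names and literal values do not affect it.
func shape(body *ast.BlockStmt) string {
	var b strings.Builder
	ast.Inspect(body, func(n ast.Node) bool {
		if n == nil {
			b.WriteString(")") // Inspect signals the end of a subtree with nil
			return true
		}
		fmt.Fprintf(&b, "%T(", n)
		return true
	})
	return b.String()
}

// groupByShape parses one source file and groups function names by the
// structural fingerprint of their bodies.
func groupByShape(src string) (map[string][]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, parser.SkipObjectResolution)
	if err != nil {
		return nil, err
	}
	groups := make(map[string][]string)
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Body == nil {
			continue
		}
		key := shape(fn.Body)
		groups[key] = append(groups[key], fn.Name.Name)
	}
	return groups, nil
}

func main() {
	src := `package sample

func parseA(m map[string]any) any {
	if v, ok := m["a"]; ok {
		return v
	}
	return nil
}

func parseB(cfg map[string]any) any {
	if value, found := cfg["b"]; found {
		return value
	}
	return nil
}

func other() int { return 1 }
`
	groups, err := groupByShape(src)
	if err != nil {
		panic(err)
	}
	for _, names := range groups {
		if len(names) > 1 {
			fmt.Println("candidate duplicates:", names)
		}
	}
}
```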

Conclusion

The codebase demonstrates many strengths:

  • ✅ Consistent entity operation patterns (create/update files)
  • ✅ Excellent MCP configuration organization
  • ✅ Strong function naming conventions
  • ✅ Good validation helper foundation

High-impact opportunities identified:

  1. Helper file consolidation (4-6 hours effort in total across items 1.1-1.3, low risk)
  2. Generic config parser (8-12 hours effort, high impact)
  3. Validation organization (6-8 hours effort, medium impact)

Recommended immediate actions:

  1. Start with Priority 1 items (helper consolidation) for quick wins
  2. Pilot generic config parser with 5-10 functions
  3. Document organization guidelines based on MCP example

The refactoring opportunities identified are pragmatic and focused on reducing duplication while improving code discoverability and maintainability.


Analysis Date: 2026-01-31
Files Analyzed: 490 Go files (251 in pkg/workflow)
Functions Cataloged: 379 exported functions
Detection Method: Pattern analysis, grep-based code search, manual review

AI generated by Semantic Function Refactoring
