refactor(workflow): Priority 1 semantic clustering — shared mount validation, copilot function relocation#17768

Merged
pelikhan merged 3 commits into main from
copilot/refactor-semantic-function-clustering-6dcb86e4-d38e-4ade-8b9b-9856d4d1cd55
Feb 22, 2026

Copilot AI commented Feb 22, 2026

Addresses the three Priority 1 items from the semantic function clustering analysis: extract shared mount validation logic, and relocate two copilot execution helpers to their call sites.

Changes

  • validateMountStringFormat() in validation_helpers.go
    New shared primitive that parses source:destination:mode format (3-parts check + ro/rw mode check). Both consumers delegate core validation here and wrap errors in their own context-appropriate types:

    • validateMountsSyntax() (sandbox_validation.go) — wraps as NewValidationError; also drops now-unused strings import
    • validateMCPMountsSyntax() (mcp_config_validation.go) — wraps as fmt.Errorf; uses source == "" && dest == "" && mode == "" to correctly distinguish format vs. mode errors (edge case: mount with empty mode string like /foo:/bar: would otherwise misclassify)
  • extractAddDirPaths() moved to copilot_engine_execution.go
    Was in copilot_engine.go but only called from copilot_engine_execution.go.

  • generateCopilotSessionFileCopyStep() moved to copilot_engine_execution.go
    Was in copilot_logs.go; grouped with other execution step generators.

// Before: duplicated in both files
parts := strings.Split(mount, ":")
if len(parts) != 3 { ... }
mode := parts[2]
if mode != "ro" && mode != "rw" { ... }

// After: shared helper, callers own the error wrapping
source, dest, mode, err := validateMountStringFormat(mount)
if err != nil {
    if source == "" && dest == "" && mode == "" {
        return NewValidationError(...)  // format error
    }
    return NewValidationError(...)      // mode error
}
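
The return contract described above can be sketched as follows. This is a minimal reconstruction based only on the behavior stated in this PR, not the merged code: the error messages and the classifyMount wrapper are assumptions added for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// validateMountStringFormat is a hypothetical reconstruction of the shared helper.
// Format error (wrong number of parts): returns ("", "", "", err).
// Mode error: returns (source, dest, mode, err) so callers can report the bad mode.
func validateMountStringFormat(mount string) (source, dest, mode string, err error) {
	parts := strings.Split(mount, ":")
	if len(parts) != 3 {
		return "", "", "", errors.New("mount must have the form source:destination:mode")
	}
	source, dest, mode = parts[0], parts[1], parts[2]
	if mode != "ro" && mode != "rw" {
		return source, dest, mode, fmt.Errorf("invalid mount mode %q: must be \"ro\" or \"rw\"", mode)
	}
	return source, dest, mode, nil
}

// classifyMount is an illustrative caller (not part of the PR) that tells the
// two failure kinds apart using the all-empty check described above.
func classifyMount(mount string) string {
	source, dest, mode, err := validateMountStringFormat(mount)
	switch {
	case err == nil:
		return "ok"
	case source == "" && dest == "" && mode == "":
		return "format error"
	default:
		return "mode error"
	}
}

func main() {
	for _, m := range []string{"/host:/container:ro", "/foo:/bar", "/foo:/bar:", "/foo:/bar:RW"} {
		fmt.Printf("%q -> %s\n", m, classifyMount(m))
	}
}
```

Under this sketch, the empty-mode mount /foo:/bar: is classified as a mode error because source and dest are still returned. A degenerate input such as :: would be reported as a format error, since all three parts are empty.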

Tests

  • TestValidateMountStringFormat (6 cases, validation_helpers_test.go) — covers valid mounts, format errors where all return values are empty (wrong number of parts), and mode errors where source/dest are returned alongside the invalid mode value (including the empty-mode edge case /foo:/bar:).
  • TestValidateMCPMountsSyntax (12 cases, new mcp_config_validation_test.go) — covers []string and []any inputs, wrong type, format/mode errors, error messages including tool name and mount index, and silent skipping of non-string []any items.
  • TestGenerateCopilotSessionFileCopyStep (copilot_engine_test.go) — verifies the relocated function produces a step with the correct name, always() condition, session-state source and gh-aw logs destination directories, and continue-on-error: true.
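
The input handling that TestValidateMCPMountsSyntax exercises might look roughly like the sketch below. The function name validateMCPMounts, its signature, the checkMount stand-in, and the error wording are all assumptions; only the behaviors (accepting []string and []any, rejecting other types, skipping non-string items, reporting tool name and mount index) come from the test description above.

```go
package main

import (
	"fmt"
	"strings"
)

// checkMount is a compact, hypothetical stand-in for the shared format helper.
func checkMount(mount string) error {
	parts := strings.Split(mount, ":")
	if len(parts) != 3 {
		return fmt.Errorf("expected source:destination:mode")
	}
	if parts[2] != "ro" && parts[2] != "rw" {
		return fmt.Errorf("mode must be \"ro\" or \"rw\", got %q", parts[2])
	}
	return nil
}

// validateMCPMounts is a hypothetical validator accepting []string or []any.
func validateMCPMounts(toolName string, raw any) error {
	var items []any
	switch v := raw.(type) {
	case []string:
		for _, s := range v {
			items = append(items, s)
		}
	case []any:
		items = v
	default:
		return fmt.Errorf("tool %q: mounts must be a list of strings", toolName)
	}
	for i, item := range items {
		s, ok := item.(string)
		if !ok {
			continue // non-string entries in []any are skipped silently
		}
		if err := checkMount(s); err != nil {
			// Include the tool name and the original index of the mount.
			return fmt.Errorf("tool %q: mounts[%d] %q: %w", toolName, i, s, err)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateMCPMounts("github", []string{"/data:/data:ro"}))
	fmt.Println(validateMCPMounts("github", []any{42, "/a:/b:RW"}))
	fmt.Println(validateMCPMounts("github", "not-a-list"))
}
```

Keeping the original index while skipping non-string items means the reported position matches what the user wrote in the configuration.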
Original prompt

This section details the original issue you should resolve

<issue_title>[refactor] Semantic function clustering: duplicates and outliers in pkg/workflow</issue_title>
<issue_description>Automated semantic analysis of all non-test Go files in pkg/workflow/ (240+ files) and utility packages (pkg/stringutil, pkg/sliceutil, etc.) identified several concrete refactoring opportunities through function clustering and duplicate detection.

Overview

| Metric | Value |
| --- | --- |
| Go files analyzed | 240+ (pkg/workflow) + 7 utility packages |
| Duplicate patterns found | 4 concrete cases |
| Outlier functions found | 2 |
| Estimated code reduction | ~700 lines |
| Detection method | Serena semantic analysis + naming pattern clustering |

Critical Issues

Issue 1: Duplicate Mount Syntax Validation

Two functions implement nearly identical mount string validation (source:dest:mode format) in separate files:

  • pkg/workflow/sandbox_validation.go: validateMountsSyntax()
  • pkg/workflow/mcp_config_validation.go: validateMCPMountsSyntax()

Both split on :, check for exactly 3 parts, and validate that mode is "ro" or "rw". The only differences are error types (NewValidationError vs fmt.Errorf) and that the sandbox version also validates non-empty source/dest fields.

Recommendation: Extract a shared validateMountFormat(mounts []string, toolName string) error function into validation_helpers.go or a new mounts_validation.go, then call it from both files with appropriate error wrapping.

Issue 2: Near-Identical missing_data.go and missing_tool.go

pkg/workflow/missing_data.go and pkg/workflow/missing_tool.go are structurally identical (~164 lines each, ~250 duplicated lines):

View duplicate struct definitions
// missing_data.go
type MissingDataConfig struct {
    BaseSafeOutputConfig   `yaml:",inline"`
    CreateIssue bool       `yaml:"create-issue,omitempty"`
    TitlePrefix string     `yaml:"title-prefix,omitempty"`
    Labels      []string   `yaml:"labels,omitempty"`
}

// missing_tool.go  (IDENTICAL structure, only type name differs)
type MissingToolConfig struct {
    BaseSafeOutputConfig   `yaml:",inline"`
    CreateIssue bool       `yaml:"create-issue,omitempty"`
    TitlePrefix string     `yaml:"title-prefix,omitempty"`
    Labels      []string   `yaml:"labels,omitempty"`
}

Both parseMissingDataConfig() / parseMissingToolConfig() and buildCreateOutputMissingDataJob() / buildCreateOutputMissingToolJob() follow exactly the same pattern with only the output type name and environment variable prefix differing (e.g., GH_AW_MISSING_DATA_MAX vs GH_AW_MISSING_TOOL_MAX).

Recommendation: Consider a shared BaseMissingOutputConfig type and a generic buildCreateOutputMissingJob(outputType string, config BaseMissingOutputConfig) builder, reducing ~250 lines to ~80 lines of shared code. noop.go follows the same outer parse pattern and could share the same boilerplate helper.
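
One possible shape for the shared type is sketched below. BaseSafeOutputConfig is an empty stand-in because its real fields are not shown in this issue, and missingOutputMaxEnv together with the "missing-data" / "missing-tool" keys are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// BaseSafeOutputConfig is an empty stand-in for the repository type of the
// same name; its actual fields are not shown in the issue.
type BaseSafeOutputConfig struct{}

// BaseMissingOutputConfig holds the fields both configs currently duplicate.
type BaseMissingOutputConfig struct {
	BaseSafeOutputConfig `yaml:",inline"`
	CreateIssue          bool     `yaml:"create-issue,omitempty"`
	TitlePrefix          string   `yaml:"title-prefix,omitempty"`
	Labels               []string `yaml:"labels,omitempty"`
}

// The existing type names can stay distinct while embedding the shared base.
type MissingDataConfig struct {
	BaseMissingOutputConfig `yaml:",inline"`
}

type MissingToolConfig struct {
	BaseMissingOutputConfig `yaml:",inline"`
}

// missingOutputMaxEnv is a hypothetical helper that derives the env var name,
// e.g. "missing-data" -> "GH_AW_MISSING_DATA_MAX".
func missingOutputMaxEnv(outputType string) string {
	name := strings.ToUpper(strings.ReplaceAll(outputType, "-", "_"))
	return "GH_AW_" + name + "_MAX"
}

func main() {
	data := MissingDataConfig{BaseMissingOutputConfig: BaseMissingOutputConfig{
		CreateIssue: true,
		TitlePrefix: "[missing data]",
	}}
	fmt.Println(data.CreateIssue, data.TitlePrefix, missingOutputMaxEnv("missing-data"))
	fmt.Println(missingOutputMaxEnv("missing-tool"))
}
```

Embedding keeps field access (for example data.CreateIssue) unchanged at existing call sites, which may make the migration less invasive than replacing both types with a single one.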

Issue 3: Safe Output Handler Parse Function Boilerplate (11 files)

The following files all implement the same three-part pattern — a config struct embedding BaseSafeOutputConfig + SafeOutputTargetConfig, a parseXxxConfig() method on *Compiler, and a build function:

add_comment.go, add_labels.go, add_reviewer.go, assign_milestone.go, assign_to_agent.go, assign_to_user.go, hide_comment.go, link_sub_issue.go, remove_labels.go, reply_to_pr_review_comment.go, resolve_pr_review_thread.go

View duplicated parse function pattern
// add_labels.go (lines 20-40)
func (c *Compiler) parseAddLabelsConfig(outputMap map[string]any) *AddLabelsConfig {
    if _, exists := outputMap["add-labels"]; !exists {
        return nil
    }
    addLabelsLog.Print("Parsing add-labels configuration")
    var config AddLabelsConfig
    if err := unmarshalConfig(outputMap, "add-labels", &config, addLabelsLog); err != nil {
        addLabelsLog.Printf("Failed to unmarshal config: %v", err)
        return &AddLabelsConfig{}
    }
    return &config
}

// remove_labels.go (lines 18-38) — near-identical
func (c *Compiler) parseRemoveLabelsConfig(outputMap map[string]any) *RemoveLabelsConfig {
    if _, exists := outputMap["remove-labels"]; !exists {
        return nil
    }
    removeLabelsLog.Print("Parsing remove-labels configuration")
    var config RemoveLabelsConfig
    if err := unmarshalConfig(outputMap, "remove-labels", &config, removeLabelsLog); err != nil {
        removeLabelsLog.Printf("Failed to unmarshal config: %v", err)
        return &RemoveLabelsConfig{}
    }
    return &config
}
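
The repeated parse pattern above could in principle be folded into one generic helper. The sketch below is an assumption-laden illustration: parseSafeOutputConfig is a hypothetical name, the Allowed field is a placeholder, logging is omitted, and a JSON round trip stands in for the repository's unmarshalConfig, whose signature is not shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AddLabelsConfig is a placeholder config; the Allowed field is assumed
// for illustration only.
type AddLabelsConfig struct {
	Allowed []string `json:"allowed,omitempty"`
}

// parseSafeOutputConfig is a hypothetical generic helper mirroring the pattern:
// nil when the key is absent, an empty config when decoding fails,
// otherwise the decoded config.
func parseSafeOutputConfig[T any](outputMap map[string]any, key string) *T {
	raw, exists := outputMap[key]
	if !exists {
		return nil
	}
	var config T
	// A JSON round trip stands in for the repository's unmarshalConfig.
	data, err := json.Marshal(raw)
	if err == nil {
		err = json.Unmarshal(data, &config)
	}
	if err != nil {
		return new(T) // same fallback as the hand-written versions
	}
	return &config
}

func main() {
	outputMap := map[string]any{
		"add-labels": map[string]any{"allowed": []any{"bug", "docs"}},
	}
	cfg := parseSafeOutputConfig[AddLabelsConfig](outputMap, "add-labels")
	fmt.Println(cfg.Allowed)
	fmt.Println(parseSafeOutputConfig[AddLabelsConfig](outputMap, "remove-labels") == nil)
}
```

Each existing parseXxxConfig method could then shrink to a one-line call, with the per-handler logger passed in as an extra parameter if the log lines need to be preserved.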

The AssignToUserConfig and UnassignFromUserConfig structs are also nearly identical:

// Both have: BaseSafeOutputConfig, SafeOutputTargetConfig, Allowed []string, Blocked []string
// UnassignFromUser has one extra f...

</details>



<!-- START COPILOT CODING AGENT SUFFIX -->

- Fixes github/gh-aw#17762


…nctions

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Refactor duplicate and outlier functions in pkg/workflow" to the current title on Feb 22, 2026
@pelikhan

@copilot add tests

…x, generateCopilotSessionFileCopyStep

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI commented Feb 22, 2026

@copilot add tests

Added in commit 5ae9713:

  • TestValidateMountStringFormat (6 cases) in validation_helpers_test.go — covers valid mounts, format errors (too few/too many parts, all return values empty), and mode errors (invalid value, empty mode with source/dest still returned)
  • TestValidateMCPMountsSyntax (12 cases) in new mcp_config_validation_test.go — covers []string and []any inputs, invalid type, format errors, invalid/uppercase modes, error messages including tool name and mount index, and non-string []any items being skipped
  • TestGenerateCopilotSessionFileCopyStep in copilot_engine_test.go — verifies the step name, always() condition, source/destination directories, and continue-on-error

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • https://api.github.com/graphql
    • Triggering command: /usr/bin/gh /usr/bin/gh api graphql -f query=query($owner: String!, $name: String!) { repository(owner: $owner, name: $name) { hasDiscussionsEnabled } } -f owner=github -f name=gh-aw (http block)
  • https://api.github.com/repos/actions/ai-inference/git/ref/tags/v1
    • Triggering command: /usr/bin/gh gh api /repos/actions/ai-inference/git/ref/tags/v1 --jq .object.sha (http block)
  • https://api.github.com/repos/actions/checkout/git/ref/tags/v3
    • Triggering command: /usr/bin/gh gh api /repos/actions/checkout/git/ref/tags/v3 --jq .object.sha (http block)
  • https://api.github.com/repos/actions/checkout/git/ref/tags/v4
    • Triggering command: /usr/bin/gh gh api /repos/actions/checkout/git/ref/tags/v4 --jq .object.sha (http block)
  • https://api.github.com/repos/actions/checkout/git/ref/tags/v5
    • Triggering command: /usr/bin/gh gh api /repos/actions/checkout/git/ref/tags/v5 --jq .object.sha (http block)
  • https://api.github.com/repos/actions/github-script/git/ref/tags/v8
    • Triggering command: /usr/bin/gh gh api /repos/actions/github-script/git/ref/tags/v8 --jq .object.sha (http block)
  • https://api.github.com/repos/actions/setup-go/git/ref/tags/v4
    • Triggering command: /usr/bin/gh gh api /repos/actions/setup-go/git/ref/tags/v4 --jq .object.sha (http block)
  • https://api.github.com/repos/actions/setup-node/git/ref/tags/v4
    • Triggering command: /usr/bin/gh gh api /repos/actions/setup-node/git/ref/tags/v4 --jq .object.sha (http block)
  • https://api.github.com/repos/github/gh-aw/git/ref/tags/v1.0.0
    • Triggering command: /usr/bin/gh gh api /repos/github/gh-aw/git/ref/tags/v1.0.0 --jq .object.sha (http block)
  • https://api.github.com/repos/nonexistent/action/git/ref/tags/v999.999.999
    • Triggering command: /usr/bin/gh gh api /repos/nonexistent/action/git/ref/tags/v999.999.999 --jq .object.sha (http block)

If you need me to access, download, or install something from one of these locations, you can either:

@pelikhan pelikhan marked this pull request as ready for review February 22, 2026 20:01
Copilot AI review requested due to automatic review settings February 22, 2026 20:01
@pelikhan pelikhan merged commit 3c073f2 into main Feb 22, 2026
117 checks passed
@pelikhan pelikhan deleted the copilot/refactor-semantic-function-clustering-6dcb86e4-d38e-4ade-8b9b-9856d4d1cd55 branch February 22, 2026 20:01
Copilot AI left a comment


Pull request overview

This PR implements Priority 1 refactoring items from a semantic function clustering analysis, focusing on extracting shared mount validation logic and relocating copilot execution helper functions to improve code organization.

Changes:

  • Extracted duplicated mount string validation logic into a shared validateMountStringFormat() helper function
  • Relocated two copilot execution helper functions (extractAddDirPaths() and generateCopilotSessionFileCopyStep()) to copilot_engine_execution.go where they semantically belong
  • Added comprehensive test coverage for the new validation helper and MCP mount validation

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Show a summary per file
| File | Description |
| --- | --- |
| pkg/workflow/validation_helpers.go | Added shared validateMountStringFormat() helper to parse and validate "source:dest:mode" mount strings |
| pkg/workflow/validation_helpers_test.go | Added comprehensive tests for mount format validation including edge cases |
| pkg/workflow/sandbox_validation.go | Refactored to use shared validation helper; removed duplicate validation logic and unused strings import |
| pkg/workflow/mcp_config_validation.go | Refactored to use shared validation helper; properly distinguishes format vs mode errors |
| pkg/workflow/mcp_config_validation_test.go | New test file with 12 test cases covering MCP mount validation scenarios |
| pkg/workflow/copilot_engine.go | Removed extractAddDirPaths() function (moved to execution file) |
| pkg/workflow/copilot_engine_execution.go | Added relocated extractAddDirPaths() and generateCopilotSessionFileCopyStep() functions; removed orphaned comment |
| pkg/workflow/copilot_logs.go | Removed generateCopilotSessionFileCopyStep() function (moved to execution file) |
| pkg/workflow/copilot_engine_test.go | Added test for generateCopilotSessionFileCopyStep() to verify step generation |



// validateMountStringFormat parses a mount string and validates its basic format.
// Expected format: "source:destination:mode" where mode is "ro" or "rw".
// Returns (source, dest, mode, nil) on success, or ("", "", "", error) on failure.
Copilot AI Feb 22, 2026


The documentation comment is inaccurate. It states "Returns (source, dest, mode, nil) on success, or ("", "", "", error) on failure", but this doesn't describe the actual behavior correctly. The function returns empty strings for all values only when there's a format error (wrong number of parts), but returns the actual source, dest, and mode values when there's a mode validation error. The comment should be updated to clarify this distinction, for example: "Returns (source, dest, mode, nil) on success. On format errors (wrong number of parts), returns ("", "", "", error). On mode validation errors, returns (source, dest, invalid_mode, error) so callers can provide better error messages."

Suggested change
- // Returns (source, dest, mode, nil) on success, or ("", "", "", error) on failure.
+ // Returns (source, dest, mode, nil) on success.
+ // On format errors (wrong number of parts), returns ("", "", "", error).
+ // On mode validation errors, returns (source, dest, invalid_mode, error) so callers can provide better error messages.

2 participants