refactor(workflow): Priority 1 semantic clustering — shared mount validation, copilot function relocation by Copilot · Pull Request #17768 · github/gh-aw

Copilot · 2026-02-22T18:31:59Z

Addresses the three Priority 1 items from the semantic function clustering analysis: extract shared mount validation logic, and relocate two copilot execution helpers to their call sites.

Changes

validateMountStringFormat() → validation_helpers.go
New shared primitive that parses the source:destination:mode format (three-part check plus ro/rw mode check). Both consumers delegate core validation to it and wrap errors in their own context-appropriate types:
- validateMountsSyntax() (sandbox_validation.go): wraps as NewValidationError; also drops the now-unused strings import
- validateMCPMountsSyntax() (mcp_config_validation.go): wraps as fmt.Errorf; uses source == "" && dest == "" && mode == "" to distinguish format errors from mode errors (edge case: a mount with an empty mode string, such as /foo:/bar:, would otherwise be misclassified)
extractAddDirPaths() moved to copilot_engine_execution.go
Was in copilot_engine.go but only called from copilot_engine_execution.go.
generateCopilotSessionFileCopyStep() moved to copilot_engine_execution.go
Was in copilot_logs.go; grouped with other execution step generators.

// Before: duplicated in both files
parts := strings.Split(mount, ":")
if len(parts) != 3 { ... }
mode := parts[2]
if mode != "ro" && mode != "rw" { ... }

// After: shared helper, callers own the error wrapping
source, dest, mode, err := validateMountStringFormat(mount)
if err != nil {
    if source == "" && dest == "" && mode == "" {
        return NewValidationError(...)  // format error
    }
    return NewValidationError(...)      // mode error
}

Tests

TestValidateMountStringFormat (6 cases, validation_helpers_test.go) — covers valid mounts, format errors where all return values are empty (wrong number of parts), and mode errors where source/dest are returned alongside the invalid mode value (including the empty-mode edge case /foo:/bar:).
TestValidateMCPMountsSyntax (12 cases, new mcp_config_validation_test.go) — covers []string and []any inputs, wrong type, format/mode errors, error messages including tool name and mount index, and silent skipping of non-string []any items.
TestGenerateCopilotSessionFileCopyStep (copilot_engine_test.go) — verifies the relocated function produces a step with the correct name, always() condition, session-state source and gh-aw logs destination directories, and continue-on-error: true.

Original prompt

This section details the original issue you should resolve.

<issue_title>[refactor] Semantic function clustering: duplicates and outliers in pkg/workflow</issue_title>
<issue_description>Automated semantic analysis of all non-test Go files in pkg/workflow/ (240+ files) and utility packages (pkg/stringutil, pkg/sliceutil, etc.) identified several concrete refactoring opportunities through function clustering and duplicate detection.

Overview

Metric	Value
Go files analyzed	240+ (pkg/workflow) + 7 utility packages
Duplicate patterns found	4 concrete cases
Outlier functions found	2
Estimated code reduction	~700 lines
Detection method	Serena semantic analysis + naming pattern clustering

Critical Issues

Issue 1: Duplicate Mount Syntax Validation

Two functions implement nearly identical mount string validation (source:dest:mode format) in separate files:

pkg/workflow/sandbox_validation.go → validateMountsSyntax()
pkg/workflow/mcp_config_validation.go → validateMCPMountsSyntax()

Both split on :, check for exactly 3 parts, and validate that mode is "ro" or "rw". The only differences are error types (NewValidationError vs fmt.Errorf) and that the sandbox version also validates non-empty source/dest fields.

Recommendation: Extract a shared validateMountFormat(mounts []string, toolName string) error function into validation_helpers.go or a new mounts_validation.go, then call it from both files with appropriate error wrapping.

Issue 2: Near-Identical missing_data.go and missing_tool.go

pkg/workflow/missing_data.go and pkg/workflow/missing_tool.go are structurally identical (~164 lines each, ~250 duplicated lines):

View duplicate struct definitions

// missing_data.go
type MissingDataConfig struct {
    BaseSafeOutputConfig   `yaml:",inline"`
    CreateIssue bool       `yaml:"create-issue,omitempty"`
    TitlePrefix string     `yaml:"title-prefix,omitempty"`
    Labels      []string   `yaml:"labels,omitempty"`
}

// missing_tool.go  (IDENTICAL structure, only type name differs)
type MissingToolConfig struct {
    BaseSafeOutputConfig   `yaml:",inline"`
    CreateIssue bool       `yaml:"create-issue,omitempty"`
    TitlePrefix string     `yaml:"title-prefix,omitempty"`
    Labels      []string   `yaml:"labels,omitempty"`
}

Both parseMissingDataConfig() / parseMissingToolConfig() and buildCreateOutputMissingDataJob() / buildCreateOutputMissingToolJob() follow exactly the same pattern with only the output type name and environment variable prefix differing (e.g., GH_AW_MISSING_DATA_MAX vs GH_AW_MISSING_TOOL_MAX).

Recommendation: Consider a shared BaseMissingOutputConfig type and a generic buildCreateOutputMissingJob(outputType string, config BaseMissingOutputConfig) builder, reducing ~250 lines to ~80 lines of shared code. noop.go follows the same outer parse pattern and could share the same boilerplate helper.

Issue 3: Safe Output Handler Parse Function Boilerplate (11 files)

The following files all implement the same three-part pattern — a config struct embedding BaseSafeOutputConfig + SafeOutputTargetConfig, a parseXxxConfig() method on *Compiler, and a build function:

add_comment.go, add_labels.go, add_reviewer.go, assign_milestone.go, assign_to_agent.go, assign_to_user.go, hide_comment.go, link_sub_issue.go, remove_labels.go, reply_to_pr_review_comment.go, resolve_pr_review_thread.go

View duplicated parse function pattern

// add_labels.go (lines 20-40)
func (c *Compiler) parseAddLabelsConfig(outputMap map[string]any) *AddLabelsConfig {
    if _, exists := outputMap["add-labels"]; !exists {
        return nil
    }
    addLabelsLog.Print("Parsing add-labels configuration")
    var config AddLabelsConfig
    if err := unmarshalConfig(outputMap, "add-labels", &config, addLabelsLog); err != nil {
        addLabelsLog.Printf("Failed to unmarshal config: %v", err)
        return &AddLabelsConfig{}
    }
    return &config
}

// remove_labels.go (lines 18-38) — near-identical
func (c *Compiler) parseRemoveLabelsConfig(outputMap map[string]any) *RemoveLabelsConfig {
    if _, exists := outputMap["remove-labels"]; !exists {
        return nil
    }
    removeLabelsLog.Print("Parsing remove-labels configuration")
    var config RemoveLabelsConfig
    if err := unmarshalConfig(outputMap, "remove-labels", &config, removeLabelsLog); err != nil {
        removeLabelsLog.Printf("Failed to unmarshal config: %v", err)
        return &RemoveLabelsConfig{}
    }
    return &config
}
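
The two parse functions above differ only in the config type and the map key, which suggests a generic helper. The following is a rough sketch under stated assumptions: `parseSafeOutputConfig` is a hypothetical name, `unmarshalConfig` is replaced by a simplified JSON-based stand-in without the logger argument, and the `Allowed` field on `AddLabelsConfig` is invented for the demonstration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AddLabelsConfig is a reduced stand-in; the Allowed field is hypothetical.
type AddLabelsConfig struct {
	Allowed []string `json:"allowed,omitempty"`
}

// unmarshalConfig is a simplified stand-in for the real helper: it
// round-trips the map entry through JSON and takes no logger.
func unmarshalConfig(outputMap map[string]any, key string, target any) error {
	raw, err := json.Marshal(outputMap[key])
	if err != nil {
		return err
	}
	return json.Unmarshal(raw, target)
}

// parseSafeOutputConfig mirrors the duplicated pattern: nil when the key
// is absent, a zero-value config when unmarshalling fails, and the
// populated config otherwise.
func parseSafeOutputConfig[T any](outputMap map[string]any, key string) *T {
	if _, exists := outputMap[key]; !exists {
		return nil
	}
	var config T
	if err := unmarshalConfig(outputMap, key, &config); err != nil {
		return new(T) // fresh zero value, like &AddLabelsConfig{} in the original
	}
	return &config
}

func main() {
	outputMap := map[string]any{
		"add-labels": map[string]any{"allowed": []any{"bug", "docs"}},
	}
	cfg := parseSafeOutputConfig[AddLabelsConfig](outputMap, "add-labels")
	fmt.Println(cfg.Allowed)
	fmt.Println(parseSafeOutputConfig[AddLabelsConfig](outputMap, "remove-labels") == nil)
}
```

Each existing parseXxxConfig() method could then shrink to a one-line wrapper around the generic call, keeping its per-handler logging if that is still wanted.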

The AssignToUserConfig and UnassignFromUserConfig structs are also nearly identical:

// Both have: BaseSafeOutputConfig, SafeOutputTargetConfig, Allowed []string, Blocked []string
// UnassignFromUser has one extra f...

- Fixes github/gh-aw#17762

pelikhan · 2026-02-22T18:53:41Z

@copilot add tests

Copilot · 2026-02-22T19:02:11Z

@copilot add tests

Added in commit 5ae9713:

TestValidateMountStringFormat (6 cases) in validation_helpers_test.go — covers valid mounts, format errors (too few/too many parts, all return values empty), and mode errors (invalid value, empty mode with source/dest still returned)
TestValidateMCPMountsSyntax (12 cases) in new mcp_config_validation_test.go — covers []string and []any inputs, invalid type, format errors, invalid/uppercase modes, error messages including tool name and mount index, and non-string []any items being skipped
TestGenerateCopilotSessionFileCopyStep in copilot_engine_test.go — verifies the step name, always() condition, source/destination directories, and continue-on-error

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

https://api.github.com/graphql
- Triggering command: /usr/bin/gh /usr/bin/gh api graphql -f query=query($owner: String!, $name: String!) { repository(owner: $owner, name: $name) { hasDiscussionsEnabled } } -f owner=github -f name=gh-aw GOMOD GOMODCACHE ache/go/1.25.0/x64/pkg/tool/linux_amd64/vet (http block)
https://api.github.com/repos/actions/ai-inference/git/ref/tags/v1
- Triggering command: /usr/bin/gh gh api /repos/actions/ai-inference/git/ref/tags/v1 --jq .object.sha -json cfg 64/pkg/tool/linux_amd64/vet GOINSECURE GOMOD GOMODCACHE 64/pkg/tool/linux_amd64/vet env -json cfg 64/pkg/tool/linux_amd64/vet GOINSECURE GOMOD GOMODCACHE 64/pkg/tool/linux_amd64/vet (http block)
https://api.github.com/repos/actions/checkout/git/ref/tags/v3
- Triggering command: /usr/bin/gh gh api /repos/actions/checkout/git/ref/tags/v3 --jq .object.sha -json GO111MODULE x_amd64/vet GOINSECURE GOMOD GOMODCACHE x_amd64/vet env -json GO111MODULE x_amd64/vet GOINSECURE GOMOD GOMODCACHE x_amd64/vet (http block)
https://api.github.com/repos/actions/checkout/git/ref/tags/v4
- Triggering command: /usr/bin/gh gh api /repos/actions/checkout/git/ref/tags/v4 --jq .object.sha prettier --check 64/pkg/tool/linux_amd64/vet --ignore-path .prettierignore 64/bin/go 64/pkg/tool/linux_amd64/vet env -json cfg 64/pkg/tool/linux_amd64/vet GOINSECURE GOMOD GOMODCACHE 64/pkg/tool/linux_amd64/vet (http block)
- Triggering command: /usr/bin/gh gh api /repos/actions/checkout/git/ref/tags/v4 --jq .object.sha "prettier" --check 'scripts/**/*.js' --ignore-pa- go 64/pkg/tool/linux_amd64/vet -json GO111MODULE 64/bin/go 64/pkg/tool/linux_amd64/vet env -json cfg 64/pkg/tool/linux_amd64/vet GOINSECURE GOMOD GOMODCACHE 64/pkg/tool/linux_amd64/vet (http block)
- Triggering command: /usr/bin/gh gh api /repos/actions/checkout/git/ref/tags/v4 --jq .object.sha -x c 453269/b249/vet.cfg - GOWORK 64/bin/go /opt/hostedtoolcache/go/1.25.0/x64/pkg/tool/linux_amd64/vet -uns�� -unreachable=false /tmp/go-build524453269/b072/vet.cfg /opt/hostedtoolcache/go/1.25.0/x64/pkg/tool/linux_amd64/vet -json GO111MODULE 64/bin/go /opt/hostedtoolcache/go/1.25.0/x/tmp/gh-aw/aw-master.patch (http block)
https://api.github.com/repos/actions/checkout/git/ref/tags/v5
- Triggering command: /usr/bin/gh gh api /repos/actions/checkout/git/ref/tags/v5 --jq .object.sha -json GO111MODULE 64/bin/go GOINSECURE GOMOD GOMODCACHE go env -json GO111MODULE 64/bin/go GOINSECURE GOMOD GOMODCACHE go (http block)
- Triggering command: /usr/bin/gh gh api /repos/actions/checkout/git/ref/tags/v5 --jq .object.sha -json GO111MODULE ache/go/1.25.0/x64/pkg/tool/linux_amd64/vet GOINSECURE GOMOD GOMODCACHE ache/go/1.25.0/x64/pkg/tool/linux_amd64/vet (http block)
- Triggering command: /usr/bin/gh gh api /repos/actions/checkout/git/ref/tags/v5 --jq .object.sha -unreachable=false /tmp/go-build524453269/b062/vet.cfg 453269/b332/vet.cfg GOSUMDB GOWORK 64/bin/go /opt/hostedtoolcache/go/1.25.0/x64/pkg/tool/linux_amd64/vet -uns�� -unreachable=false /tmp/go-build524453269/b211/vet.cfg /opt/hostedtoolcache/go/1.25.0/x64/pkg/tool/linux_amd64/vet -json GO111MODULE ache/go/1.25.0/x--show-toplevel /opt/hostedtoolcache/go/1.25.0/x64/pkg/tool/linux_amd64/vet (http block)
https://api.github.com/repos/actions/github-script/git/ref/tags/v8
- Triggering command: /usr/bin/gh gh api /repos/actions/github-script/git/ref/tags/v8 --jq .object.sha "prettier" --check 'scripts/**/*.js' --ignore-paGOSUMDB go 64/pkg/tool/linux_amd64/vet -json GO111MODULE 64/bin/go 64/pkg/tool/linux_amd64/vet env -json cfg 64/pkg/tool/linux_amd64/vet on; \ echo "��git GOMOD GOMODCACHE 64/pkg/tool/linux_amd64/vet (http block)
- Triggering command: /usr/bin/gh gh api /repos/actions/github-script/git/ref/tags/v8 --jq .object.sha "prettier" --check 'scripts/**/*.js' --ignore-paGOSUMDB /usr/lib/systemdGO111MODULE 64/pkg/tool/linux_amd64/vet 70 --log-level 64/bin/go 64/pkg/tool/linux_amd64/vet env -json cfg 64/pkg/tool/linux_amd64/vet GOINSECURE GOMOD GOMODCACHE 64/pkg/tool/linux_amd64/vet (http block)
- Triggering command: /usr/bin/gh gh api /repos/actions/github-script/git/ref/tags/v8 --jq .object.sha GOPATH GOPROXY /opt/hostedtoolcache/go/1.25.0/x64/pkg/tool/linux_amd64/vet GOSUMDB GOWORK 64/bin/go /opt/hostedtoolcache/go/1.25.0/x64/pkg/tool/linux_amd64/vet -uns�� -unreachable=false /tmp/go-build524453269/b070/vet.cfg /opt/hostedtoolcache/go/1.25.0/x64/pkg/tool/linux_amd64/vet -json GO111MODULE 64/bin/go /opt/hostedtoolcache/go/1.25.0/x64/pkg/tool/linux_amd64/vet (http block)
https://api.github.com/repos/actions/setup-go/git/ref/tags/v4
- Triggering command: /usr/bin/gh gh api /repos/actions/setup-go/git/ref/tags/v4 --jq .object.sha "prettier" --check 'scripts/**/*.js' --ignore-paGOSUMDB go 64/pkg/tool/linux_amd64/vet -json GO111MODULE 64/bin/go 64/pkg/tool/linux_amd64/vet env -json cfg 64/pkg/tool/linux_amd64/vet GOINSECURE GOMOD GOMODCACHE 64/pkg/tool/linux_amd64/vet (http block)
https://api.github.com/repos/actions/setup-node/git/ref/tags/v4
- Triggering command: /usr/bin/gh gh api /repos/actions/setup-node/git/ref/tags/v4 --jq .object.sha "prettier" --check 'scripts/**/*-s go 64/pkg/tool/linux_amd64/vet -json GO111MODULE 64/bin/go 64/pkg/tool/linux_amd64/vet env -json cfg 64/pkg/tool/linux_amd64/vet GOINSECURE GOMOD GOMODCACHE 64/pkg/tool/linux_amd64/vet (http block)
https://api.github.com/repos/github/gh-aw/git/ref/tags/v1.0.0
- Triggering command: /usr/bin/gh gh api /repos/github/gh-aw/git/ref/tags/v1.0.0 --jq .object.sha .js' --ignore-paGOSUMDB GO111MODULE 64/bin/go GOINSECURE GOMOD GOMODCACHE go env -json GO111MODULE 64/bin/go GOINSECURE GOMOD GOMODCACHE go (http block)
https://api.github.com/repos/nonexistent/action/git/ref/tags/v999.999.999
- Triggering command: /usr/bin/gh gh api /repos/nonexistent/action/git/ref/tags/v999.999.999 --jq .object.sha -json GO111MODULE 64/bin/go GOINSECURE GOMOD GOMODCACHE go env -json GO111MODULE 64/bin/go GOINSECURE GOMOD GOMODCACHE go (http block)

If you need me to access, download, or install something from one of these locations, you can either:

Configure Actions setup steps to set up my environment, which run before the firewall is enabled
Add the appropriate URLs or hosts to the custom allowlist in this repository's Copilot coding agent settings (admins only)

Copilot

Pull request overview

This PR implements Priority 1 refactoring items from a semantic function clustering analysis, focusing on extracting shared mount validation logic and relocating copilot execution helper functions to improve code organization.

Changes:

Extracted duplicated mount string validation logic into a shared validateMountStringFormat() helper function
Relocated two copilot execution helper functions (extractAddDirPaths() and generateCopilotSessionFileCopyStep()) to copilot_engine_execution.go where they semantically belong
Added comprehensive test coverage for the new validation helper and MCP mount validation

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

File	Description
pkg/workflow/validation_helpers.go	Added shared `validateMountStringFormat()` helper to parse and validate "source:dest:mode" mount strings
pkg/workflow/validation_helpers_test.go	Added comprehensive tests for mount format validation including edge cases
pkg/workflow/sandbox_validation.go	Refactored to use shared validation helper; removed duplicate validation logic and unused strings import
pkg/workflow/mcp_config_validation.go	Refactored to use shared validation helper; properly distinguishes format vs mode errors
pkg/workflow/mcp_config_validation_test.go	New test file with 12 test cases covering MCP mount validation scenarios
pkg/workflow/copilot_engine.go	Removed `extractAddDirPaths()` function (moved to execution file)
pkg/workflow/copilot_engine_execution.go	Added relocated `extractAddDirPaths()` and `generateCopilotSessionFileCopyStep()` functions; removed orphaned comment
pkg/workflow/copilot_logs.go	Removed `generateCopilotSessionFileCopyStep()` function (moved to execution file)
pkg/workflow/copilot_engine_test.go	Added test for `generateCopilotSessionFileCopyStep()` to verify step generation

Copilot · 2026-02-22T20:05:08Z

pkg/workflow/validation_helpers.go

+
+// validateMountStringFormat parses a mount string and validates its basic format.
+// Expected format: "source:destination:mode" where mode is "ro" or "rw".
+// Returns (source, dest, mode, nil) on success, or ("", "", "", error) on failure.


The documentation comment is inaccurate. It states "Returns (source, dest, mode, nil) on success, or ("", "", "", error) on failure", but this doesn't describe the actual behavior correctly. The function returns empty strings for all values only when there's a format error (wrong number of parts), but returns the actual source, dest, and mode values when there's a mode validation error. The comment should be updated to clarify this distinction, for example: "Returns (source, dest, mode, nil) on success. On format errors (wrong number of parts), returns ("", "", "", error). On mode validation errors, returns (source, dest, invalid_mode, error) so callers can provide better error messages."

Suggested change

-// Returns (source, dest, mode, nil) on success, or ("", "", "", error) on failure.
+// Returns (source, dest, mode, nil) on success.
+// On format errors (wrong number of parts), returns ("", "", "", error).
+// On mode validation errors, returns (source, dest, invalid_mode, error) so callers can provide better error messages.

Initial plan

ff993f1

Copilot AI assigned Copilot and pelikhan Feb 22, 2026

Copilot started work on behalf of pelikhan February 22, 2026 18:32

refactor: extract shared mount validation, relocate copilot engine fu…

ac55a0f

…nctions Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Refactor duplicate and outlier functions in pkg/workflow~~ refactor(workflow): Priority 1 semantic clustering — shared mount validation, copilot function relocation Feb 22, 2026

Copilot finished work on behalf of pelikhan February 22, 2026 18:52

Copilot started work on behalf of pelikhan February 22, 2026 18:53

test: add tests for validateMountStringFormat, validateMCPMountsSynta…

5ae9713

…x, generateCopilotSessionFileCopyStep Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot finished work on behalf of pelikhan February 22, 2026 19:03

pelikhan marked this pull request as ready for review February 22, 2026 20:01

Copilot AI review requested due to automatic review settings February 22, 2026 20:01

pelikhan merged commit 3c073f2 into main Feb 22, 2026
117 checks passed

pelikhan deleted the copilot/refactor-semantic-function-clustering-6dcb86e4-d38e-4ade-8b9b-9856d4d1cd55 branch February 22, 2026 20:01

Copilot started reviewing on behalf of pelikhan February 22, 2026 20:01

Copilot AI reviewed Feb 22, 2026

This was referenced Feb 22, 2026

Smoke Test: Copilot - 22284305271 #17777

Closed

Smoke Test: Claude - 22284305292 #17778

Closed

pelikhan merged 3 commits into main from copilot/refactor-semantic-function-clustering-6dcb86e4-d38e-4ade-8b9b-9856d4d1cd55
