Add fuzzy matching "did you mean" suggestions for engine and tool validation#14260
- Implement Levenshtein distance algorithm in pkg/stringutil
- Add FindClosestMatch function for typo suggestions
- Update validateEngine to provide "did you mean" suggestions
- Add comprehensive tests for fuzzy matching
- Add tests for engine validation "did you mean" feature

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

- Update ValidateGitHubToolsAgainstToolsets to detect unknown tools
- Provide fuzzy matching suggestions for typos in tool names
- Add formatList helper for natural language lists
- Add comprehensive tests for GitHub tool typo suggestions
- Update existing tests to expect errors for unknown tools

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

- Update validateEngine in cmd/gh-aw/main.go to use fuzzy matching
- Provide typo suggestions when invalid engine flag is specified
- Maintains backward compatible error format
- All tests passing

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

- Skip wildcard "*" in GitHub tool validation (used for "allow all")
- Add nolint comments for intentional test typos in stringutil tests
- All tests now passing including threat detection isolation test

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
pkg/stringutil/stringutil.go
Outdated
// LevenshteinDistance("claude", "claue") // Returns: 1 (one deletion)
// LevenshteinDistance("codex", "codec") // Returns: 1 (one substitution)
// LevenshteinDistance("abc", "xyz") // Returns: 3 (three substitutions)
func LevenshteinDistance(s1, s2 string) int {
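The doc comment above lists three expected distances. As a sanity check on those values, here is a minimal sketch of the standard two-row dynamic-programming Levenshtein algorithm. `levenshteinDistance` is a hypothetical local name; the thread below replaces this function with the existing `parser.LevenshteinDistance`, whose internals may differ.

```go
package main

import "fmt"

// min3 returns the smallest of three ints (avoids relying on the Go 1.21 min builtin).
func min3(x, y, z int) int {
	m := x
	if y < m {
		m = y
	}
	if z < m {
		m = z
	}
	return m
}

// levenshteinDistance returns the minimum number of single-character
// insertions, deletions, and substitutions that turn s1 into s2.
// It compares runes, so a multi-byte character counts as one edit.
func levenshteinDistance(s1, s2 string) int {
	a, b := []rune(s1), []rune(s2)
	// prev holds row i-1 of the DP table, curr holds row i.
	prev := make([]int, len(b)+1)
	curr := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j // "" -> b[:j] needs j insertions
	}
	for i := 1; i <= len(a); i++ {
		curr[0] = i // a[:i] -> "" needs i deletions
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			curr[j] = min3(
				prev[j]+1,      // deletion
				curr[j-1]+1,    // insertion
				prev[j-1]+cost, // substitution or match
			)
		}
		prev, curr = curr, prev // the finished row becomes the previous row
	}
	return prev[len(b)]
}

func main() {
	fmt.Println(levenshteinDistance("claude", "claue")) // 1
	fmt.Println(levenshteinDistance("codex", "codec"))  // 1
	fmt.Println(levenshteinDistance("abc", "xyz"))      // 3
}
```

Keeping only two rows reduces memory from O(len(s1)·len(s2)) to O(len(s2)), which is plenty for short identifiers like engine and tool names.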
@copilot there is already a fuzzy matching search for this
Good catch! I've updated the code to use the existing parser.FindClosestMatches and parser.LevenshteinDistance functions instead of creating duplicate implementations. The commit 0a1b81e removes the duplicate code from pkg/stringutil and updates all validation code to use the existing parser functions.
…ation
- Remove duplicate LevenshteinDistance and FindClosestMatch from pkg/stringutil
- Update engine validation to use parser.FindClosestMatches
- Update GitHub tool validation to use parser.FindClosestMatches
- Update CLI validateEngine to use parser.FindClosestMatches
- Fix test expectation for case-insensitive matching behavior
- All tests passing with existing fuzzy matching infrastructure

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
pkg/workflow/engine_validation.go
Outdated
return fmt.Errorf("invalid engine: %s. Valid engines are: copilot, claude, codex, custom.\n\nExample:\nengine: copilot\n\nSee: %s", engineID, constants.DocsEnginesURL)

// Build list of valid engine IDs
validEngines := []string{"copilot", "claude", "codex", "custom"}
Updated to get the list dynamically from the engine registry. The commit df362cf changes both pkg/workflow/engine_validation.go (uses c.engineRegistry.GetSupportedEngines()) and cmd/gh-aw/main.go (uses workflow.GetGlobalEngineRegistry().GetSupportedEngines()) to pull the valid engine list from the registry instead of hardcoding it.
- Update validateEngine in pkg/workflow/engine_validation.go to use c.engineRegistry.GetSupportedEngines()
- Update validateEngine in cmd/gh-aw/main.go to use workflow.GetGlobalEngineRegistry()
- Dynamically generate engine list and error messages from registry
- Ensures error messages stay in sync with registered engines
- All tests passing

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Pull request overview
Improves gh-aw compiler/CLI validation error messages by adding fuzzy “did you mean” suggestions for mistyped engine IDs and GitHub tool names, leveraging the existing Levenshtein-based matcher in pkg/parser.
Changes:
- Added “did you mean” suggestions for invalid engine values in both compiler validation and the CLI flag validator, using the engine registry for the valid engine list.
- Changed GitHub tool validation to treat unknown tools as validation errors and provide typo suggestions plus examples of valid tools.
- Added/updated unit tests to cover typo suggestions for engines and GitHub tools.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| pkg/workflow/engine_validation.go | Builds richer invalid-engine errors using registry-provided valid engine IDs + fuzzy suggestions. |
| cmd/gh-aw/main.go | Updates CLI --engine validation to use the global engine registry + fuzzy suggestions. |
| pkg/workflow/github_tool_to_toolset.go | Adds unknown-tool detection with “did you mean” suggestions and a more verbose error message. |
| pkg/workflow/github_tool_to_toolset_test.go | Updates existing tool validation expectations and adds new tests for suggestions. |
| pkg/workflow/engine_validation_test.go | Adds tests validating “did you mean” behavior for invalid engine IDs. |
// Get all valid tool names for suggestion
validTools := make([]string, 0, len(GitHubToolToToolsetMap))
for validTool := range GitHubToolToToolsetMap {
	validTools = append(validTools, validTool)
}
sort.Strings(validTools)
validTools is rebuilt and sorted for every unknown tool encountered, and then rebuilt again later for the final error message. Consider computing the sorted validTools slice once outside the loop and reusing it for both suggestion lookups and the “Valid GitHub tools include …” section to avoid unnecessary work.
	unknownTools = append(unknownTools, tool)
}
// Tool not in our mapping - this could be a new tool or a typo
// We'll skip validation for unknown tools to avoid false positives
The comment says unknown tools are skipped “to avoid false positives”, but the function now returns an error when unknownTools is non-empty. Update or remove this comment so it matches the new behavior (unknown tools are now treated as validation failures).
// We'll skip validation for unknown tools to avoid false positives
// Record the unknown tool and skip further checks for this entry; unknown tools will cause validation to fail later.
if len(items) == 0 {
	return ""
}
if len(items) == 1 {
	return items[0]
}
if len(items) == 2 {
	return items[0] + " and " + items[1]
}
return fmt.Sprintf("%s, and %s", formatList(items[:len(items)-1]), items[len(items)-1])
formatList produces awkward/incorrect grammar for 3+ items (e.g., "a and b, and c" instead of "a, b, and c") due to the recursive construction. This will make the new error messages harder to read when listing multiple tools. Consider implementing proper comma-separated formatting (e.g., join all but last with ", ", then add ", and ").
if len(items) == 0 {
	return ""
}
if len(items) == 1 {
	return items[0]
}
if len(items) == 2 {
	return items[0] + " and " + items[1]
}
return fmt.Sprintf("%s, and %s", formatList(items[:len(items)-1]), items[len(items)-1])

switch len(items) {
case 0:
	return ""
case 1:
	return items[0]
case 2:
	return items[0] + " and " + items[1]
default:
	result := items[0]
	for i := 1; i < len(items); i++ {
		if i == len(items)-1 {
			result += ", and " + items[i]
		} else {
			result += ", " + items[i]
		}
	}
	return result
}
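The suggested loop is correct; the same Oxford-comma output can also be produced more compactly with `strings.Join`, which avoids building the string by repeated concatenation. This is an alternative sketch, not the code merged in the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// formatList joins items as an English list with an Oxford comma:
// "a", "a and b", "a, b, and c".
func formatList(items []string) string {
	switch len(items) {
	case 0:
		return ""
	case 1:
		return items[0]
	case 2:
		return items[0] + " and " + items[1]
	default:
		// Everything but the last item is comma-separated; the last gets ", and ".
		return strings.Join(items[:len(items)-1], ", ") + ", and " + items[len(items)-1]
	}
}

func main() {
	fmt.Println(formatList([]string{"get_repository", "list_issues", "get_issue"}))
	// get_repository, list_issues, and get_issue
}
```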
name: "Unknown tool is ignored",
allowedTools: []string{"get_repository", "unknown_tool_xyz"},
enabledToolsets: []string{"repos"},
expectError: false,
expectError: true,
errorContains: []string{"Unknown GitHub tool", "unknown_tool_xyz"},
This test case name no longer matches the asserted behavior (it now expects an error). Rename it to reflect the new semantics (e.g., "Unknown tool returns error"), so failures are easier to interpret.
Error Message Quality Improvements
Implementing improvements to enhance error message quality in the gh-aw compiler based on the Daily Syntax Error Quality Check analysis (2026-02-07).
Summary
Implemented "did you mean" suggestions across the codebase to help users quickly fix typos in engine names and GitHub tool names. The estimated effect is to cut error resolution time from about 2 minutes (looking up valid values) to about 10 seconds (accepting the suggestion).
Updates:
- Use the existing `parser.FindClosestMatches` and `parser.LevenshteinDistance` functions instead of creating duplicate implementations
- Get the valid engine list from `EngineRegistry` instead of hardcoding it

Implementation Complete
High Priority ✅
- ~~Implement Levenshtein distance calculation~~ Use existing implementation in `pkg/parser`
- ~~Implement `FindClosestMatch` function~~ Use existing `parser.FindClosestMatches`
- ~~Add comprehensive tests~~ Existing tests in parser package
- Update `validateEngine` in `pkg/workflow/engine_validation.go` to use engine registry
- Update `validateEngine` in `cmd/gh-aw/main.go` to use engine registry
- Update `ValidateGitHubToolsAgainstToolsets` to detect unknown tools

Changes Made
Used existing fuzzy matching infrastructure in `pkg/parser/schema_suggestions.go`:
- `parser.FindClosestMatches(target, candidates, maxResults)` - finds up to N closest matches
- `parser.LevenshteinDistance(a, b)` - calculates edit distance

Engine validation uses the engine registry dynamically:
- `pkg/workflow/engine_validation.go`: uses `c.engineRegistry.GetSupportedEngines()`
- `cmd/gh-aw/main.go`: uses `workflow.GetGlobalEngineRegistry().GetSupportedEngines()`

GitHub tool validation (`pkg/workflow/github_tool_to_toolset.go`):

Test Results
Automated Testing:
Example Error Messages
Engine typo:
GitHub tool typos:
Impact
Original prompt
This section details the original issue you should resolve
<issue_title>[syntax-error-quality] Syntax Error Quality Analysis - 2026-02-07</issue_title>
<issue_description>### 📊 Error Message Quality Analysis
Analysis Date: 2026-02-07
Test Cases: 3
Average Score: 71.3/100
Status: ✅ Good
Executive Summary
This analysis evaluates the gh-aw compiler's error message quality across three different error scenarios: YAML syntax errors, invalid engine names, and conflicting configurations. The compiler demonstrates generally good error messaging with clear file:line:column formatting and actionable context, achieving an average score of 71.3/100, which meets the quality threshold of ≥70.
Key Findings:
Test Case Results
Test Case 1: Invalid YAML Syntax (Missing Colon) - Score: 78/100 ✅
Test Configuration
Workflow: `example-custom-error-patterns.md` (33 lines, simple workflow)
Error Type: Category A - Invalid YAML syntax
Error Introduced: Line 10: `engine:` changed to `engine` (missing colon after key)

Compiler Output Analysis
Based on code review of `pkg/parser/frontmatter_content.go` and `pkg/workflow/frontmatter_error.go`, the compiler:
- uses `yaml.FormatError()` for colorized, source-positioned error output
- `console.FormatError()`

Expected Output Format: