fix(run_helpers): reduce false positives in edit intent inference#13
fix(run_helpers): reduce false positives in edit intent inference#13RyderFreeman4Logos wants to merge 3 commits intomainfrom
Conversation
[MOTIVATION] infer_task_edit_requirement() used broad substring matching that caused false positives: "prefix the" matched "fix the", "explain the implementation" matched "implement", and "check for updates" matched "update". Prompts starting with emoji/bullets produced empty tokens that broke verb detection. [IMPLEMENTATION DETAILS] - Rewrote to token-based matching: tokenize once, all rules use tokens - Filter empty tokens from punctuation-only input (emoji/bullet prefixes) - Phase 0: explicit read-only substring markers (unchanged) - Phase 1: unambiguous multi-word token sequences (write+code, create+file) - Phase 2: "fix" + determiner (the/a/this/that/my/our) as token pair - Phase 3: standalone word markers (refactor, rename, modify, patch) - Phase 4: ambiguous verbs as first meaningful word, skipping polite prefixes (please, can, could, would, you) - 19 tests covering false positives, polite prefixes, emoji/bullet prefixes, and all existing behavior Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 342f057523
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // Phase 4: Ambiguous verbs — only match as the first meaningful word. | ||
| // Skips polite prefixes like "please", "can you", "could you". | ||
| let skip_prefixes: &[&str] = &["please", "can", "could", "would", "you"]; | ||
| let first_verb = tokens.iter().find(|t| !skip_prefixes.contains(t)).copied(); | ||
| let verb_markers = ["edit", "update", "implement", "fix"]; |
There was a problem hiding this comment.
Handle filler words before verb detection
Because Phase 4 only inspects the first non-skip token, prompts like “Please also update the config” or “Please just implement X” will return None (the first non-skip token is “also/just”), even though the edit intent is clear. This is a regression from the previous substring matching and can mislead failover decisions in main.rs (where infer_task_edit_requirement feeds decide_failover), potentially choosing a read-only tool for an edit request. Consider skipping common filler adverbs or scanning for the first verb marker after polite prefixes rather than a single token.
Useful? React with 👍 / 👎.
[MOTIVATION] Codex bot review (PR #13) found that filler adverbs like "also", "just" blocked first-verb detection: "Please also update the config" returned None instead of Some(true). [IMPLEMENTATION DETAILS] - Added filler adverbs (also, just, now, then, quickly) to skip list - Added 2 regression tests for filler word scenarios - 21 tests pass Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e88f4fe923
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let skip_prefixes: &[&str] = &[ | ||
| "please", "can", "could", "would", "you", // polite prefixes | ||
| "also", "just", "now", "then", "quickly", // filler adverbs | ||
| ]; | ||
| let first_verb = tokens.iter().find(|t| !skip_prefixes.contains(t)).copied(); |
There was a problem hiding this comment.
Treat edit intent as ambiguous for common lead-ins
Because phase 4 only checks the first non-skipped token, prompts like “We need to update the config” or “Should we implement X?” now return None: first_verb becomes we/should (not in skip_prefixes), so update/implement are never examined. This is a regression from the prior substring matcher and can misclassify edit tasks as ambiguous; in main.rs that None falls back to can_tool_edit_existing for the current tool, which can be false for read-only tools and drive failover decisions that keep non-edit tools for an edit-required task.
Useful? React with 👍 / 👎.
[MOTIVATION] Prompts like "We need to update the config" or "Should we implement X?" were returning None because pronouns and modal verbs (we, should, need, to) were not in the skip_prefixes list, causing the first non-skipped token to be a non-verb word instead of the actual edit-intent verb. [IMPLEMENTATION DETAILS] - Added pronouns (we, i), modal (should, shall), lead-ins (need, to, go, let, lets, us) to skip_prefixes - Added 4 regression tests (25 total): we need to update, should we implement, let's fix, I need to edit Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 76c84073e3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // Phase 4: Ambiguous verbs — only match as the first meaningful word. | ||
| // Skips polite prefixes and filler adverbs before the verb. | ||
| let skip_prefixes: &[&str] = &[ | ||
| "please", "can", "could", "would", "should", "shall", "you", // modals & polite | ||
| "we", "i", "lets", "let", "us", "need", "to", "go", // pronouns & lead-ins | ||
| "also", "just", "now", "then", "quickly", // filler adverbs | ||
| ]; | ||
| let first_verb = tokens.iter().find(|t| !skip_prefixes.contains(t)).copied(); | ||
| let verb_markers = ["edit", "update", "implement", "fix"]; | ||
| if let Some(verb) = first_verb { | ||
| if verb_markers.contains(&verb) { | ||
| return Some(true); |
There was a problem hiding this comment.
Handle edit verbs that appear after a subject
Phase 4 only checks the first non‑prefix token for edit|update|implement|fix, so sentences where the verb appears later (e.g., "The task is to update the config" or "Review the logs and update the config") now return None. That’s a regression from the previous substring check and can cause failover to treat an edit request as read‑only when the current tool can’t edit, since infer_task_edit_requirement is used to decide edit capability. Consider scanning for these verbs beyond just the first meaningful token, or expanding the prefix list to cover common subjects so explicit edit intent isn’t missed.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
By design, not a bug. The function returns Option<bool> — None means "ambiguous, let caller decide".
Scanning all tokens for verb markers would reintroduce the original false-positive problem (e.g., "check for updates" → matches "update", "explain the implementation" → matches "implement"). Phase 4 intentionally only checks the first meaningful verb to avoid this.
For "Review the logs and update the config" — mixed-intent prompts are explicitly tested as None (infer_edit_review_and_fix_is_ambiguous). The caller in main.rs uses can_tool_edit_existing as a secondary signal when None is returned.
For "The task is to update the config" — this phrasing is an extreme edge case. Adding the, task, is etc. to skip_prefixes would effectively turn this into a full NLP parser, which is outside the scope of this lightweight heuristic.
|
Superseded by #14 (clean squash resubmission after iterative review) |
…pability [MOTIVATION] Two related issues: (1) spawn_tool() didn't set stdin, causing claude-code to hang waiting for stdin input (Issue #13); (2) all prompts passed via argv, risking ARG_MAX overflow for large prompts (Issue #12). [IMPLEMENTATION DETAILS] - Add PromptTransport enum (Argv/Stdin) to csa-core with per-tool capability declarations: codex/gemini/claude support both, opencode argv-only - Modify build_command() to return (Command, Option<Vec<u8>>) — selects stdin transport when prompt > 100KB and tool supports it - Modify spawn_tool() to accept stdin_data: Option<Vec<u8>> — pipes data with 30s timeout when Some, uses Stdio::null() when None - Update pipeline.rs to thread stdin_data through to spawn_tool() - Add comprehensive tests: transport capability matrix, stdin null/piped behavior, long prompt routing, opencode argv-only warning Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…AUTO fallback (#4,#12,#13) (#64) * fix(process,executor): unified stdin strategy with PromptTransport capability [MOTIVATION] Two related issues: (1) spawn_tool() didn't set stdin, causing claude-code to hang waiting for stdin input (Issue #13); (2) all prompts passed via argv, risking ARG_MAX overflow for large prompts (Issue #12). [IMPLEMENTATION DETAILS] - Add PromptTransport enum (Argv/Stdin) to csa-core with per-tool capability declarations: codex/gemini/claude support both, opencode argv-only - Modify build_command() to return (Command, Option<Vec<u8>>) — selects stdin transport when prompt > 100KB and tool supports it - Modify spawn_tool() to accept stdin_data: Option<Vec<u8>> — pipes data with 30s timeout when Some, uses Stdio::null() when None - Update pipeline.rs to thread stdin_data through to spawn_tool() - Add comprehensive tests: transport capability matrix, stdin null/piped behavior, long prompt routing, opencode argv-only warning Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(config): add defaults.tool with resolve_tool() for AUTO fallback (#4) [MOTIVATION] When csa review/debate is called outside a CSA session (e.g., directly from Claude Code), auto tool detection fails because CSA_TOOL and CSA_PARENT_TOOL env vars are absent. This causes "AUTO review tool selection failed" with no graceful fallback path. [IMPLEMENTATION DETAILS] - Add `tool: Option<String>` to `DefaultsConfig` in GlobalConfig - Add `resolve_tool()` in run_helpers.rs: detection → config → None - Update review_cmd, debate_cmd, main.rs to use resolve_tool() - Two-level fallback: detect_parent_tool() → config.defaults.tool → any-available - detect_parent_tool() stays unchanged (pure detection only) - Add 5 tests covering resolve_tool logic and config deserialization Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs(skills): clarify mktd approve option with mktsk reference [MOTIVATION] The Phase 4 APPROVE option in the mktd skill did not clearly indicate that the next step is to use the mktsk skill for execution, causing potential confusion about the post-approval workflow. [IMPLEMENTATION DETAILS] - Changed "Approve" to "Approve, 使用 mktsk" in Phase 4 User Options Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(review): address P1 doc language + P2 test gap from local review [MOTIVATION] Pre-PR local review (main...HEAD) identified two issues: - P1: Chinese text in mktd SKILL.md violates Meta 003 communication rule - P2: Missing long-prompt stdin transport tests for Gemini CLI and Claude Code [IMPLEMENTATION DETAILS] - Changed mktd approve option label to English - Added long-prompt stdin tests for Gemini CLI and Claude Code executors Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(executor): retain -p flag for gemini/claude in stdin transport mode [MOTIVATION] Local review P1: when long prompts go through stdin transport, the -p flag was entirely skipped for Gemini CLI and Claude Code. Without -p, these tools enter interactive mode, causing automation to hang. [IMPLEMENTATION DETAILS] - In stdin mode, still emit -p (without prompt arg) for gemini/claude - Codex/opencode don't need extra flags for stdin reading - Updated tests to assert -p presence in stdin transport mode Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Summary
infer_task_edit_requirement()from substring matching to token-based matchingBackground
Clean resubmission of #12. The original PR went through 1 round of iterative review with @codex (P2: empty token bug). Fix commits have been consolidated into a single logical commit here.
See #12 for the full review discussion.
Test plan
cargo clippy -p cli-sub-agent— cleancargo test -p cli-sub-agent -- infer_edit— 19/19 passedjust test— all passedcsa review --branch main— reviewed🤖 Generated with Claude Code