fix(run_helpers): reduce false positives in edit intent inference#12
fix(run_helpers): reduce false positives in edit intent inference#12RyderFreeman4Logos wants to merge 2 commits intomainfrom
Conversation
[MOTIVATION] infer_task_edit_requirement() used broad substring matching that caused false positives: "prefix the" matched "fix the", "explain the implementation" matched "implement", and "check for updates" matched "update". [IMPLEMENTATION DETAILS] - Rewrote to token-based matching: tokenize once, all rules use tokens - Phase 0: explicit read-only substring markers (unchanged) - Phase 1: unambiguous multi-word token sequences (write+code, create+file) - Phase 2: "fix" + determiner (the/a/this/that/my/our) as token pair - Phase 3: standalone word markers (refactor, rename, modify, patch) - Phase 4: ambiguous verbs as first meaningful word, skipping polite prefixes (please, can, could, would, you) - 17 tests covering false positives (prefix/suffix), polite prefixes, and all existing behavior Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1e96eb85d0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let tokens: Vec<&str> = prompt_lower | ||
| .split_whitespace() | ||
| .map(|w| w.trim_matches(|c: char| !c.is_alphanumeric())) | ||
| .collect(); |
There was a problem hiding this comment.
Filter empty tokens before first-verb detection
If the prompt begins with punctuation-only tokens (e.g., emoji/bullets like "✅ Fix the bug"), trim_matches turns the first token into an empty string. Because empties aren’t filtered, first_verb can become "", so Phase 4 never considers the real verb and the function returns None instead of Some(true). This is a regression from the previous substring approach for prompts that start with non-alphanumeric markers.
Useful? React with 👍 / 👎.
[MOTIVATION] Codex bot review (PR #12) found that prompts starting with emoji or bullets (e.g. "✅ Fix the bug") produce empty tokens after trim_matches, causing first-verb detection to fail silently. [IMPLEMENTATION DETAILS] - Added .filter(|w| !w.is_empty()) to token pipeline - Added 2 regression tests for emoji and bullet prefixes - 19 tests pass Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d2cf597c98
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let skip_prefixes: &[&str] = &["please", "can", "could", "would", "you"]; | ||
| let first_verb = tokens.iter().find(|t| !skip_prefixes.contains(t)).copied(); | ||
| let verb_markers = ["edit", "update", "implement", "fix"]; | ||
| if let Some(verb) = first_verb { | ||
| if verb_markers.contains(&verb) { |
There was a problem hiding this comment.
Handle auxiliary verbs before edit markers
In Phase 4 the code only inspects the first non-skip token to decide whether an edit verb is present. For prompts like “You should update the config,” the first non-skip token becomes should, so update is never examined and the function returns None. This is a regression from the prior substring behavior and will miss common edit-intent phrasing that includes auxiliary verbs ("should", "need to", etc.), leading to false negatives in edit inference for such inputs.
Useful? React with 👍 / 👎.
|
Superseded by #13. Preserved for review discussion reference. |
…pability [MOTIVATION] Two related issues: (1) spawn_tool() didn't set stdin, causing claude-code to hang waiting for stdin input (Issue #13); (2) all prompts passed via argv, risking ARG_MAX overflow for large prompts (Issue #12). [IMPLEMENTATION DETAILS] - Add PromptTransport enum (Argv/Stdin) to csa-core with per-tool capability declarations: codex/gemini/claude support both, opencode argv-only - Modify build_command() to return (Command, Option<Vec<u8>>) — selects stdin transport when prompt > 100KB and tool supports it - Modify spawn_tool() to accept stdin_data: Option<Vec<u8>> — pipes data with 30s timeout when Some, uses Stdio::null() when None - Update pipeline.rs to thread stdin_data through to spawn_tool() - Add comprehensive tests: transport capability matrix, stdin null/piped behavior, long prompt routing, opencode argv-only warning Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…AUTO fallback (#4,#12,#13) (#64) * fix(process,executor): unified stdin strategy with PromptTransport capability [MOTIVATION] Two related issues: (1) spawn_tool() didn't set stdin, causing claude-code to hang waiting for stdin input (Issue #13); (2) all prompts passed via argv, risking ARG_MAX overflow for large prompts (Issue #12). [IMPLEMENTATION DETAILS] - Add PromptTransport enum (Argv/Stdin) to csa-core with per-tool capability declarations: codex/gemini/claude support both, opencode argv-only - Modify build_command() to return (Command, Option<Vec<u8>>) — selects stdin transport when prompt > 100KB and tool supports it - Modify spawn_tool() to accept stdin_data: Option<Vec<u8>> — pipes data with 30s timeout when Some, uses Stdio::null() when None - Update pipeline.rs to thread stdin_data through to spawn_tool() - Add comprehensive tests: transport capability matrix, stdin null/piped behavior, long prompt routing, opencode argv-only warning Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(config): add defaults.tool with resolve_tool() for AUTO fallback (#4) [MOTIVATION] When csa review/debate is called outside a CSA session (e.g., directly from Claude Code), auto tool detection fails because CSA_TOOL and CSA_PARENT_TOOL env vars are absent. This causes "AUTO review tool selection failed" with no graceful fallback path. [IMPLEMENTATION DETAILS] - Add `tool: Option<String>` to `DefaultsConfig` in GlobalConfig - Add `resolve_tool()` in run_helpers.rs: detection → config → None - Update review_cmd, debate_cmd, main.rs to use resolve_tool() - Two-level fallback: detect_parent_tool() → config.defaults.tool → any-available - detect_parent_tool() stays unchanged (pure detection only) - Add 5 tests covering resolve_tool logic and config deserialization Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs(skills): clarify mktd approve option with mktsk reference [MOTIVATION] The Phase 4 APPROVE option in the mktd skill did not clearly indicate that the next step is to use the mktsk skill for execution, causing potential confusion about the post-approval workflow. [IMPLEMENTATION DETAILS] - Changed "Approve" to "Approve, 使用 mktsk" in Phase 4 User Options Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(review): address P1 doc language + P2 test gap from local review [MOTIVATION] Pre-PR local review (main...HEAD) identified two issues: - P1: Chinese text in mktd SKILL.md violates Meta 003 communication rule - P2: Missing long-prompt stdin transport tests for Gemini CLI and Claude Code [IMPLEMENTATION DETAILS] - Changed mktd approve option label to English - Added long-prompt stdin tests for Gemini CLI and Claude Code executors Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(executor): retain -p flag for gemini/claude in stdin transport mode [MOTIVATION] Local review P1: when long prompts go through stdin transport, the -p flag was entirely skipped for Gemini CLI and Claude Code. Without -p, these tools enter interactive mode, causing automation to hang. [IMPLEMENTATION DETAILS] - In stdin mode, still emit -p (without prompt arg) for gemini/claude - Codex/opencode don't need extra flags for stdin reading - Updated tests to assert -p presence in stdin transport mode Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Summary
infer_task_edit_requirement()from substring matching to token-based matchingTest plan
cargo clippy -p cli-sub-agent— cleancargo test -p cli-sub-agent -- infer_edit— 17/17 passedjust test— 203/203 passedcsa review --branch main— reviewed, fixes applied🤖 Generated with Claude Code