fix(run_helpers): reduce false positives in edit intent inference #13

Closed
RyderFreeman4Logos wants to merge 3 commits into main from fix/infer-edit-false-positives-clean

Conversation

@RyderFreeman4Logos
Owner

Summary

  • Rewrote infer_task_edit_requirement() from substring matching to token-based matching
  • Fixes false positives: "prefix the" no longer matches "fix the", "explain the implementation" no longer matches "implement" (see the sketch after this list)
  • Filters empty tokens from emoji/bullet prefixes (e.g. "✅ Fix the bug")
  • Adds polite prefix skipping (please, can you, could you) for ambiguous verbs
  • 19 tests covering false positives, polite prefixes, emoji/bullet prefixes, and all existing behavior
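
A minimal sketch of the difference in matching strategy (simplified helpers, not the actual run_helpers code):

// Old behaviour: substring check — "prefix the issue" contains "fix the", a false positive.
fn old_match(prompt: &str) -> bool {
    prompt.to_lowercase().contains("fix the")
}

// New behaviour: tokenize first, then require "fix" and a determiner as adjacent tokens,
// so "prefix the issue" no longer matches while "fix the bug" still does.
fn new_match(prompt: &str) -> bool {
    let lower = prompt.to_lowercase();
    let tokens: Vec<&str> = lower
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty()) // drops emoji/bullet-only fragments like "✅"
        .collect();
    tokens
        .windows(2)
        .any(|w| w[0] == "fix" && ["the", "a", "this", "that"].contains(&w[1]))
}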

Background

Clean resubmission of #12. The original PR went through 1 round of iterative review with @codex (P2: empty token bug). Fix commits have been consolidated into a single logical commit here.

See #12 for the full review discussion.

Test plan

  • cargo clippy -p cli-sub-agent — clean
  • cargo test -p cli-sub-agent -- infer_edit — 19/19 passed
  • just test — all passed
  • csa review --branch main — reviewed
  • @codex review

🤖 Generated with Claude Code

[MOTIVATION]
infer_task_edit_requirement() used broad substring matching that caused
false positives: "prefix the" matched "fix the", "explain the implementation"
matched "implement", and "check for updates" matched "update". Prompts
starting with emoji/bullets produced empty tokens that broke verb detection.

[IMPLEMENTATION DETAILS]
- Rewrote to token-based matching: tokenize once, all rules use tokens
- Filter empty tokens from punctuation-only input (emoji/bullet prefixes)
- Phase 0: explicit read-only substring markers (unchanged)
- Phase 1: unambiguous multi-word token sequences (write+code, create+file)
- Phase 2: "fix" + determiner (the/a/this/that/my/our) as token pair
- Phase 3: standalone word markers (refactor, rename, modify, patch)
- Phase 4: ambiguous verbs as first meaningful word, skipping polite
  prefixes (please, can, could, would, you)
- 19 tests covering false positives, polite prefixes, emoji/bullet
  prefixes, and all existing behavior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
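
A condensed sketch of the phased structure described above (marker lists abbreviated, Phase 0 markers hypothetical; not the actual function body):

fn infer_task_edit_requirement_sketch(prompt: &str) -> Option<bool> {
    let lower = prompt.to_lowercase();

    // Phase 0: explicit read-only markers (examples only), still substring-based.
    if lower.contains("read-only") || lower.contains("do not modify") {
        return Some(false);
    }

    // Tokenize once; every later phase works on these tokens.
    let tokens: Vec<&str> = lower
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty()) // drops emoji/bullet-only fragments
        .collect();

    // Phase 1: unambiguous multi-word sequences.
    if tokens
        .windows(2)
        .any(|w| (w[0] == "write" && w[1] == "code") || (w[0] == "create" && w[1] == "file"))
    {
        return Some(true);
    }

    // Phase 2: "fix" followed by a determiner.
    let determiners = ["the", "a", "this", "that", "my", "our"];
    if tokens
        .windows(2)
        .any(|w| w[0] == "fix" && determiners.contains(&w[1]))
    {
        return Some(true);
    }

    // Phase 3: standalone markers.
    if tokens
        .iter()
        .any(|t| ["refactor", "rename", "modify", "patch"].contains(t))
    {
        return Some(true);
    }

    // Phase 4: ambiguous verbs, only as the first meaningful word.
    let skip_prefixes = ["please", "can", "could", "would", "you"];
    if let Some(first) = tokens.iter().find(|t| !skip_prefixes.contains(t)) {
        if ["edit", "update", "implement", "fix"].contains(first) {
            return Some(true);
        }
    }

    None // ambiguous: the caller decides
}
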
@RyderFreeman4Logos
Owner Author

@codex review

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 342f057523

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 344 to 348
// Phase 4: Ambiguous verbs — only match as the first meaningful word.
// Skips polite prefixes like "please", "can you", "could you".
let skip_prefixes: &[&str] = &["please", "can", "could", "would", "you"];
let first_verb = tokens.iter().find(|t| !skip_prefixes.contains(t)).copied();
let verb_markers = ["edit", "update", "implement", "fix"];

P2: Handle filler words before verb detection

Because Phase 4 only inspects the first non-skip token, prompts like “Please also update the config” or “Please just implement X” will return None (the first non-skip token is “also/just”), even though the edit intent is clear. This is a regression from the previous substring matching and can mislead failover decisions in main.rs (where infer_task_edit_requirement feeds decide_failover), potentially choosing a read-only tool for an edit request. Consider skipping common filler adverbs or scanning for the first verb marker after polite prefixes rather than a single token.

[MOTIVATION]
Codex bot review (PR #13) found that filler adverbs like "also", "just"
blocked first-verb detection: "Please also update the config" returned
None instead of Some(true).

[IMPLEMENTATION DETAILS]
- Added filler adverbs (also, just, now, then, quickly) to skip list
- Added 2 regression tests for filler word scenarios
- 21 tests pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
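
Illustrative shape of the filler-word regression tests (test names invented; assumes infer_task_edit_requirement from run_helpers is in scope and takes &str):

#[cfg(test)]
mod filler_prefix_tests {
    use super::*;

    #[test]
    fn please_also_update_is_edit() {
        // "also" is a filler adverb and must be skipped before verb detection.
        assert_eq!(infer_task_edit_requirement("Please also update the config"), Some(true));
    }

    #[test]
    fn please_just_implement_is_edit() {
        assert_eq!(infer_task_edit_requirement("Please just implement X"), Some(true));
    }
}
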
@RyderFreeman4Logos
Owner Author

@codex review

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e88f4fe923

Comment on lines 346 to 350
let skip_prefixes: &[&str] = &[
"please", "can", "could", "would", "you", // polite prefixes
"also", "just", "now", "then", "quickly", // filler adverbs
];
let first_verb = tokens.iter().find(|t| !skip_prefixes.contains(t)).copied();

P2: Treat edit intent as ambiguous for common lead-ins

Because phase 4 only checks the first non-skipped token, prompts like “We need to update the config” or “Should we implement X?” now return None: first_verb becomes we/should (not in skip_prefixes), so update/implement are never examined. This is a regression from the prior substring matcher and can misclassify edit tasks as ambiguous; in main.rs that None falls back to can_tool_edit_existing for the current tool, which can be false for read-only tools and drive failover decisions that keep non-edit tools for an edit-required task.

[MOTIVATION]
Prompts like "We need to update the config" or "Should we implement X?"
were returning None because pronouns and modal verbs (we, should, need,
to) were not in the skip_prefixes list, causing the first non-skipped
token to be a non-verb word instead of the actual edit-intent verb.

[IMPLEMENTATION DETAILS]
- Added pronouns (we, i), modal (should, shall), lead-ins (need, to,
  go, let, lets, us) to skip_prefixes
- Added 4 regression tests (25 total): we need to update, should we
  implement, let's fix, I need to edit

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@RyderFreeman4Logos
Owner Author

@codex review

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 76c84073e3

Comment on lines +344 to +355
// Phase 4: Ambiguous verbs — only match as the first meaningful word.
// Skips polite prefixes and filler adverbs before the verb.
let skip_prefixes: &[&str] = &[
"please", "can", "could", "would", "should", "shall", "you", // modals & polite
"we", "i", "lets", "let", "us", "need", "to", "go", // pronouns & lead-ins
"also", "just", "now", "then", "quickly", // filler adverbs
];
let first_verb = tokens.iter().find(|t| !skip_prefixes.contains(t)).copied();
let verb_markers = ["edit", "update", "implement", "fix"];
if let Some(verb) = first_verb {
if verb_markers.contains(&verb) {
return Some(true);

P2: Handle edit verbs that appear after a subject

Phase 4 only checks the first non‑prefix token for edit|update|implement|fix, so sentences where the verb appears later (e.g., "The task is to update the config" or "Review the logs and update the config") now return None. That’s a regression from the previous substring check and can cause failover to treat an edit request as read‑only when the current tool can’t edit, since infer_task_edit_requirement is used to decide edit capability. Consider scanning for these verbs beyond just the first meaningful token, or expanding the prefix list to cover common subjects so explicit edit intent isn’t missed.

Owner Author

By design, not a bug. The function returns Option<bool>; None means "ambiguous, let the caller decide".

Scanning all tokens for verb markers would reintroduce the original false-positive problem (e.g., "check for updates" → matches "update", "explain the implementation" → matches "implement"). Phase 4 intentionally only checks the first meaningful verb to avoid this.

For "Review the logs and update the config" — mixed-intent prompts are explicitly tested as None (infer_edit_review_and_fix_is_ambiguous). The caller in main.rs uses can_tool_edit_existing as a secondary signal when None is returned.

For "The task is to update the config" — this phrasing is an extreme edge case. Adding the, task, is etc. to skip_prefixes would effectively turn this into a full NLP parser, which is outside the scope of this lightweight heuristic.

@RyderFreeman4Logos
Owner Author

Superseded by #14 (clean squash resubmission after iterative review)

RyderFreeman4Logos added a commit that referenced this pull request Feb 13, 2026
…pability

[MOTIVATION]
Two related issues: (1) spawn_tool() didn't set stdin, causing claude-code to
hang waiting for stdin input (Issue #13); (2) all prompts passed via argv,
risking ARG_MAX overflow for large prompts (Issue #12).

[IMPLEMENTATION DETAILS]
- Add PromptTransport enum (Argv/Stdin) to csa-core with per-tool capability
  declarations: codex/gemini/claude support both, opencode argv-only
- Modify build_command() to return (Command, Option<Vec<u8>>) — selects
  stdin transport when prompt > 100KB and tool supports it
- Modify spawn_tool() to accept stdin_data: Option<Vec<u8>> — pipes data
  with 30s timeout when Some, uses Stdio::null() when None
- Update pipeline.rs to thread stdin_data through to spawn_tool()
- Add comprehensive tests: transport capability matrix, stdin null/piped
  behavior, long prompt routing, opencode argv-only warning

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
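
A minimal sketch of the capability model described in this commit (type shape and helper names are assumptions; only the per-tool lists and the 100KB threshold come from the commit message):

// Sketch only, not the actual csa-core types.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum PromptTransport {
    Argv,
    Stdin,
}

// Hypothetical per-tool capability lookup.
fn supported_transports(tool: &str) -> &'static [PromptTransport] {
    match tool {
        "codex" | "gemini" | "claude" => &[PromptTransport::Argv, PromptTransport::Stdin],
        // opencode (and anything unknown) stays argv-only.
        _ => &[PromptTransport::Argv],
    }
}

// Choose stdin only when the prompt is large and the tool supports it.
fn choose_transport(tool: &str, prompt_len: usize) -> PromptTransport {
    if prompt_len > 100 * 1024 && supported_transports(tool).contains(&PromptTransport::Stdin) {
        PromptTransport::Stdin
    } else {
        PromptTransport::Argv
    }
}
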
RyderFreeman4Logos added a commit that referenced this pull request Feb 13, 2026
…AUTO fallback (#4,#12,#13) (#64)

* fix(process,executor): unified stdin strategy with PromptTransport capability

[MOTIVATION]
Two related issues: (1) spawn_tool() didn't set stdin, causing claude-code to
hang waiting for stdin input (Issue #13); (2) all prompts passed via argv,
risking ARG_MAX overflow for large prompts (Issue #12).

[IMPLEMENTATION DETAILS]
- Add PromptTransport enum (Argv/Stdin) to csa-core with per-tool capability
  declarations: codex/gemini/claude support both, opencode argv-only
- Modify build_command() to return (Command, Option<Vec<u8>>) — selects
  stdin transport when prompt > 100KB and tool supports it
- Modify spawn_tool() to accept stdin_data: Option<Vec<u8>> — pipes data
  with 30s timeout when Some, uses Stdio::null() when None
- Update pipeline.rs to thread stdin_data through to spawn_tool()
- Add comprehensive tests: transport capability matrix, stdin null/piped
  behavior, long prompt routing, opencode argv-only warning

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(config): add defaults.tool with resolve_tool() for AUTO fallback (#4)

[MOTIVATION]
When csa review/debate is called outside a CSA session (e.g., directly
from Claude Code), auto tool detection fails because CSA_TOOL and
CSA_PARENT_TOOL env vars are absent. This causes "AUTO review tool
selection failed" with no graceful fallback path.

[IMPLEMENTATION DETAILS]
- Add `tool: Option<String>` to `DefaultsConfig` in GlobalConfig
- Add `resolve_tool()` in run_helpers.rs: detection → config → None
- Update review_cmd, debate_cmd, main.rs to use resolve_tool()
- Two-level fallback: detect_parent_tool() → config.defaults.tool → any-available
- detect_parent_tool() stays unchanged (pure detection only)
- Add 5 tests covering resolve_tool logic and config deserialization

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
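
A rough sketch of the fallback chain (stub types and bodies are assumptions; the names mirror the commit message):

// Sketch with stub types; the real GlobalConfig / DefaultsConfig live in the config crate.
struct DefaultsConfig { tool: Option<String> }
struct GlobalConfig { defaults: DefaultsConfig }

// Stub only — the real detect_parent_tool() is unchanged and does pure detection.
fn detect_parent_tool() -> Option<String> {
    std::env::var("CSA_TOOL").ok().or_else(|| std::env::var("CSA_PARENT_TOOL").ok())
}

// Detection first, then the config default; None tells the caller to fall back
// to any available tool.
fn resolve_tool(config: &GlobalConfig) -> Option<String> {
    detect_parent_tool().or_else(|| config.defaults.tool.clone())
}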

* docs(skills): clarify mktd approve option with mktsk reference

[MOTIVATION]
The Phase 4 APPROVE option in the mktd skill did not clearly indicate
that the next step is to use the mktsk skill for execution, causing
potential confusion about the post-approval workflow.

[IMPLEMENTATION DETAILS]
- Changed "Approve" to "Approve, 使用 mktsk" in Phase 4 User Options

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(review): address P1 doc language + P2 test gap from local review

[MOTIVATION]
Pre-PR local review (main...HEAD) identified two issues:
- P1: Chinese text in mktd SKILL.md violates Meta 003 communication rule
- P2: Missing long-prompt stdin transport tests for Gemini CLI and Claude Code

[IMPLEMENTATION DETAILS]
- Changed mktd approve option label to English
- Added long-prompt stdin tests for Gemini CLI and Claude Code executors

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(executor): retain -p flag for gemini/claude in stdin transport mode

[MOTIVATION]
Local review P1: when long prompts go through stdin transport, the -p
flag was entirely skipped for Gemini CLI and Claude Code. Without -p,
these tools enter interactive mode, causing automation to hang.

[IMPLEMENTATION DETAILS]
- In stdin mode, still emit -p (without prompt arg) for gemini/claude
- Codex/opencode don't need extra flags for stdin reading
- Updated tests to assert -p presence in stdin transport mode

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
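
A small sketch of the flag decision (helper name is an assumption; the -p behaviour is as described in the commit):

// Extra flags emitted when the prompt travels over stdin. Per the commit, gemini/claude
// still get -p (with no value) so they stay non-interactive, while codex/opencode need
// nothing extra. Argv-mode argument layout is omitted here.
fn stdin_mode_flags(tool: &str) -> Vec<String> {
    match tool {
        "gemini" | "claude" => vec!["-p".to_string()],
        _ => Vec::new(),
    }
}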

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>