
fix(run_helpers): reduce false positives in edit intent inference #12

Closed

RyderFreeman4Logos wants to merge 2 commits into main from fix/infer-edit-false-positives

Conversation

@RyderFreeman4Logos
Owner

Summary

  • Rewrote infer_task_edit_requirement() from substring matching to token-based matching
  • Fixes false positives: "prefix the" no longer matches "fix the", "explain the implementation" no longer matches "implement"
  • Adds polite prefix skipping (please, can you, could you) for ambiguous verbs
  • Adds 17 tests covering false positives, polite prefixes, and all existing behavior

Test plan

  • cargo clippy -p cli-sub-agent — clean
  • cargo test -p cli-sub-agent -- infer_edit — 17/17 passed
  • just test — 203/203 passed
  • csa review --branch main — reviewed, fixes applied
  • @codex review

🤖 Generated with Claude Code

[MOTIVATION]
infer_task_edit_requirement() used broad substring matching that caused
false positives: "prefix the" matched "fix the", "explain the implementation"
matched "implement", and "check for updates" matched "update".

[IMPLEMENTATION DETAILS]
- Rewrote to token-based matching: tokenize once, all rules use tokens
- Phase 0: explicit read-only substring markers (unchanged)
- Phase 1: unambiguous multi-word token sequences (write+code, create+file)
- Phase 2: "fix" + determiner (the/a/this/that/my/our) as token pair
- Phase 3: standalone word markers (refactor, rename, modify, patch)
- Phase 4: ambiguous verbs as first meaningful word, skipping polite
  prefixes (please, can, could, would, you)
- 17 tests covering false positives (prefix/suffix), polite prefixes,
  and all existing behavior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
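
For orientation, here is a minimal sketch of the phased matcher this commit describes. The function name, the marker lists, and the read-only phrases are assumptions reconstructed from the commit message, not the actual run_helpers.rs code.

// Minimal sketch of the phased, token-based inference described above.
// Marker lists and read-only phrases are illustrative assumptions.
fn infer_edit_intent(prompt: &str) -> Option<bool> {
    let prompt_lower = prompt.to_lowercase();

    // Phase 0: explicit read-only substring markers (unchanged from before).
    if prompt_lower.contains("read-only") || prompt_lower.contains("do not modify") {
        return Some(false);
    }

    // Tokenize once; every later phase works on these tokens.
    let tokens: Vec<&str> = prompt_lower
        .split_whitespace()
        .map(|w| w.trim_matches(|c: char| !c.is_alphanumeric()))
        .collect();

    // Phase 1: unambiguous multi-word token sequences.
    for pair in tokens.windows(2) {
        if (pair[0] == "write" && pair[1] == "code") || (pair[0] == "create" && pair[1] == "file") {
            return Some(true);
        }
    }

    // Phase 2: "fix" followed directly by a determiner, as a token pair,
    // so "prefix the" no longer matches.
    let determiners = ["the", "a", "this", "that", "my", "our"];
    for pair in tokens.windows(2) {
        if pair[0] == "fix" && determiners.contains(&pair[1]) {
            return Some(true);
        }
    }

    // Phase 3: standalone word markers anywhere in the prompt.
    if tokens.iter().any(|t| ["refactor", "rename", "modify", "patch"].contains(t)) {
        return Some(true);
    }

    // Phase 4: ambiguous verbs only count as the first meaningful word,
    // after skipping polite prefixes.
    let skip_prefixes: &[&str] = &["please", "can", "could", "would", "you"];
    let first_verb = tokens.iter().find(|t| !skip_prefixes.contains(t)).copied();
    if let Some(verb) = first_verb {
        if ["edit", "update", "implement", "fix"].contains(&verb) {
            return Some(true);
        }
    }

    None
}

Note that this version, matching the reviewed commit, does not yet drop empty tokens; that gap is what the first Codex review below flags.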
@RyderFreeman4Logos
Owner Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1e96eb85d0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 312 to 315
let tokens: Vec<&str> = prompt_lower
    .split_whitespace()
    .map(|w| w.trim_matches(|c: char| !c.is_alphanumeric()))
    .collect();


P2: Filter empty tokens before first-verb detection

If the prompt begins with punctuation-only tokens (e.g., emoji/bullets like "✅ Fix the bug"), trim_matches turns the first token into an empty string. Because empties aren’t filtered, first_verb can become "", so Phase 4 never considers the real verb and the function returns None instead of Some(true). This is a regression from the previous substring approach for prompts that start with non-alphanumeric markers.


[MOTIVATION]
Codex bot review (PR #12) found that prompts starting with emoji or
bullets (e.g. "✅ Fix the bug") produce empty tokens after trim_matches,
causing first-verb detection to fail silently.

[IMPLEMENTATION DETAILS]
- Added .filter(|w| !w.is_empty()) to token pipeline
- Added 2 regression tests for emoji and bullet prefixes
- 19 tests pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
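
For reference, the adjusted pipeline would look like the snippet quoted above plus the new filter (a sketch, not the verbatim diff):

// Token pipeline with the fix applied. Punctuation-only tokens such as "✅"
// or "-" trim to empty strings and are now dropped before any phase runs,
// so "✅ Fix the bug" still exposes "fix" as its first token.
let tokens: Vec<&str> = prompt_lower
    .split_whitespace()
    .map(|w| w.trim_matches(|c: char| !c.is_alphanumeric()))
    .filter(|w| !w.is_empty())
    .collect();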
@RyderFreeman4Logos
Owner Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d2cf597c98


Comment on lines +346 to +350
let skip_prefixes: &[&str] = &["please", "can", "could", "would", "you"];
let first_verb = tokens.iter().find(|t| !skip_prefixes.contains(t)).copied();
let verb_markers = ["edit", "update", "implement", "fix"];
if let Some(verb) = first_verb {
    if verb_markers.contains(&verb) {


P2: Handle auxiliary verbs before edit markers

In Phase 4 the code only inspects the first non-skip token to decide whether an edit verb is present. For prompts like "You should update the config," the first non-skip token becomes "should", so "update" is never examined and the function returns None. This is a regression from the prior substring behavior and will miss common edit-intent phrasing that includes auxiliary verbs ("should", "need to", etc.), leading to false negatives in edit inference for such inputs.

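
The auxiliary-verb gap was not fixed in this PR (it was closed as superseded), but one purely illustrative adjustment to the quoted snippet would be to let the skip list absorb common auxiliaries so Phase 4 still reaches the edit verb:

// Illustrative only; not part of this PR. Extending the skip list lets
// "You should update the config" reach "update" in Phase 4.
let skip_prefixes: &[&str] = &[
    "please", "can", "could", "would", "you",
    // hypothetical additions for auxiliary phrasing:
    "should", "must", "need", "needs", "to",
];
let first_verb = tokens.iter().find(|t| !skip_prefixes.contains(t)).copied();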

@RyderFreeman4Logos
Owner Author

Superseded by #13. Preserved for review discussion reference.

RyderFreeman4Logos added a commit that referenced this pull request Feb 13, 2026
fix(process,executor): unified stdin strategy with PromptTransport capability

[MOTIVATION]
Two related issues: (1) spawn_tool() didn't set stdin, causing claude-code to
hang waiting for stdin input (Issue #13); (2) all prompts passed via argv,
risking ARG_MAX overflow for large prompts (Issue #12).

[IMPLEMENTATION DETAILS]
- Add PromptTransport enum (Argv/Stdin) to csa-core with per-tool capability
  declarations: codex/gemini/claude support both, opencode argv-only
- Modify build_command() to return (Command, Option<Vec<u8>>) — selects
  stdin transport when prompt > 100KB and tool supports it
- Modify spawn_tool() to accept stdin_data: Option<Vec<u8>> — pipes data
  with 30s timeout when Some, uses Stdio::null() when None
- Update pipeline.rs to thread stdin_data through to spawn_tool()
- Add comprehensive tests: transport capability matrix, stdin null/piped
  behavior, long prompt routing, opencode argv-only warning

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
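
A rough sketch of the transport selection this commit describes. The PromptTransport name and the 100 KB threshold come from the commit message; ToolKind, supports_stdin(), and the exact build_command() shape shown here are simplified stand-ins, not the real csa-core API.

use std::process::Command;

#[derive(Clone, Copy, PartialEq)]
enum PromptTransport {
    Argv,
    Stdin,
}

#[derive(Clone, Copy)]
enum ToolKind {
    Codex,
    Gemini,
    Claude,
    Opencode,
}

impl ToolKind {
    // codex/gemini/claude support both transports; opencode is argv-only.
    fn supports_stdin(self) -> bool {
        !matches!(self, ToolKind::Opencode)
    }
}

// Threshold taken from the commit message: prompts over ~100 KB go via stdin.
const STDIN_THRESHOLD: usize = 100 * 1024;

fn select_transport(tool: ToolKind, prompt: &str) -> PromptTransport {
    if prompt.len() > STDIN_THRESHOLD && tool.supports_stdin() {
        PromptTransport::Stdin
    } else {
        PromptTransport::Argv
    }
}

// Returns the command plus the optional bytes to pipe into the child's stdin,
// mirroring the (Command, Option<Vec<u8>>) shape described above.
fn build_command(tool: ToolKind, program: &str, prompt: &str) -> (Command, Option<Vec<u8>>) {
    let mut cmd = Command::new(program);
    match select_transport(tool, prompt) {
        // Large prompt on a stdin-capable tool: avoid ARG_MAX by piping.
        PromptTransport::Stdin => (cmd, Some(prompt.as_bytes().to_vec())),
        // Otherwise keep the prompt on the command line.
        PromptTransport::Argv => {
            cmd.arg(prompt);
            (cmd, None)
        }
    }
}

spawn_tool() would then pipe the Some(bytes) with the 30s timeout and fall back to Stdio::null() for None, as the commit notes.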
RyderFreeman4Logos added a commit that referenced this pull request Feb 13, 2026
…AUTO fallback (#4,#12,#13) (#64)

* fix(process,executor): unified stdin strategy with PromptTransport capability

[MOTIVATION]
Two related issues: (1) spawn_tool() didn't set stdin, causing claude-code to
hang waiting for stdin input (Issue #13); (2) all prompts passed via argv,
risking ARG_MAX overflow for large prompts (Issue #12).

[IMPLEMENTATION DETAILS]
- Add PromptTransport enum (Argv/Stdin) to csa-core with per-tool capability
  declarations: codex/gemini/claude support both, opencode argv-only
- Modify build_command() to return (Command, Option<Vec<u8>>) — selects
  stdin transport when prompt > 100KB and tool supports it
- Modify spawn_tool() to accept stdin_data: Option<Vec<u8>> — pipes data
  with 30s timeout when Some, uses Stdio::null() when None
- Update pipeline.rs to thread stdin_data through to spawn_tool()
- Add comprehensive tests: transport capability matrix, stdin null/piped
  behavior, long prompt routing, opencode argv-only warning

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(config): add defaults.tool with resolve_tool() for AUTO fallback (#4)

[MOTIVATION]
When csa review/debate is called outside a CSA session (e.g., directly
from Claude Code), auto tool detection fails because CSA_TOOL and
CSA_PARENT_TOOL env vars are absent. This causes "AUTO review tool
selection failed" with no graceful fallback path.

[IMPLEMENTATION DETAILS]
- Add `tool: Option<String>` to `DefaultsConfig` in GlobalConfig
- Add `resolve_tool()` in run_helpers.rs: detection → config → None
- Update review_cmd, debate_cmd, main.rs to use resolve_tool()
- Two-level fallback: detect_parent_tool() → config.defaults.tool → any-available
- detect_parent_tool() stays unchanged (pure detection only)
- Add 5 tests covering resolve_tool logic and config deserialization

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
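
A small sketch of the fallback described here; detect_parent_tool() and the config shape are simplified assumptions based on this commit message and the motivation above.

struct DefaultsConfig {
    // Corresponds to `defaults.tool` in the global config, per this commit.
    tool: Option<String>,
}

// Pure detection only: reads the CSA session env vars named in the motivation.
// The precedence between the two vars here is an assumption.
fn detect_parent_tool() -> Option<String> {
    std::env::var("CSA_TOOL")
        .or_else(|_| std::env::var("CSA_PARENT_TOOL"))
        .ok()
}

// detection -> config -> None; a None result lets the caller fall back to
// any available tool, per the two-level fallback described above.
fn resolve_tool(defaults: &DefaultsConfig) -> Option<String> {
    detect_parent_tool().or_else(|| defaults.tool.clone())
}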

* docs(skills): clarify mktd approve option with mktsk reference

[MOTIVATION]
The Phase 4 APPROVE option in the mktd skill did not clearly indicate
that the next step is to use the mktsk skill for execution, causing
potential confusion about the post-approval workflow.

[IMPLEMENTATION DETAILS]
- Changed "Approve" to "Approve, 使用 mktsk" ("使用" is Chinese for "use") in Phase 4 User Options

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(review): address P1 doc language + P2 test gap from local review

[MOTIVATION]
Pre-PR local review (main...HEAD) identified two issues:
- P1: Chinese text in mktd SKILL.md violates Meta 003 communication rule
- P2: Missing long-prompt stdin transport tests for Gemini CLI and Claude Code

[IMPLEMENTATION DETAILS]
- Changed mktd approve option label to English
- Added long-prompt stdin tests for Gemini CLI and Claude Code executors

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(executor): retain -p flag for gemini/claude in stdin transport mode

[MOTIVATION]
Local review P1: when long prompts go through stdin transport, the -p
flag was entirely skipped for Gemini CLI and Claude Code. Without -p,
these tools enter interactive mode, causing automation to hang.

[IMPLEMENTATION DETAILS]
- In stdin mode, still emit -p (without prompt arg) for gemini/claude
- Codex/opencode don't need extra flags for stdin reading
- Updated tests to assert -p presence in stdin transport mode

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
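
As a sketch (reusing the ToolKind and PromptTransport stand-ins from the earlier transport sketch, with the flag handling inferred from this commit message), the per-tool prompt arguments might be assembled like this:

// Illustrative only; flag names follow the commit message, structure is assumed.
fn prompt_args(tool: ToolKind, transport: PromptTransport, prompt: &str) -> Vec<String> {
    match (tool, transport) {
        // Stdin mode: gemini/claude still get -p (with no prompt argument)
        // so they stay non-interactive and read the prompt from stdin.
        (ToolKind::Gemini | ToolKind::Claude, PromptTransport::Stdin) => vec!["-p".into()],
        // Codex/opencode need no extra flag here.
        (_, PromptTransport::Stdin) => vec![],
        // Argv mode: pass the prompt directly, via -p where the tool expects it.
        (ToolKind::Gemini | ToolKind::Claude, PromptTransport::Argv) => {
            vec!["-p".into(), prompt.into()]
        }
        (_, PromptTransport::Argv) => vec![prompt.into()],
    }
}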

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>