bug: model occasionally does not read enough context to correctly answer #4208

@techjoec

Description

TL;DR: Codex doesn't seem to properly use surrounding context, especially in doc-review tasks (markdown in my case), to understand file-system paths (and likely more). Forcing it to consume a few lines around each target appears to fix it.

What version of Codex is running?

Happens on both old and new versions.

Which model were you using?

gpt-5-codex with high reasoning, but the issue existed before that model.

What platform is your computer?

Linux 6.8.0-79-generic #79-Ubuntu

What steps can reproduce the bug?

When reviewing documentation (more so than when responding to user prompts), Codex CLI does not consider surrounding context at all. For example, have it scan a repo for file-system path updates: it goes totally dumb. If something matches "\S/\S" (anything-slash-anything), it has to be fought over the fact that it isn't a path. It has created bonkers directory structures for me a few times, but if you can't repro it, here's an easy way:

1. Tell it to scan all docs and scripts to move a project to a new folder.
2. Have it scan the docs and scripts.
3. Tell it to scan the scripts, sanity-check all the paths, and output the result to a JSON file for review.
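The over-matching described above can be demonstrated mechanically. This is a hypothetical illustration (not Codex's actual matching logic): a naive anything-slash-anything pattern in the spirit of "\S/\S" happily flags non-paths alongside real ones.

```python
import re

# Naive "anything slash anything" heuristic, illustrative only --
# similar in spirit to the \S/\S over-matching described above.
PATH_LIKE = re.compile(r"\S+/\S+")

text = ("Use and/or TCP/IP settings; see docs/setup.md "
        "and scripts/run.sh for 50/50 cases.")

# Without surrounding context, false positives are indistinguishable
# from genuine repo paths.
hits = PATH_LIKE.findall(text)
print(hits)
# Only docs/setup.md and scripts/run.sh are real paths; the rest
# are exactly the kind of non-paths the model defends.
```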

What is the expected behavior?

Actual contextual awareness to complete the task, especially after the model has been told multiple times that its output was trash and that it was flagging non-paths in its work.

What do you see instead?

No sanity checking and no contextual awareness: it liberally returns foo/bar as a file path and defends the choice.

Potential Fix / Current Workaround (Additional information)

I suspect this is also hampering some of the "review this repo for XYZ" tasks. As a test, I forced it to consume context by telling it to go to each file where it found paths and extract +/- 3 lines around each hit (6 lines per hit). Boom, instant smartness upgrade: it completely tackled the task, visibly applied real reasoning in a noticeably better response, and delivered the expected level of output quality.
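The workaround above can be sketched as a preprocessing step. Assuming the same naive path heuristic from before (a hypothetical stand-in, not Codex's internals), each match is returned together with its +/- 3 surrounding lines so the model sees enough context to reject false positives:

```python
import re

# Same naive heuristic as before (illustrative only).
PATH_LIKE = re.compile(r"\S+/\S+")

def context_windows(lines, window=3):
    """For each path-like match, return the match plus +/- `window`
    surrounding lines, mirroring the workaround described above."""
    results = []
    for i, line in enumerate(lines):
        for match in PATH_LIKE.finditer(line):
            lo = max(0, i - window)
            hi = min(len(lines), i + window + 1)
            results.append({
                "match": match.group(),
                "line": i + 1,           # 1-based line number of the hit
                "context": lines[lo:hi], # the +/- 3 lines fed to the model
            })
    return results

doc = [
    "# Setup",
    "Either/or works here.",     # false positive without context
    "Run the installer:",
    "    ./scripts/install.sh",  # real path
    "then check the logs.",
]
for hit in context_windows(doc):
    print(hit["match"], "->", hit["context"])
```

With the surrounding lines attached, "Either/or" is obviously prose while "./scripts/install.sh" is obviously a path, which matches the "instant smartness upgrade" observed.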

Perhaps something in the tooling is negatively impacting the models' natural ability to consume and consider surrounding context? Just giving it a few lines did wonders for me, and on reflection, I think I've been fighting this across my work more than I previously realized, since day one.

Metadata

Labels: bug (Something isn't working)