chore: strengthen validation guidance in system prompt #18544

Merged
NTaylorMullen merged 2 commits into main from ntm/gh.18529
Feb 9, 2026

@NTaylorMullen
Collaborator

Summary

Standardize validation fidelity by updating the system prompt to mandate thorough, project-wide validation (e.g., builds and type-checking) beyond just unit tests. This ensures that agents autonomously verify structural integrity when performing changes.

Details

  • Modified packages/core/src/prompts/snippets.ts to strengthen validation guidance in renderCoreMandates and renderPrimaryWorkflows.
  • Added a new behavioral evaluation test evals/validation_fidelity.eval.ts (policy: ALWAYS_PASSES) that simulates a refactoring task and verifies the agent autonomously runs build/type-check commands.
  • Updated evals/test-helper.ts to support custom timeouts for individual evaluation cases and to prevent symlink collisions during repeated runs.
  • Updated packages/core/src/core/__snapshots__/prompts.test.ts.snap to reflect the system prompt changes.
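
The symlink-collision fix mentioned for evals/test-helper.ts can be pictured roughly as follows. This is a minimal sketch, not the helper's actual code: the function name linkNodeModulesIfMissing and the temporary directory layout are assumptions made for illustration.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: link `source` as `<targetDir>/node_modules` unless
// something already exists at that path. Returns true if a link was created.
function linkNodeModulesIfMissing(source: string, targetDir: string): boolean {
  const linkPath = path.join(targetDir, "node_modules");
  try {
    // lstat does not follow symlinks, so a dangling link still counts as present.
    fs.lstatSync(linkPath);
    return false;
  } catch {
    fs.symlinkSync(source, linkPath, "dir");
    return true;
  }
}

// Simulate two runs against the same working directory.
const root = fs.mkdtempSync(path.join(os.tmpdir(), "eval-"));
const source = path.join(root, "shared_node_modules");
const workDir = path.join(root, "run");
fs.mkdirSync(source);
fs.mkdirSync(workDir);

const first = linkNodeModulesIfMissing(source, workDir);
const second = linkNodeModulesIfMissing(source, workDir);
console.log(first, second);
```

Without the existence check, the second call would fail with EEXIST, which is the kind of collision a repeated eval run could hit.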

Related Issues

Fixes #18529

How to Validate

  1. Run the behavioral evaluation test multiple times to ensure stability:
    cross-env RUN_EVALS=1 npx vitest run evals/validation_fidelity.eval.ts --config evals/vitest.config.ts
  2. Verify that the tool logs show the agent running both npm test and npm run build (or tsc) autonomously.
  3. Run existing prompt tests to ensure snapshots are correct:
    npm test -w @google/gemini-cli-core -- src/core/prompts.test.ts
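
Step 2 can also be checked mechanically instead of by reading the logs. The sketch below assumes the logged shell commands have already been extracted into plain strings; the LoggedCommand shape and the two patterns are illustrative assumptions, not the format the eval harness actually emits.

```typescript
// Hypothetical shape of one logged shell invocation; the real log format may differ.
interface LoggedCommand {
  command: string;
}

// Patterns are assumptions that cover the commands named in this PR.
const TEST_PATTERN = /\bnpm (run )?test\b/;
const BUILD_PATTERN = /\bnpm run (build|typecheck|verify)\b|\btsc\b/;

// Report whether the log contains at least one test run and one build/type-check run.
function ranTestsAndBuild(log: LoggedCommand[]): { tests: boolean; build: boolean } {
  return {
    tests: log.some((entry) => TEST_PATTERN.test(entry.command)),
    build: log.some((entry) => BUILD_PATTERN.test(entry.command)),
  };
}

const thorough = ranTestsAndBuild([
  { command: "npm test" },
  { command: "npx tsc --noEmit" },
]);
const testsOnly = ranTestsAndBuild([{ command: "npm test" }]);
console.log(thorough, testsOnly);
```

A run that only executes unit tests reports build: false, which is the gap this PR is meant to close.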

Pre-Merge Checklist

  • Updated relevant documentation and README (if needed)
  • Added/updated tests (if needed)
  • Noted breaking changes (if any)
  • Validated on required platforms/methods:
    • MacOS
      • npm run
      • npx
      • Docker
      • Podman
      • Seatbelt
    • Windows
      • npm run
      • npx
      • Docker
    • Linux
      • npm run
      • npx
      • Docker

@NTaylorMullen NTaylorMullen requested review from a team as code owners February 7, 2026 21:56
@gemini-code-assist
Contributor

Summary of Changes

Hello @NTaylorMullen, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the validation capabilities of the system's AI agents. By updating the core system prompt, agents are now explicitly instructed to perform exhaustive, project-wide validation, including build and type-checking, ensuring structural integrity alongside behavioral correctness. This change is supported by new evaluation tests designed to confirm the agents' autonomous adherence to these strengthened validation mandates, ultimately leading to more robust and reliable code modifications.

Highlights

  • Enhanced Validation Guidance: The system prompt has been updated to mandate comprehensive, project-wide validation (e.g., builds, type-checking) for agents, moving beyond just unit tests to ensure structural integrity.
  • New Behavioral Evaluation Test: A new test (validation_fidelity.eval.ts) has been introduced to verify that agents autonomously execute build/type-check commands after performing refactoring tasks.
  • Test Helper Improvements: The test-helper.ts file now supports custom timeouts for individual evaluation cases and includes logic to prevent symlink collisions during test setup.
  • Dependency Resolution Updates: The package-lock.json file has been updated, primarily removing peer: true flags from several dependency entries, indicating changes in dependency resolution.

Changelog

  • evals/test-helper.ts
    • Added a condition to prevent symlinking node_modules if the target directory already contains one, avoiding potential collisions.
    • Modified the it(...) calls to pass an optional evalCase.timeout, allowing custom timeouts for individual evaluation tests.
  • evals/validation_fidelity.eval.ts
    • Introduced a new evaluation test file.
    • This test defines a refactoring scenario and asserts that the agent autonomously runs build or type-check commands (npm run build, tsc, typecheck, npm run verify) to ensure project-wide structural integrity.
  • package-lock.json
    • Removed the "peer": true property from several dependency entries, including @octokit/graphql, @opentelemetry/* packages, csstype, @typescript-eslint/utils, acorn, eslint, express, @jrichman/ink, react, rxjs, picomatch, tslib, tsx, typescript, vite, vitest, and zod. This change likely reflects updated dependency resolution or package metadata.
  • packages/core/src/core/__snapshots__/prompts.test.ts.snap
    • Updated the snapshot tests to reflect the changes made to the system prompt's validation guidance.
  • packages/core/src/prompts/snippets.ts
    • Modified the renderCoreMandates and renderPrimaryWorkflows functions to update the "Technical Integrity" and "Validation" sections of the system prompt.
    • The updated guidance emphasizes that validation is an "exhaustive process" ensuring "behavioral, structural, and stylistic" correctness and "full compatibility with the broader project," explicitly stating that "partial or isolated checks are insufficient."
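
The per-case timeout described for evals/test-helper.ts can be sketched as below. The EvalCase and TestFn types and the registerEval function are assumptions for illustration. TestFn stands in for a Vitest-style it(name, fn, timeout), and a recording fake replaces the real test runner so the snippet runs on its own.

```typescript
// Stand-in for a Vitest-style `it(name, fn, timeout?)` registrar.
type TestFn = (name: string, fn: () => Promise<void>, timeout?: number) => void;

// Hypothetical eval case shape: `timeout` is optional and given in milliseconds.
interface EvalCase {
  name: string;
  timeout?: number;
  run: () => Promise<void>;
}

// Forward the case's timeout; when it is undefined the runner's default applies.
function registerEval(it: TestFn, evalCase: EvalCase): void {
  it(evalCase.name, evalCase.run, evalCase.timeout);
}

// Recording fake used instead of a real test runner.
const calls: Array<[string, number | undefined]> = [];
const fakeIt: TestFn = (name, _fn, timeout) => {
  calls.push([name, timeout]);
};

registerEval(fakeIt, { name: "slow refactor", timeout: 600_000, run: async () => {} });
registerEval(fakeIt, { name: "quick check", run: async () => {} });
console.log(calls);
```

Passing undefined through, rather than substituting a number, leaves the global default in one place.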

Activity

  • No specific human activity (comments, reviews, progress updates) is recorded in the provided context. The pull request description outlines the changes and validation steps.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request strengthens the validation guidance in the system prompt to encourage more thorough, project-wide validation by the agent. This is supported by a new behavioral evaluation test. The changes to the test helper to support custom timeouts and prevent symlink collisions are also good improvements.

I've found one high-severity issue in the new evaluation test where toolRequest.args is treated as a plain string instead of a JSON string, which violates repository rules and makes the test brittle. My review includes a suggestion to parse the JSON before accessing its properties, which will make the test more robust.
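
The kind of fix the review asks for might look like the sketch below. It assumes toolRequest.args is a JSON-encoded string with a command field. The ToolRequest shape, the command field name, and the getShellCommand helper are illustrative assumptions rather than the repository's actual types.

```typescript
// Hypothetical shape: `args` is assumed to be a JSON string such as
// '{"command":"npm run build"}'.
interface ToolRequest {
  name: string;
  args: string;
}

// Parse the JSON arguments and return the shell command, or undefined
// when the payload is malformed or has no string `command` field.
function getShellCommand(req: ToolRequest): string | undefined {
  try {
    const parsed: unknown = JSON.parse(req.args);
    if (typeof parsed === "object" && parsed !== null && "command" in parsed) {
      const command = (parsed as { command: unknown }).command;
      return typeof command === "string" ? command : undefined;
    }
  } catch {
    // Malformed JSON: treat the request as carrying no command.
  }
  return undefined;
}

const ok = getShellCommand({ name: "shell", args: '{"command":"npm run build"}' });
const bad = getShellCommand({ name: "shell", args: "npm run build" });
console.log(ok, bad);
```

Matching against the parsed command rather than the raw string avoids false positives from other JSON fields and survives changes in key order or escaping.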

@github-actions
github-actions bot commented Feb 7, 2026

Size Change: -779 B (0%)

Total Size: 23.9 MB

ℹ️ View Unchanged
Filename                                       Size      Change
./bundle/gemini.js                             23.8 MB   -779 B (0%)
./bundle/sandbox-macos-permissive-closed.sb    1.03 kB   0 B
./bundle/sandbox-macos-permissive-open.sb      890 B     0 B
./bundle/sandbox-macos-permissive-proxied.sb   1.31 kB   0 B
./bundle/sandbox-macos-restrictive-closed.sb   3.29 kB   0 B
./bundle/sandbox-macos-restrictive-open.sb     3.36 kB   0 B
./bundle/sandbox-macos-restrictive-proxied.sb  3.56 kB   0 B

compressed-size-action

@gemini-cli gemini-cli bot added the area/platform and 🔒 maintainer only labels Feb 7, 2026
Collaborator

@abhipatel12 abhipatel12 left a comment

LGTM!

@NTaylorMullen NTaylorMullen added this pull request to the merge queue Feb 9, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 9, 2026
@NTaylorMullen NTaylorMullen added this pull request to the merge queue Feb 9, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 9, 2026
- Update system prompt snippets to mandate exhaustive validation (structural and project-wide)
- Add 'ALWAYS_PASSES' behavioral evaluation test to verify agent validation thoroughness
- Update snapshots to reflect prompt changes
- Improve eval test helper to support custom timeouts and prevent symlink collisions

Fixes #18529
@NTaylorMullen NTaylorMullen added this pull request to the merge queue Feb 9, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 9, 2026
- Fix tool arg parsing in validation_fidelity.eval.ts

- Add validation_fidelity_pre_existing_errors.eval.ts to test agent resilience to project errors

Part of #18529
@NTaylorMullen NTaylorMullen added this pull request to the merge queue Feb 9, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 9, 2026
@NTaylorMullen NTaylorMullen added this pull request to the merge queue Feb 9, 2026
Merged via the queue into main with commit d45a45d Feb 9, 2026
26 checks passed
@NTaylorMullen NTaylorMullen deleted the ntm/gh.18529 branch February 9, 2026 05:44
aswinashok44 pushed a commit to aswinashok44/gemini-cli that referenced this pull request Feb 9, 2026

Labels

area/platform, 🔒 maintainer only

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Standardize Validation Fidelity: Bridge the gap between Testing and Type-checking

2 participants