chore: strengthen validation guidance in system prompt by NTaylorMullen · Pull Request #18544 · google-gemini/gemini-cli

NTaylorMullen · 2026-02-07T21:56:11Z

Summary

Standardize validation fidelity by updating the system prompt to mandate thorough, project-wide validation (e.g., builds and type-checking) beyond unit tests alone. This ensures that agents autonomously verify structural integrity when making changes.

Details

Modified packages/core/src/prompts/snippets.ts to strengthen validation guidance in renderCoreMandates and renderPrimaryWorkflows.
Added a new behavioral evaluation test evals/validation_fidelity.eval.ts (policy: ALWAYS_PASSES) that simulates a refactoring task and verifies the agent autonomously runs build/type-check commands.
Updated evals/test-helper.ts to support custom timeouts for individual evaluation cases and to prevent symlink collisions during repeated runs.
Updated packages/core/src/core/__snapshots__/prompts.test.ts.snap to reflect the system prompt changes.
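
The two test-helper changes described above can be pictured roughly as follows. This is a minimal sketch, not the code from evals/test-helper.ts: the names EvalCaseLike, DEFAULT_TIMEOUT_MS, resolveTimeout, and linkNodeModulesOnce are hypothetical, and the 60-second default is an assumption made only for illustration.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical shape; the real eval case type in evals/test-helper.ts may differ.
interface EvalCaseLike {
  name: string;
  timeout?: number; // optional per-case timeout in milliseconds
}

// Assumed fallback value, chosen only for illustration.
const DEFAULT_TIMEOUT_MS = 60_000;

// Pick the per-case timeout when present, otherwise the shared default.
function resolveTimeout(evalCase: EvalCaseLike): number {
  return evalCase.timeout ?? DEFAULT_TIMEOUT_MS;
}

// Symlink <sourceDir>/node_modules into targetDir unless an entry already
// exists there. Returns true when a new link was created.
function linkNodeModulesOnce(sourceDir: string, targetDir: string): boolean {
  const linkPath = path.join(targetDir, 'node_modules');
  // lstatSync does not follow links, so an existing (even dangling) symlink
  // left by a previous run is detected too.
  if (fs.lstatSync(linkPath, { throwIfNoEntry: false }) !== undefined) {
    return false;
  }
  fs.symlinkSync(path.join(sourceDir, 'node_modules'), linkPath, 'dir');
  return true;
}

// Example: the second call is a no-op because the link already exists.
const workDir = fs.mkdtempSync(path.join(os.tmpdir(), 'eval-sketch-'));
const firstLink = linkNodeModulesOnce(process.cwd(), workDir);
const secondLink = linkNodeModulesOnce(process.cwd(), workDir);
```

In a Vitest suite, the value returned by resolveTimeout could be passed as the timeout argument of an `it` call, so a slow refactoring scenario gets more time without raising the limit for every case.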

Related Issues

Fixes #18529

How to Validate

- Run the behavioral evaluation test multiple times to ensure stability:
  cross-env RUN_EVALS=1 npx vitest run evals/validation_fidelity.eval.ts --config evals/vitest.config.ts
- Verify that the tool logs show the agent running both npm test and npm run build (or tsc) autonomously.
- Run existing prompt tests to ensure snapshots are correct:
  npm test -w @google/gemini-cli-core -- src/core/prompts.test.ts

gemini-code-assist · 2026-02-07T21:56:29Z

Summary of Changes

Hello @NTaylorMullen, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the validation capabilities of the system's AI agents. By updating the core system prompt, agents are now explicitly instructed to perform exhaustive, project-wide validation, including build and type-checking, ensuring structural integrity alongside behavioral correctness. This change is supported by new evaluation tests designed to confirm the agents' autonomous adherence to these strengthened validation mandates, ultimately leading to more robust and reliable code modifications.

Highlights

Enhanced Validation Guidance: The system prompt has been updated to mandate comprehensive, project-wide validation (e.g., builds, type-checking) for agents, moving beyond just unit tests to ensure structural integrity.
New Behavioral Evaluation Test: A new test (validation_fidelity.eval.ts) has been introduced to verify that agents autonomously execute build/type-check commands after performing refactoring tasks.
Test Helper Improvements: The test-helper.ts file now supports custom timeouts for individual evaluation cases and includes logic to prevent symlink collisions during test setup.
Dependency Resolution Updates: The package-lock.json file has been updated, primarily removing peer: true flags from several dependency entries, indicating changes in dependency resolution.

Changelog

evals/test-helper.ts
- Added a condition to prevent symlinking node_modules if the target directory already contains one, avoiding potential collisions.
- Modified the `it` calls to pass an optional evalCase.timeout value, allowing custom timeouts for individual evaluation tests.
evals/validation_fidelity.eval.ts
- Introduced a new evaluation test file.
- This test defines a refactoring scenario and asserts that the agent autonomously runs build or type-check commands (npm run build, tsc, typecheck, npm run verify) to ensure project-wide structural integrity.
package-lock.json
- Removed the "peer": true property from several dependency entries, including @octokit/graphql, @opentelemetry/* packages, csstype, @typescript-eslint/utils, acorn, eslint, express, @jrichman/ink, react, rxjs, picomatch, tslib, tsx, typescript, vite, vitest, and zod. This change likely reflects updated dependency resolution or package metadata.
packages/core/src/core/__snapshots__/prompts.test.ts.snap
- Updated the snapshot tests to reflect the changes made to the system prompt's validation guidance.
packages/core/src/prompts/snippets.ts
- Modified the renderCoreMandates and renderPrimaryWorkflows functions to update the "Technical Integrity" and "Validation" sections of the system prompt.
- The updated guidance emphasizes that validation is an "exhaustive process" ensuring "behavioral, structural, and stylistic" correctness and "full compatibility with the broader project," explicitly stating that "partial or isolated checks are insufficient."
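
The assertion described for evals/validation_fidelity.eval.ts amounts to checking whether any command the agent ran looks like a build or type-check step. The sketch below illustrates that idea only. The pattern list mirrors the commands named in the changelog, while the function names and the exact matching rules are assumptions, and the real eval may match differently.

```typescript
// Command fragments that count as structural validation. The list mirrors
// the commands named in the changelog; the real eval may match differently.
const STRUCTURAL_VALIDATION_PATTERNS: RegExp[] = [
  /\bnpm run build\b/,
  /\btsc\b/,
  /\btypecheck\b/,
  /\bnpm run verify\b/,
];

// True when a single shell command looks like a build or type-check step.
function isStructuralValidationCommand(command: string): boolean {
  return STRUCTURAL_VALIDATION_PATTERNS.some((pattern) => pattern.test(command));
}

// True when at least one of the commands the agent ran is such a step.
function ranStructuralValidation(commands: string[]): boolean {
  return commands.some((command) => isStructuralValidationCommand(command));
}
```

With this shape, a run in which the agent only executed npm test would fail the check, which is the behavior the strengthened prompt is meant to rule out.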

Activity

No specific human activity (comments, reviews, progress updates) is recorded in the provided context. The pull request description outlines the changes and validation steps.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, and Terms of Service, and learn how to configure Gemini Code Assist in GitHub. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request strengthens the validation guidance in the system prompt to encourage more thorough, project-wide validation by the agent. This is supported by a new behavioral evaluation test. The changes to the test helper to support custom timeouts and prevent symlink collisions are also good improvements.

I've found one high-severity issue in the new evaluation test where toolRequest.args is treated as a plain string instead of a JSON string, which violates repository rules and makes the test brittle. My review includes a suggestion to parse the JSON before accessing its properties, which will make the test more robust.
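
To illustrate the concern, here is a minimal sketch of parsing the args JSON before inspecting it. The ToolRequestLike shape, the `command` field name, and the `shell` tool name are assumptions for illustration, not the repository's actual types. Matching on the raw string is brittle because a substring such as npm run build could appear in an unrelated field, or the command itself could be altered by JSON escaping.

```typescript
// Hypothetical shape of a logged tool request; args holds a JSON string.
interface ToolRequestLike {
  name: string;
  args: string;
}

// Parse args and return the shell command, or undefined when args is not
// valid JSON or has no string `command` field (the field name is assumed).
function extractCommand(toolRequest: ToolRequestLike): string | undefined {
  let parsed: unknown;
  try {
    parsed = JSON.parse(toolRequest.args);
  } catch {
    return undefined;
  }
  if (typeof parsed !== 'object' || parsed === null || !('command' in parsed)) {
    return undefined;
  }
  const command = (parsed as { command: unknown }).command;
  return typeof command === 'string' ? command : undefined;
}

// Example: only the first request yields a command to inspect.
const fromCommand = extractCommand({ name: 'shell', args: '{"command":"npm run build"}' });
const fromOtherField = extractCommand({ name: 'shell', args: '{"description":"skip npm run build"}' });
```

A raw substring check would treat both example requests as build invocations, while the parsed version only reports the first.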

github-actions · 2026-02-07T22:00:20Z

Size Change: -779 B (0%)

Total Size: 23.9 MB

Filename	Size	Change
`./bundle/gemini.js`	23.8 MB	-779 B (0%)
`./bundle/sandbox-macos-permissive-closed.sb`	1.03 kB	0 B
`./bundle/sandbox-macos-permissive-open.sb`	890 B	0 B
`./bundle/sandbox-macos-permissive-proxied.sb`	1.31 kB	0 B
`./bundle/sandbox-macos-restrictive-closed.sb`	3.29 kB	0 B
`./bundle/sandbox-macos-restrictive-open.sb`	3.36 kB	0 B
`./bundle/sandbox-macos-restrictive-proxied.sb`	3.56 kB	0 B

abhipatel12

LGTM!

- Update system prompt snippets to mandate exhaustive validation (structural and project-wide)
- Add 'ALWAYS_PASSES' behavioral evaluation test to verify agent validation thoroughness
- Update snapshots to reflect prompt changes
- Improve eval test helper to support custom timeouts and prevent symlink collisions

Fixes #18529

- Fix tool arg parsing in validation_fidelity.eval.ts
- Add validation_fidelity_pre_existing_errors.eval.ts to test agent resilience to project errors

Part of #18529

…#18544)

NTaylorMullen requested review from a team as code owners February 7, 2026 21:56

gemini-code-assist bot reviewed Feb 7, 2026

gemini-cli bot added the area/platform (Issues related to Build infra, Release mgmt, Testing, Eval infra, Capacity, Quota mgmt) and 🔒 maintainer only (⛔ Do not contribute. Internal roadmap item.) labels Feb 7, 2026

abhipatel12 approved these changes Feb 7, 2026

NTaylorMullen force-pushed the ntm/gh.18529 branch from 537583c to e5bcea6 Compare February 9, 2026 02:04

NTaylorMullen enabled auto-merge February 9, 2026 02:05