chore: strengthen validation guidance in system prompt #18544

NTaylorMullen merged 2 commits into main
Conversation
Summary of Changes

Hello @NTaylorMullen, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the validation capabilities of the system's AI agents. By updating the core system prompt, agents are now explicitly instructed to perform exhaustive, project-wide validation, including build and type-checking, ensuring structural integrity alongside behavioral correctness. This change is supported by new evaluation tests designed to confirm the agents' autonomous adherence to these strengthened validation mandates, ultimately leading to more robust and reliable code modifications.

Highlights
Changelog
Activity
Code Review
This pull request strengthens the validation guidance in the system prompt to encourage more thorough, project-wide validation by the agent. This is supported by a new behavioral evaluation test. The changes to the test helper to support custom timeouts and prevent symlink collisions are also good improvements.
I've found one high-severity issue in the new evaluation test where toolRequest.args is treated as a plain string instead of a JSON string, which violates repository rules and makes the test brittle. My review includes a suggestion to parse the JSON before accessing its properties, which will make the test more robust.
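The suggested fix can be sketched as follows. This is a minimal illustration, not the repository's actual test code: the `ToolRequest` shape and the `command` field are assumptions inferred from the review comment, which states that `toolRequest.args` is a JSON string.

```typescript
// Hypothetical shape of a recorded tool request; the exact field names
// are assumptions based on the review comment, not the real types.
interface ToolRequest {
  name: string;
  args: string; // JSON-encoded arguments, not a plain string
}

// Brittle: substring-matching the raw JSON string, sensitive to
// whitespace, key order, and escaping.
function ranBuildBrittle(req: ToolRequest): boolean {
  return req.args.includes('npm run build');
}

// Robust: parse the JSON first, then inspect the typed field.
function ranBuild(req: ToolRequest): boolean {
  const parsed = JSON.parse(req.args) as { command?: string };
  return (parsed.command ?? '').includes('npm run build');
}
```

Parsing first means the assertion survives incidental formatting differences in how the agent's tool arguments are serialized.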
Size Change: -779 B (0%). Total Size: 23.9 MB.
Force-pushed 537583c to e5bcea6
- Update system prompt snippets to mandate exhaustive validation (structural and project-wide)
- Add 'ALWAYS_PASSES' behavioral evaluation test to verify agent validation thoroughness
- Update snapshots to reflect prompt changes
- Improve eval test helper to support custom timeouts and prevent symlink collisions

Fixes #18529
Force-pushed e5bcea6 to 0e890c5
Force-pushed 0e890c5 to a28a71e
- Fix tool arg parsing in validation_fidelity.eval.ts
- Add validation_fidelity_pre_existing_errors.eval.ts to test agent resilience to project errors

Part of #18529
Force-pushed a28a71e to 75ef06e
Summary
Standardize validation fidelity by updating the system prompt to mandate thorough, project-wide validation (e.g., builds and type-checking) beyond just unit tests. This ensures that agents autonomously verify structural integrity when performing changes.
Details
- Updated packages/core/src/prompts/snippets.ts to strengthen validation guidance in renderCoreMandates and renderPrimaryWorkflows.
- Added evals/validation_fidelity.eval.ts (policy: ALWAYS_PASSES), which simulates a refactoring task and verifies the agent autonomously runs build/type-check commands.
- Updated evals/test-helper.ts to support custom timeouts for individual evaluation cases and to prevent symlink collisions during repeated runs.
- Updated packages/core/src/core/__snapshots__/prompts.test.ts.snap to reflect the system prompt changes.

Related Issues
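The symlink-collision fix in the test helper can be illustrated with a hedged sketch. The helper name and layout below are assumptions, not the real evals/test-helper.ts API: the idea is simply that giving each evaluation run a unique workspace directory means repeated runs never attempt to create the same symlink path twice.

```typescript
import { randomUUID } from 'node:crypto';
import * as path from 'node:path';

// Hypothetical helper (not the actual test-helper.ts API): derive a
// per-run workspace directory so that repeated eval runs cannot
// collide on the same symlink path.
function uniqueWorkspaceDir(baseDir: string, caseName: string): string {
  // randomUUID() guarantees a fresh suffix per invocation.
  return path.join(baseDir, `${caseName}-${randomUUID()}`);
}
```

With per-run directories, cleanup of a previous run's symlinks is no longer a precondition for the next run to succeed.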
Fixes #18529
How to Validate
- Run the new eval: cross-env RUN_EVALS=1 npx vitest run evals/validation_fidelity.eval.ts --config evals/vitest.config.ts
- Verify the agent runs npm test and npm run build (or tsc) autonomously.
- Run the prompt snapshot tests: npm test -w @google/gemini-cli-core -- src/core/prompts.test.ts

Pre-Merge Checklist