feat(core): overhaul system prompt for rigor, integrity, and intent alignment #17263

Merged
NTaylorMullen merged 1 commit into main from ntm/sys.prompt.overhaul
Feb 7, 2026

@NTaylorMullen
Collaborator

@NTaylorMullen NTaylorMullen commented Jan 22, 2026

Summary

This PR overhauls the system prompt for Gemini CLI to improve engineering rigor, technical integrity, and alignment with user intent. It introduces a structured Research -> Strategy -> Execution lifecycle while maintaining legacy compatibility for non-preview models.

Details

  • Refactored Lifecycle: Moves from a flat instruction set to a structured Research -> Strategy -> Execution workflow, allowing for better discovery and planning phases.
  • Surgical Implementation: Mandates an iterative Plan -> Act -> Validate cycle for the execution phase, prioritizing targeted code modifications, automated tests, and ecosystem tool usage (e.g., eslint --fix).
  • Intent Alignment: Implements a clear distinction between Inquiries (analysis/advice) and Directives (action) to prevent goal-creep and unintended modifications during research phases.
  • Modernized Technology Defaults: Updates "New Application" guidance to favor Vanilla CSS and modern, platform-appropriate tech stacks (React/TypeScript, FastAPI, Three.js) while emphasizing a polished, "alive" user experience.
  • Rigorous Validation: Establishes comprehensive verification (builds, tests, lints) as the mandatory path to finality, ensuring no regressions or structural side-effects.
  • Legacy Compatibility: Preserves existing prompt behavior by migrating current main logic to snippets.legacy.ts. The PromptProvider now dynamically selects between overhauled and legacy snippets based on whether the active model is a "preview model" (Gemini 3 family).
  • Test Integrity: Updated snapshots and unit tests in prompts.test.ts to validate the new prompt structure and ensure parity for legacy models.
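The model-gated selection described under "Legacy Compatibility" can be sketched roughly as follows. This is only an illustration, not the code in this PR: the `PromptSnippets` interface, the `isPreviewModel` prefix check, and the example model names are all assumptions made for the sketch.

```typescript
// Hypothetical shape: the real snippet modules expose more than this.
interface PromptSnippets {
  readonly name: string;
  renderCoreMandates(): string;
}

// Stand-in for the overhauled prompt (snippets.ts in the PR).
const overhauledSnippets: PromptSnippets = {
  name: 'overhauled',
  renderCoreMandates: () => 'Lifecycle: Research -> Strategy -> Execution',
};

// Stand-in for the preserved prompt (snippets.legacy.ts in the PR).
const legacySnippets: PromptSnippets = {
  name: 'legacy',
  renderCoreMandates: () => 'Legacy instruction set',
};

// Assumption: a preview model is recognised by a "gemini-3" name prefix.
function isPreviewModel(model: string): boolean {
  return model.toLowerCase().startsWith('gemini-3');
}

// Pick the snippet set for the active model.
function selectSnippets(model: string): PromptSnippets {
  return isPreviewModel(model) ? overhauledSnippets : legacySnippets;
}
```

Under these assumptions, `selectSnippets('gemini-3-pro-preview')` yields the overhauled set and `selectSnippets('gemini-2.5-pro')` yields the legacy set, so legacy models keep their existing prompt.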

Related Issues

Part of the ongoing effort to improve agent reliability and engineering quality.

How to Validate

  1. Run core prompt tests: npm test -w @google/gemini-cli-core -- src/core/prompts.test.ts
  2. Run compression service tests: npm test -w @google/gemini-cli-core -- src/services/chatCompressionService.test.ts
  3. Perform full preflight validation: npm run preflight
  4. Manually inspect prompt output via the CLI using a preview model (e.g., Gemini 3) and a legacy model (e.g., Gemini 2.5) to verify dynamic selection.

Pre-Merge Checklist

  • Updated relevant documentation and README (if needed)
  • Added/updated tests (if needed)
  • Noted breaking changes (if any)
  • Validated on macOS
    • npm run
    • npm run preflight

@NTaylorMullen NTaylorMullen requested review from a team as code owners January 22, 2026 01:37
@gemini-code-assist
Contributor

Summary of Changes

Hello @NTaylorMullen, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on a major refactoring and optimization of the system prompt, which guides the agent's behavior and decision-making. The changes aim to enhance the agent's understanding of its role, improve its communication style, enforce stricter engineering standards, and refine its workflow for various tasks. By clarifying mandates, improving tool usage instructions, and adding new evaluation tests, the PR ensures the agent operates more predictably, efficiently, and in better alignment with user intent, particularly in complex software engineering scenarios.

Highlights

  • Comprehensive System Prompt Overhaul: The core system prompt has undergone a significant overhaul, introducing new sections for 'Communication Style', 'Security Protocols', 'Engineering Standards', and refined 'Workflow' definitions for development and new application creation. This aims to improve agent intent alignment, idiomatic completeness, and overall rigor.
  • Tool Renaming and Enhancement: The search_file_content tool has been renamed to grep_search across the codebase and documentation. Additionally, the grep_search tool now limits its output to a maximum of 100 matches by default to improve performance and token efficiency.
  • New Evaluation Tests for Agent Behavior: New evaluation tests (analysis-mode.eval.ts and delegation_strategy.eval.ts) have been added. These tests specifically validate that the agent does not automatically modify files when merely 'inspecting' for bugs, but acts when explicitly asked to 'fix' them. They also ensure correct delegation to specialized agents like codebase_investigator for architectural tasks.
  • Refined Agent Delegation Heuristics: The description for the CodebaseInvestigatorAgent has been updated to clarify its role in architectural analysis and dependency identification. The system prompt now provides clearer guidance on when to delegate to sub-agents versus using manual search tools.
  • Environment Context Refactoring: The way environment context (like workspace directories and folder structure) is passed to the agent has been refactored. It is now encapsulated within <session_context> tags in the initial user message, and dynamic environment details (date, platform, temp directory) are passed via a new PromptEnv object.
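As a rough illustration of the environment-context change in the last highlight, the sketch below wraps workspace details in `<session_context>` tags. The `PromptEnv` field names, the `renderSessionContext` helper, and the choice to print the environment details inside the same block are assumptions for this sketch; the summary only states that date, platform, and temp directory travel in a `PromptEnv` object.

```typescript
// Hypothetical field names for the PromptEnv object mentioned above.
interface PromptEnv {
  today: string;
  platform: string;
  tempDir: string;
}

// Assumed helper: renders the block placed in the initial user message.
function renderSessionContext(
  env: PromptEnv,
  workspaceDirs: readonly string[],
  folderStructure: string,
): string {
  const lines = [
    '<session_context>',
    `Date: ${env.today}`,
    `Platform: ${env.platform}`,
    `Temp directory: ${env.tempDir}`,
    'Workspace directories:',
    ...workspaceDirs.map((dir) => `  - ${dir}`),
    'Folder structure:',
    folderStructure,
    '</session_context>',
  ];
  return lines.join('\n');
}
```

Keeping the renderer a pure function of `PromptEnv` means tests can pass a fixed date and platform rather than reading them from the host machine.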

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a comprehensive overhaul of the system prompt to improve agent behavior, along with several related refactorings and improvements. Key changes include a much more detailed and structured system prompt, renaming the search_file_content tool to grep_search, and refactoring how environment context is provided to the agent. New evaluation tests have been added to validate the new agent behaviors.

My review focuses on performance and correctness. I've identified a performance issue in how the match limit is applied in both the grep and ripgrep tools. The current implementation fetches all results and then truncates them in JavaScript, which can be inefficient. I've suggested using the native --max-count flags available in these tools to limit the results at the source.
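A minimal sketch of the suggestion, assuming hypothetical helper names (`buildRipgrepArgs`, `capTotalMatches`) that are not taken from this PR. One caveat: in both grep and ripgrep, `--max-count` limits matching lines per file rather than across the whole search, so a total cap in JavaScript may still be needed on top of it.

```typescript
// The 100-match default described for grep_search in this PR.
const DEFAULT_TOTAL_MAX_MATCHES = 100;

// Builds a ripgrep argument list. `--max-count` bounds matching lines
// per file, so large files stop being scanned early.
function buildRipgrepArgs(
  pattern: string,
  searchPath: string,
  maxPerFile: number,
): string[] {
  return [
    '--line-number',
    '--no-heading',
    '--max-count',
    String(maxPerFile),
    '--regexp',
    pattern,
    searchPath,
  ];
}

// Applies the overall cap after results from all files are collected.
function capTotalMatches<T>(
  matches: readonly T[],
  limit: number = DEFAULT_TOTAL_MAX_MATCHES,
): { kept: T[]; truncated: boolean } {
  return { kept: matches.slice(0, limit), truncated: matches.length > limit };
}
```

Passing the per-file limit to the tool reduces work at the source, while the `truncated` flag lets the caller tell the agent that results were cut off.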

Overall, this is a significant and well-structured update that should improve the agent's capabilities. The refactoring work makes the codebase cleaner and more maintainable.

@gemini-cli gemini-cli bot added the status/need-issue label (Pull requests that need to have an associated issue.) Jan 22, 2026
@SandyTao520 SandyTao520 requested a review from a team as a code owner January 22, 2026 19:55
@SandyTao520 SandyTao520 force-pushed the ntm/sys.prompt.overhaul branch from ece0d9e to d385ff8 January 22, 2026 19:55
@github-actions

github-actions bot commented Jan 22, 2026

Size Change: +39.8 kB (+0.17%)

Total Size: 23.9 MB

Filename Size Change
./bundle/gemini.js 23.8 MB +39.8 kB (+0.17%)
ℹ️ View Unchanged
Filename Size
./bundle/sandbox-macos-permissive-closed.sb 1.03 kB
./bundle/sandbox-macos-permissive-open.sb 890 B
./bundle/sandbox-macos-permissive-proxied.sb 1.31 kB
./bundle/sandbox-macos-restrictive-closed.sb 3.29 kB
./bundle/sandbox-macos-restrictive-open.sb 3.36 kB
./bundle/sandbox-macos-restrictive-proxied.sb 3.56 kB

compressed-size-action

@SandyTao520 SandyTao520 force-pushed the ntm/sys.prompt.overhaul branch 4 times, most recently from 09d4488 to 814803b January 27, 2026 19:43
@gundermanc
Member

I ran the current set of behavioral evals against this branch for all models: https://github.com/google-gemini/gemini-cli/actions/runs/21453547514

It looks like some of the existing ones might not be passing 3/3 times anymore with these changes. Are any of these regressions?

Also the new tests don't seem to pass 3/3 times for Gemini 3.0 at least.

NTaylorMullen added a commit that referenced this pull request Feb 3, 2026
…l updates

- Refine 'Expertise & Intent Alignment' to default to Inquiry and require explicit Directives for action.
- Update 'Technical Integrity' and 'Execution' workflows to prioritize clean abstractions within the target scope while avoiding unrelated refactoring.
- Clarify 'Proactiveness' to apply strictly when executing a Directive.
- Add 'stop and wait' instruction for resolved inquiries or pending directives to stabilize workflows.
- Update and verify prompt snapshots.

Part of #17263
NTaylorMullen added a commit that referenced this pull request Feb 5, 2026
- Create `snippets.legacy.ts` as a pure replica of the original system prompt logic.
- Introduce `snippets.ts` with the modern Gemini 3 prompt overhaul.
- Update `PromptProvider.ts` to select between legacy and overhauled snippets based on the active model.
- Make history compression model-aware by passing `config` to `getCompressionPrompt`.
- Update unit tests and snapshots to verify correct prompt gating for preview and non-preview models.

Part of #17263
NTaylorMullen added a commit that referenced this pull request Feb 5, 2026
…l updates

- Refine 'Expertise & Intent Alignment' to default to Inquiry and require explicit Directives for action.
- Update 'Technical Integrity' and 'Execution' workflows to prioritize clean abstractions within the target scope while avoiding unrelated refactoring.
- Clarify 'Proactiveness' to apply strictly when executing a Directive.
- Add 'stop and wait' instruction for resolved inquiries or pending directives to stabilize workflows.
- Update and verify prompt snapshots.

Part of #17263
@NTaylorMullen NTaylorMullen force-pushed the ntm/sys.prompt.overhaul branch 3 times, most recently from e223071 to c15b1cc February 5, 2026 07:05
@NTaylorMullen NTaylorMullen force-pushed the ntm/sys.prompt.overhaul branch 4 times, most recently from c96200d to e2de9b0 February 7, 2026 02:40
…lignment

- Refactored system prompt structure into a Research/Strategy/Execution lifecycle

- Modernized technology recommendations (favoring Vanilla CSS and modern stacks)

- Integrated structured planning workflow and ecosystem tool checks

- Preserved legacy prompt behavior by migrating current main logic to snippets.legacy.ts

- Updated tests and snapshots for exhaustive validation
@NTaylorMullen NTaylorMullen force-pushed the ntm/sys.prompt.overhaul branch from e2de9b0 to 8391fb8 February 7, 2026 02:59
@NTaylorMullen NTaylorMullen changed the title from "ntm/sys.prompt.overhaul" to "feat(core): overhaul system prompt for rigor, integrity, and intent alignment" Feb 7, 2026
@NTaylorMullen NTaylorMullen added this pull request to the merge queue Feb 7, 2026
Merged via the queue into main with commit 9178b31 Feb 7, 2026
27 checks passed
@NTaylorMullen NTaylorMullen deleted the ntm/sys.prompt.overhaul branch February 7, 2026 03:25
jerop added a commit that referenced this pull request Feb 7, 2026
This is a follow up to #17263, ensuring consistency between snippets.ts and snippets.legacy.ts by removing the redundant planning section from renderFinalShell.
aswinashok44 pushed a commit to aswinashok44/gemini-cli that referenced this pull request Feb 9, 2026
Labels

status/need-issue Pull requests that need to have an associated issue.

3 participants