feat(plan): add positive test case and update eval stability policy #18457

Merged
jerop merged 1 commit into main from fix/plan-mode-evals-followup on Feb 6, 2026

Conversation

@jerop
Contributor

@jerop jerop commented Feb 6, 2026

Adds a positive test case for modifying plan files in Plan Mode and updates all tests to USUALLY_PASSES.

Follow-up to #18437

Closes #17169

@jerop jerop requested a review from a team as a code owner February 6, 2026 19:12
@gemini-code-assist
Contributor

Summary of Changes

Hello @jerop, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new test case to validate the expected behavior of file modifications within the plans directory when the system is operating in Plan Mode. Concurrently, it refines the stability policy for existing evaluation tests by reclassifying them as USUALLY_PASSES and standardizes the use of the ApprovalMode enum for consistency and type safety.

Highlights

  • New Test Case for Plan Mode File Modification: A new positive test case has been added to verify that file modifications are allowed within the plans directory when the system is in Plan Mode. This ensures that users can edit their plans as expected.
  • Test Stability Policy Update: All existing evalTest calls have been updated from ALWAYS_PASSES to USUALLY_PASSES, reflecting a more flexible and realistic test stability policy for these evaluation tests.
  • Enum Usage for Approval Mode: String literals previously used for approvalMode in evalTest calls have been replaced with the ApprovalMode enum from @google/gemini-cli-core, enhancing type safety and code readability.
Changelog
  • evals/plan_mode.eval.ts
    • Imported the ApprovalMode enum from @google/gemini-cli-core.
    • Updated the stability of several evalTest calls from ALWAYS_PASSES to USUALLY_PASSES.
    • Replaced hardcoded string values for approvalMode with ApprovalMode.PLAN or ApprovalMode.DEFAULT enum members.
    • Added a new evalTest case named 'should allow file modification in plans directory when in plan mode' to confirm that file modifications are permitted within the plans directory during plan mode.
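The changelog entries above can be illustrated with a minimal, self-contained sketch. Everything in it is a local stand-in: the real `ApprovalMode` enum is exported by `@google/gemini-cli-core` and the real `evalTest` helper belongs to the repository's eval harness, but neither definition appears in this thread. The string values, the `EvalCase` shape, the `registered` array, and the prompt text are assumptions for illustration only.

```typescript
// Hypothetical stand-in for the ApprovalMode enum from @google/gemini-cli-core.
// Member names PLAN and DEFAULT come from the changelog; the string values are
// assumed. It is modeled as a const object so the sketch runs without enum
// transpilation; the real export is described as an enum.
const ApprovalMode = {
  DEFAULT: 'default',
  PLAN: 'plan',
} as const;
type ApprovalMode = (typeof ApprovalMode)[keyof typeof ApprovalMode];

// The two stability policies named in the PR.
type EvalPolicy = 'ALWAYS_PASSES' | 'USUALLY_PASSES';

// Assumed shape of an eval case; the real harness may differ.
interface EvalCase {
  name: string;
  approvalMode: ApprovalMode;
  prompt: string;
}

// Local stub that only records registrations so the sketch runs on its own.
const registered: Array<{ policy: EvalPolicy; evalCase: EvalCase }> = [];

function evalTest(policy: EvalPolicy, evalCase: EvalCase): void {
  registered.push({ policy, evalCase });
}

// After this PR: USUALLY_PASSES policy and an ApprovalMode member instead of
// a hardcoded string for approvalMode.
evalTest('USUALLY_PASSES', {
  name: 'should allow file modification in plans directory when in plan mode',
  approvalMode: ApprovalMode.PLAN,
  prompt: 'Update the plan file with the next step.', // hypothetical prompt
});
```

Referring to `ApprovalMode.PLAN` instead of a raw string means a misspelled mode fails at compile time rather than silently configuring the wrong mode at run time.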
Activity
  • No human activity (comments, reviews, etc.) has been recorded on this pull request yet.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds a positive test case for file modifications in plan mode and updates existing tests to USUALLY_PASSES. The changes to use the ApprovalMode enum are a good improvement. However, I've found a critical issue in the new test case. The test is logically flawed as it uses a file path that doesn't conform to the security policy it's supposed to be testing, making the test's validation incorrect. Please see my detailed comment.
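The concern raised in this review is whether the path used by the new test actually falls inside the plans directory that Plan Mode permits. A minimal sketch of such a containment check follows. The helper names `normalizePosix` and `isInsideDir`, the POSIX-only normalization, and the example paths are all hypothetical; they are not taken from the gemini-cli policy code, which this thread does not show.

```typescript
// Hypothetical helper: split an absolute POSIX path into segments,
// collapsing "." and ".." along the way.
function normalizePosix(p: string): string[] {
  const out: string[] = [];
  for (const seg of p.split('/')) {
    if (seg === '' || seg === '.') continue;
    if (seg === '..') out.pop();
    else out.push(seg);
  }
  return out;
}

// True when `target` is `dir` itself or lies beneath it after normalization.
// Comparing whole segments avoids treating "plans-old" as inside "plans".
function isInsideDir(dir: string, target: string): boolean {
  const d = normalizePosix(dir);
  const t = normalizePosix(target);
  return d.length <= t.length && d.every((seg, i) => seg === t[i]);
}

// Example paths are made up for illustration.
const plansDir = '/workspace/plans';
console.log(isInsideDir(plansDir, '/workspace/plans/feature.md')); // true
console.log(isInsideDir(plansDir, '/workspace/plans/../src/index.ts')); // false
console.log(isInsideDir(plansDir, '/workspace/plans-old/a.md')); // false
```

A positive test is only meaningful if its target path satisfies a check of this kind; otherwise a passing result says nothing about the policy being exercised.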

@github-actions

github-actions bot commented Feb 6, 2026

Size Change: -2 B (0%)

Total Size: 23.8 MB

| Filename | Size | Change |
| --- | --- | --- |
| ./bundle/gemini.js | 23.8 MB | -2 B (0%) |
| ./bundle/sandbox-macos-permissive-closed.sb | 1.03 kB | 0 B |
| ./bundle/sandbox-macos-permissive-open.sb | 890 B | 0 B |
| ./bundle/sandbox-macos-permissive-proxied.sb | 1.31 kB | 0 B |
| ./bundle/sandbox-macos-restrictive-closed.sb | 3.29 kB | 0 B |
| ./bundle/sandbox-macos-restrictive-open.sb | 3.36 kB | 0 B |
| ./bundle/sandbox-macos-restrictive-proxied.sb | 3.56 kB | 0 B |

compressed-size-action

@gemini-cli gemini-cli bot added the area/core and 🔒 maintainer only labels Feb 6, 2026
Adds a positive test case for modifying plan files in Plan Mode and updates all tests to USUALLY_PASSES as per reviewer feedback.
@jerop jerop force-pushed the fix/plan-mode-evals-followup branch from 663077a to cff7c86 February 6, 2026 19:33
@jerop jerop added this pull request to the merge queue Feb 6, 2026
Merged via the queue into main with commit 601f060 Feb 6, 2026
42 checks passed
@jerop jerop deleted the fix/plan-mode-evals-followup branch February 6, 2026 19:56

Labels

area/core: Issues related to User Interface, OS Support, Core Functionality
🔒 maintainer only: ⛔ Do not contribute. Internal roadmap item.

Development

Successfully merging this pull request may close these issues.

Add behavioral evaluations for planning tools and workflow

2 participants