fix(evals): prevent false positive in hierarchical memory test #18777
Abhijit-2592 merged 2 commits into main
Isolates stdout from stderr in the test assertion to prevent 'Apple' matches in system logs (e.g., 'apple-darwin' paths in ripgrep cache messages) from failing the fruit preference check.
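A minimal sketch of why the split matters. It assumes a hypothetical `RunResult` shape with separate `stdout` and `stderr` fields (the actual eval harness may expose its output differently), and it assumes the fruit preference check is a case-insensitive search for "apple", which is what lets `apple-darwin` match. The stderr line and the model answer below are made up for illustration; they are not real ripgrep or model output.

```typescript
// Hypothetical result shape: the real eval harness may name or structure these fields differently.
interface RunResult {
  stdout: string;
  stderr: string;
}

// Assumed form of the fruit preference check: a case-insensitive search for "apple".
function mentionsApple(text: string): boolean {
  return /apple/i.test(text);
}

// Problematic variant: log lines written to stderr are searched along with the answer.
function combinedOutput(result: RunResult): string {
  return `${result.stdout}\n${result.stderr}`;
}

// Fixed variant: only the model's answer on stdout is searched.
function answerOnly(result: RunResult): string {
  return result.stdout;
}

// Illustrative run: the answer is fine, but a (made-up) log line contains a platform triple.
const run: RunResult = {
  stdout: "Based on your saved preferences, your favorite fruit is banana.",
  stderr: "Using cached ripgrep binary: ~/.cache/rg/aarch64-apple-darwin/rg",
};

console.log(mentionsApple(combinedOutput(run))); // true: spurious match from stderr
console.log(mentionsApple(answerOnly(run))); // false: the answer never mentions apples
```

Keeping the check itself unchanged and narrowing only its input keeps the fix small: the assertion still fails if the model's answer mentions Apple, but diagnostic noise on stderr can no longer trigger it.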
Hi @Abhijit-2592, thank you so much for your contribution to Gemini CLI! We really appreciate the time and effort you've put into this. We're making some updates to our contribution process to improve how we track and review changes. Please take a moment to review our recent discussion post: Improving Our Contribution Process & Introducing New Guidelines.
Key Update: Starting January 26, 2026, the Gemini CLI project will require all pull requests to be associated with an existing issue. Any pull requests not linked to an issue by that date will be automatically closed. Thank you for your understanding and for being a part of our community!
Summary of Changes
Hello @Abhijit-2592, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request addresses an issue in the hierarchical memory test where system-generated output could interfere with test assertions, leading to intermittent failures. By explicitly separating standard output from standard error, the test now evaluates the model's response without being affected by irrelevant log messages, improving test reliability.
Code Review
This pull request correctly isolates stdout from stderr in the conflictResolutionTest to prevent false positives in test assertions. This is a good change to improve test stability. However, the extensionVsGlobalTest in the same file also contains a negative assertion and is vulnerable to the same issue. To ensure all tests are robust, I recommend applying this same logic to extensionVsGlobalTest to prevent future flakiness.
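One way to act on this suggestion, sketched under assumptions: route every negative assertion in the file through a single helper that only inspects stdout, so neither `conflictResolutionTest` nor `extensionVsGlobalTest` can drift back to searching stderr. The helper name `assertStdoutExcludes`, the `RunResult` shape, and the forbidden patterns are all hypothetical; the source does not say which term `extensionVsGlobalTest` actually forbids.

```typescript
// Hypothetical result shape; the real harness may expose output differently.
interface RunResult {
  stdout: string;
  stderr: string;
}

// Hypothetical shared helper: throws if the model's answer (stdout only) matches a forbidden pattern.
// stderr is deliberately ignored so tool and cache logs cannot cause spurious failures.
function assertStdoutExcludes(testName: string, result: RunResult, forbidden: RegExp): void {
  // Strip the g/y flags: with them, RegExp.prototype.test() keeps lastIndex between calls,
  // which could make a reused pattern skip a match.
  const pattern = new RegExp(forbidden.source, forbidden.flags.replace(/[gy]/g, ""));
  if (pattern.test(result.stdout)) {
    throw new Error(
      `${testName}: answer unexpectedly matched ${pattern.toString()}: ${JSON.stringify(result.stdout)}`,
    );
  }
}

// Made-up log noise that would trip both checks if stderr were searched.
const logNoise = "loaded global config; rg cache at /tmp/rg/x86_64-apple-darwin";

// Both negative assertions go through the same helper (patterns are placeholders).
assertStdoutExcludes("conflictResolutionTest", { stdout: "You prefer bananas.", stderr: logNoise }, /apple/i);
assertStdoutExcludes("extensionVsGlobalTest", { stdout: "Using the extension setting.", stderr: logNoise }, /global/i);
```

Centralizing the rule in one function means a future test that adds a negative assertion inherits the stdout-only behavior by default instead of having to rediscover this failure mode.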
Size Change: -2 B (0%)
Total Size: 24.3 MB
* Fix newline insertion bug in replace tool (google-gemini#18595)
* fix(evals): update save_memory evals and simplify tool description (google-gemini#18610)
* chore(evals): update validation_fidelity_pre_existing_errors to USUALLY_PASSES (google-gemini#18617)
* fix: shorten tool call IDs and fix duplicate tool name in truncated output filenames (google-gemini#18600)
* feat(cli): implement atomic writes and safety checks for trusted folders (google-gemini#18406)
* Remove relative docs links (google-gemini#18650)
* docs: add legacy snippets convention to GEMINI.md (google-gemini#18597)
* fix(chore): Support linting for cjs (google-gemini#18639)

Co-authored-by: Gal Zahavi <38544478+galz10@users.noreply.github.com>

* feat: move shell efficiency guidelines to tool description (google-gemini#18614)
* Added "" as default value, since getText() used to expect a string only and thus crashed when undefined... Fixes google-gemini#18076 (google-gemini#18099)
* Allow @-includes outside of workspaces (with permission) (google-gemini#18470)
* chore: make `ask_user` header description more clear (google-gemini#18657)
* bug(core): Fix minor bug in migration logic. (google-gemini#18661)
* Harded code assist converter. (google-gemini#18656)
* refactor(core): model-dependent tool definitions (google-gemini#18563)
* feat: enable plan mode experiment in settings (google-gemini#18636)
* refactor: push isValidPath() into parsePastedPaths() (google-gemini#18664)
* fix(cli): correct 'esc to cancel' position and restore duration display (google-gemini#18534)
* feat(cli): add DevTools integration with gemini-cli-devtools (google-gemini#18648)
* chore: remove unused exports and redundant hook files (google-gemini#18681)
* Fix number of lines being reported in rewind confirmation dialog (google-gemini#18675)
* feat(cli): disable folder trust in headless mode (google-gemini#18407)
* Disallow unsafe type assertions (google-gemini#18688)
* Change event type for release (google-gemini#18693)
* feat: handle multiple dynamic context filenames in system prompt (google-gemini#18598)
* Properly parse at-commands with narrow non-breaking spaces (google-gemini#18677)
* refactor(core): centralize core tool definitions and support model-specific schemas (google-gemini#18662)
* feat(core): Render memory hierarchically in context. (google-gemini#18350)
* feat: Ctrl+O to expand paste placeholder (google-gemini#18103)
* fix(cli): Improve header spacing (google-gemini#18531)
* Feature/quota visibility 16795 (google-gemini#18203)
* docs: remove TOC marker from Plan Mode header (google-gemini#18678)
* Inline thinking bubbles with summary/full modes (google-gemini#18033)

Co-authored-by: Jacob Richman <jacob314@gmail.com>

* fix(ui): remove redundant newlines in Gemini messages (google-gemini#18538)
* test(cli): fix AppContainer act() warnings and improve waitFor resilience (google-gemini#18676)
* refactor(core): refine Security & System Integrity section in system prompt (google-gemini#18601)
* Fix layout rounding. (google-gemini#18667)
* docs(skills): enhance pr-creator safety and interactivity (google-gemini#18616)
* test(core): remove hardcoded model from TestRig (google-gemini#18710)
* feat(core): optimize sub-agents system prompt intro (google-gemini#18608)
* feat(cli): update approval mode labels and shortcuts per latest UX spec (google-gemini#18698)
* fix(plan): update persistent approval mode setting (google-gemini#18638)

Co-authored-by: Sandy Tao <sandytao520@icloud.com>

* fix: move toasts location to left side (google-gemini#18705)
* feat(routing): restrict numerical routing to Gemini 3 family (google-gemini#18478)
* fix(ide): fix ide nudge setting (google-gemini#18733)
* fix(core): standardize tool formatting in system prompts (google-gemini#18615)
* chore: consolidate to green in ask user dialog (google-gemini#18734)
* feat: add `extensionsExplore` setting to enable extensions explore UI. (google-gemini#18686)
* feat(cli): defer devtools startup and integrate with F12 (google-gemini#18695)
* ui: update & subdue footer colors and animate progress indicator (google-gemini#18570)
* test: add model-specific snapshots for coreTools (google-gemini#18707)

Co-authored-by: matt korwel <matt.korwel@gmail.com>

* ci: shard windows tests and fix event listener leaks (google-gemini#18670)
* fix: allow `ask_user` tool in yolo mode (google-gemini#18541)
* feat: redact disabled tools from system prompt (google-gemini#13597) (google-gemini#18613)
* Update Gemini.md to use the curent year on creating new files (google-gemini#18460)
* Code review cleanup for thinking display (google-gemini#18720)
* fix(cli): hide scrollbars when in alternate buffer copy mode (google-gemini#18354)

Co-authored-by: Jacob Richman <jacob314@gmail.com>

* Fix issues with rip grep (google-gemini#18756)
* fix(cli): fix history navigation regression after prompt autocomplete (google-gemini#18752)
* chore: cleanup unused and add unlisted dependencies in packages/cli (google-gemini#18749)
* Fix issue where Gemini CLI creates tests in a new file (google-gemini#18409)
* feat(telemetry): Ensure experiment IDs are included in OpenTelemetry logs (google-gemini#18747)
* feat(ux): added text wrapping capabilities to markdown tables (google-gemini#18240)

Co-authored-by: jacob314 <jacob314@gmail.com>

* Revert "fix(mcp): ensure MCP transport is closed to prevent memory leaks" (google-gemini#18771)
* chore(release): bump version to 0.30.0-nightly.20260210.a2174751d (google-gemini#18772)
* chore: cleanup unused and add unlisted dependencies in packages/core (google-gemini#18762)
* chore(core): update activate_skill prompt verbiage to be more direct (google-gemini#18605)
* Add autoconfigure memory usage setting to the dialog (google-gemini#18510)

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* fix(core): prevent race condition in policy persistence (google-gemini#18506)

Co-authored-by: Allen Hutchison <adh@google.com>

* fix(evals): prevent false positive in hierarchical memory test (google-gemini#18777)
* test(evals): mark all `save_memory` evals as `USUALLY_PASSES` due to unreliability (google-gemini#18786)
* feat(cli): add setting to hide shortcuts hint UI (google-gemini#18562)
* feat(core): formalize 5-phase sequential planning workflow (google-gemini#18759)
* Introduce limits for search results. (google-gemini#18767)

---------

Co-authored-by: Andrew Garrett <andrewgarrett@google.com>
Co-authored-by: N. Taylor Mullen <ntaylormullen@google.com>
Co-authored-by: Sandy Tao <sandytao520@icloud.com>
Co-authored-by: Gal Zahavi <38544478+galz10@users.noreply.github.com>
Co-authored-by: christine betts <chrstn@uw.edu>
Co-authored-by: Aswin Ashok <aswwwin@google.com>
Co-authored-by: Abhijith V Ashok <abhi2349jith@gmail.com>
Co-authored-by: Tommaso Sciortino <sciortino@gmail.com>
Co-authored-by: Jack Wotherspoon <jackwoth@google.com>
Co-authored-by: joshualitt <joshualitt@google.com>
Co-authored-by: Jacob Richman <jacob314@gmail.com>
Co-authored-by: Aishanee Shah <aishaneeshah@gmail.com>
Co-authored-by: Jerop Kipruto <jerop@google.com>
Co-authored-by: Adib234 <30782825+Adib234@users.noreply.github.com>
Co-authored-by: Christian Gunderman <gundermanc@gmail.com>
Co-authored-by: g-samroberts <158088236+g-samroberts@users.noreply.github.com>
Co-authored-by: Spencer <spencertang@google.com>
Co-authored-by: Dmitry Lyalin <dmitry.lyalin@lyalin.com>
Co-authored-by: matt korwel <matt.korwel@gmail.com>
Co-authored-by: Shreya Keshive <shreyakeshive@google.com>
Co-authored-by: Sri Pasumarthi <111310667+sripasg@users.noreply.github.com>
Co-authored-by: Keith Guerin <keithguerin@gmail.com>
Co-authored-by: Sehoon Shon <sshon@google.com>
Co-authored-by: Adam Weidman <65992621+adamfweidman@users.noreply.github.com>
Co-authored-by: Kevin Ramdass <ramdass.kevin@gmail.com>
Co-authored-by: Dev Randalpura <devrandalpura@google.com>
Co-authored-by: gemini-cli-robot <gemini-cli-robot@google.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Brad Dux <959674+braddux@users.noreply.github.com>
Co-authored-by: Allen Hutchison <adh@google.com>
Co-authored-by: Abhijit Balaji <abhijitbalaji@google.com>
Related Issues
Fixes #18779