repro: windows hook flakiness #18714

Closed
NTaylorMullen wants to merge 58 commits into main from ntm/repro-hook-flakiness

@NTaylorMullen
Collaborator

Summary

Reproduction of Windows hook test flakiness.

Details

This PR modifies the CI workflow to run only hooks-system.test.ts on Windows to isolate failures.

Related Issues

Related to #18665

How to Validate

Watch the CI run.

Pre-Merge Checklist

  • Updated relevant documentation and README (if needed)
  • Added/updated tests (if needed)
  • Noted breaking changes (if any)
  • Validated on required platforms/methods:
    • MacOS
    • Windows
    • Linux

@gemini-code-assist
Contributor

Summary of Changes

Hello @NTaylorMullen, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request aims to address flakiness observed in Windows hook tests by refactoring how external script commands are executed within the test suite. By moving inline node -e commands into dedicated script files and introducing path normalization, the changes enhance the robustness and cross-platform compatibility of the tests. Additionally, increased timeouts for Windows CI and explicit PATH handling for pty processes further contribute to a more stable testing environment.

Highlights

  • Cross-Platform Compatibility for Hook Commands: Refactored inline node -e commands in integration tests into separate script files to improve cross-platform compatibility, particularly on Windows where quoting and escaping can be problematic.
  • Path Normalization Utility: Introduced a normalizePath utility function to ensure consistent path formatting across different operating systems when executing commands, converting Windows backslashes to forward slashes.
  • Sequential Hook Execution: Added sequential: true to numerous hook configurations in tests, likely to ensure predictable execution order and reduce flakiness in complex hook scenarios.
  • Increased Windows CI Test Timeouts: Increased default test timeouts specifically for Windows CI environments to 10 minutes (from 5 minutes) to accommodate potential performance differences and mitigate flakiness.
  • Script Creation Helper: Implemented a createScript helper method in TestRig to streamline the creation and management of temporary script files for tests, improving test readability and maintainability.
  • Pty PATH Environment Variable: Ensured the PATH environment variable is explicitly included for pty processes to prevent command execution issues in certain test environments.
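
As a rough illustration of the path normalization and script creation highlights, the sketch below shows one way such helpers could look. It is not the code from packages/test-utils/src/test-rig.ts, which this page does not show: the free-function signature of createScript, the file name before_tool.cjs, and the placeholder script body are all assumptions made for the example.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical stand-in for the normalizePath helper described above:
// convert Windows backslashes to forward slashes.
function normalizePath(p: string): string {
  return p.replace(/\\/g, '/');
}

// Hypothetical stand-in for TestRig.createScript: write the script body to a
// file under `dir` and return the file's normalized path.
function createScript(dir: string, fileName: string, content: string): string {
  const scriptPath = path.join(dir, fileName);
  fs.writeFileSync(scriptPath, content, 'utf8');
  return normalizePath(scriptPath);
}

// Usage: the JavaScript source lives in a file, so the hook command only has
// to quote a path instead of an inline `node -e "<source>"` string.
const testDir = fs.mkdtempSync(path.join(os.tmpdir(), 'hook-test-'));
const scriptPath = createScript(
  testDir,
  'before_tool.cjs',
  "process.stdout.write('{}');\n", // placeholder hook output
);
const command = `node "${scriptPath}"`;
console.log(command);
```

Keeping the source in a file means the shell never has to parse nested quotes, which is the quoting problem the first highlight describes.
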

Changelog

  • integration-tests/hooks-agent-flow.test.ts
    • Updated fs.readFileSync calls to use JSON.stringify for messageCountFile path.
    • Replaced inline node -e command with a dynamically created script file for AfterAgent hook.
    • Replaced inline node -e commands with dynamically created script files for BeforeAgent and AfterAgent hooks in a multi-step test.
  • integration-tests/hooks-system.test.ts
    • Imported normalizePath from test-helper.js.
    • Refactored numerous test cases to use rig.createScript for generating hook scripts instead of inline node -e commands.
    • Applied normalizePath to all command paths to ensure cross-platform compatibility.
    • Added sequential: true to various hook configurations across multiple test descriptions.
    • Updated hook_name expectations in telemetry assertions to use normalizePath.
    • Adjusted script creation for failingPath and workingPath in error handling tests.
  • integration-tests/test-helper.ts
    • Exported normalizePath from @google/gemini-cli-test-utils.
  • packages/test-utils/src/test-rig.ts
    • Added a normalizePath utility function to convert backslashes to forward slashes for command-line arguments.
    • Implemented a createScript method to create temporary script files and return their normalized paths.
    • Increased default test timeouts for Windows CI environments to 10 minutes.
    • Ensured the PATH environment variable is explicitly included in pty process environments.
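
The JSON.stringify change listed for hooks-agent-flow.test.ts can be motivated with a small experiment. The sketch below is an assumption about why the change helps, not the test's actual code: the path value is invented, and `new Function` is used only to show what a generated script would see.

```typescript
// A Windows-style path; the value is illustrative only.
const messageCountFile = 'C:\\temp\\new\\message-count.txt';

// Naive interpolation: inside the generated source, "\t" and "\n" are parsed
// as escape sequences, so the script would see a different path.
const naive = `const file = '${messageCountFile}';`;

// JSON.stringify emits a double-quoted literal with backslashes escaped,
// which is also valid JavaScript string syntax.
const safe = `const file = ${JSON.stringify(messageCountFile)};`;

// Evaluate both generated snippets to compare what a script would read back.
const seenNaive = new Function(`${naive} return file;`)() as string;
const seenSafe = new Function(`${safe} return file;`)() as string;

console.log(seenNaive === messageCountFile); // false
console.log(seenSafe === messageCountFile); // true
```

The same reasoning applies to any value spliced into generated script source, not only to file paths.
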

Ignored Files

  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/chained_e2e.yml

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@github-actions

github-actions bot commented Feb 10, 2026

Size Change: -56 B (0%)

Total Size: 24.4 MB

ℹ️ View Unchanged
Filename                                        Size      Change
./bundle/gemini.js                              24.4 MB   -56 B (0%)
./bundle/sandbox-macos-permissive-open.sb       890 B     0 B
./bundle/sandbox-macos-permissive-proxied.sb    1.31 kB   0 B
./bundle/sandbox-macos-restrictive-open.sb      3.36 kB   0 B
./bundle/sandbox-macos-restrictive-proxied.sb   3.56 kB   0 B
./bundle/sandbox-macos-strict-open.sb           4.82 kB   0 B
./bundle/sandbox-macos-strict-proxied.sb        5.02 kB   0 B

compressed-size-action

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request is a solid refactoring effort aimed at improving the stability of integration tests, particularly on Windows. The changes primarily involve moving inline node -e commands into separate script files, which is an effective strategy to avoid shell quoting issues across different platforms. The introduction of the createScript and normalizePath helpers in the test rig, along with increased timeouts for Windows CI, are all positive steps towards reducing test flakiness. The consistent addition of sequential: true to hook definitions in tests is also a good practice to ensure deterministic execution and prevent race conditions. Overall, the changes are well-implemented and directly address the goal of improving test reliability.
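
For the timeout change mentioned in the review, a minimal sketch of platform-dependent defaults might look like the following. The helper name defaultTimeoutMs is hypothetical, and detecting CI through the CI environment variable is an assumption; the actual logic in TestRig.run and TestRig.runCommand is not shown on this page and may differ.

```typescript
const FIVE_MINUTES_MS = 5 * 60 * 1000;
const TEN_MINUTES_MS = 10 * 60 * 1000;

// Hypothetical helper: pick the longer default only on Windows CI.
// Platform and environment are parameters so the choice is easy to test.
function defaultTimeoutMs(
  platform: string = process.platform,
  env: Record<string, string | undefined> = process.env,
): number {
  const isWindowsCi = platform === 'win32' && Boolean(env['CI']);
  return isWindowsCi ? TEN_MINUTES_MS : FIVE_MINUTES_MS;
}

console.log(defaultTimeoutMs('win32', { CI: 'true' })); // 600000
console.log(defaultTimeoutMs('linux', { CI: 'true' })); // 300000
```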

- Increase default timeout for TestRig.run and TestRig.runCommand to 10 minutes on Windows CI to handle slow environments.
- Replace inline 'node -e' hook commands with script files to avoid brittle quoting and escaping issues on Windows shells.
- Add 'TestRig.createScript' helper to simplify script creation in tests.
- Fix path escaping for hook output files in 'hooks-agent-flow.test.ts' using JSON.stringify.
- Ensure 'TestRig.setup' is called before performing file operations in tests.
- Refactored remaining hook tests in hooks-system.test.ts to use 'rig.createScript' and forward slashes for cross-platform path compatibility.
- Replaced 'node -e' usages with script files to avoid brittle quoting and escaping issues on Windows shells.

Part of #18665
- Enforce 'sequential: true' for all hook tests to prevent telemetry leaks and race conditions.
- Normalize all path assertions in hooks-system.test.ts using a new 'normalizePath' helper to handle Windows backslashes consistently.
- Update 'createScript' in test-rig to return normalized paths.
- Ensure 'PATH' is explicitly passed to node-pty spawn options to prevent 'posix_spawnp' errors in some environments.
- Clean up manual path replacements in tests in favor of the centralized helper.

Part of #18665
- Ensure 'SystemRoot', 'COMSPEC', 'windir', and 'PATHEXT' are passed to node-pty on Windows to prevent 'posix_spawnp' failures.
- Clean up test directories in 'TestRig.setup' to ensure a fresh state for retries and prevent telemetry log accumulation (fixing the 1, 2, 3 failure pattern).
- Fix path normalization in 'Hook Disabling' test to ensure disabled hooks are correctly matched on Windows.

Part of #18665
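
The environment handling described in the notes above could be sketched roughly as follows. The function buildPtyEnv and its parameters are hypothetical; only the variable names (PATH, SystemRoot, COMSPEC, windir, PATHEXT) come from the notes, and the real spawn code in test-rig may build its environment differently.

```typescript
// Windows-specific variables named in the commit notes above.
const WINDOWS_ENV_KEYS = ['SystemRoot', 'COMSPEC', 'windir', 'PATHEXT'];

// Hypothetical helper: copy only the variables a spawned shell needs from
// `baseEnv`, then layer test-specific overrides on top.
function buildPtyEnv(
  baseEnv: Record<string, string | undefined>,
  platform: string,
  extra: Record<string, string> = {},
): Record<string, string> {
  const wanted = platform === 'win32' ? ['PATH', ...WINDOWS_ENV_KEYS] : ['PATH'];
  const result: Record<string, string> = {};
  for (const key of wanted) {
    // Windows variable names are case-insensitive (for example "Path"),
    // so match them loosely on that platform.
    const match = Object.keys(baseEnv).find((k) =>
      platform === 'win32' ? k.toLowerCase() === key.toLowerCase() : k === key,
    );
    const value = match === undefined ? undefined : baseEnv[match];
    if (value !== undefined) result[key] = value;
  }
  return { ...result, ...extra };
}

// Usage: EXAMPLE_VAR is a made-up override for illustration.
const ptyEnv = buildPtyEnv(process.env, process.platform, { EXAMPLE_VAR: '1' });
console.log(Object.keys(ptyEnv));
```

An object like ptyEnv would then be supplied as the `env` option when spawning through node-pty, so the child process can still resolve executables by name.
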

@NTaylorMullen force-pushed the ntm/repro-hook-flakiness branch from c873ed0 to cbb09bb on February 14, 2026 00:03