repro: windows hook flakiness by NTaylorMullen · Pull Request #18714 · google-gemini/gemini-cli

NTaylorMullen · 2026-02-10T07:14:21Z

Summary

Reproduction of Windows hook test flakiness.

Details

This PR modifies the CI workflow to run only hooks-system.test.ts on Windows to isolate failures.

Related Issues

Related to #18665

How to Validate

Watch the CI run.

Pre-Merge Checklist

gemini-code-assist · 2026-02-10T07:14:43Z

Summary of Changes

Hello @NTaylorMullen, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request aims to address flakiness observed in Windows hook tests by refactoring how external script commands are executed within the test suite. By moving inline node -e commands into dedicated script files and introducing path normalization, the changes enhance the robustness and cross-platform compatibility of the tests. Additionally, increased timeouts for Windows CI and explicit PATH handling for pty processes further contribute to a more stable testing environment.

Highlights

- Cross-Platform Compatibility for Hook Commands: Refactored inline node -e commands in integration tests into separate script files to improve cross-platform compatibility, particularly on Windows where quoting and escaping can be problematic.
- Path Normalization Utility: Introduced a normalizePath utility function to ensure consistent path formatting across different operating systems when executing commands, converting Windows backslashes to forward slashes.
- Sequential Hook Execution: Added sequential: true to numerous hook configurations in tests, likely to ensure predictable execution order and reduce flakiness in complex hook scenarios.
- Increased Windows CI Test Timeouts: Increased default test timeouts specifically for Windows CI environments to 10 minutes (from 5 minutes) to accommodate potential performance differences and mitigate flakiness.
- Script Creation Helper: Implemented a createScript helper method in TestRig to streamline the creation and management of temporary script files for tests, improving test readability and maintainability.
- Pty PATH Environment Variable: Ensured the PATH environment variable is explicitly included for pty processes to prevent command execution issues in certain test environments.
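
The path normalization described above can be sketched in a few lines. This is a minimal illustration that assumes the helper only swaps backslashes for forward slashes, as the summary states; the actual normalizePath in packages/test-utils/src/test-rig.ts may differ, and the example path is made up.

```typescript
// Sketch of a normalizePath-style helper (an assumption based on the
// summary above, not the PR's actual source).
function normalizePath(p: string): string {
  // Replace every Windows backslash with a forward slash.
  return p.replace(/\\/g, '/');
}

// Example: a Windows-style path becomes easier to embed in a command string.
// The path itself is hypothetical.
const command = `node "${normalizePath('C:\\temp\\hooks\\before_tool.cjs')}"`;
console.log(command); // node "C:/temp/hooks/before_tool.cjs"
```

POSIX-style paths pass through unchanged, so the same call can be used on every platform.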

Changelog

integration-tests/hooks-agent-flow.test.ts
- Updated fs.readFileSync calls to use JSON.stringify for messageCountFile path.
- Replaced inline node -e command with a dynamically created script file for AfterAgent hook.
- Replaced inline node -e commands with dynamically created script files for BeforeAgent and AfterAgent hooks in a multi-step test.
integration-tests/hooks-system.test.ts
- Imported normalizePath from test-helper.js.
- Refactored numerous test cases to use rig.createScript for generating hook scripts instead of inline node -e commands.
- Applied normalizePath to all command paths to ensure cross-platform compatibility.
- Added sequential: true to various hook configurations across multiple test descriptions.
- Updated hook_name expectations in telemetry assertions to use normalizePath.
- Adjusted script creation for failingPath and workingPath in error handling tests.
integration-tests/test-helper.ts
- Exported normalizePath from @google/gemini-cli-test-utils.
packages/test-utils/src/test-rig.ts
- Added a normalizePath utility function to convert backslashes to forward slashes for command-line arguments.
- Implemented a createScript method to create temporary script files and return their normalized paths.
- Increased default test timeouts for Windows CI environments to 10 minutes.
- Ensured the PATH environment variable is explicitly included in pty process environments.
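
As a rough sketch of the two ideas in this changelog (a script-creation helper and JSON.stringify for embedding paths in generated source), the following may help. The standalone createScript function, the readFileSnippet helper, and the file names are hypothetical stand-ins; the real TestRig.createScript method likely has a different signature.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical stand-in for TestRig.createScript: writes `content` to
// `fileName` inside `dir` and returns the path with forward slashes.
function createScript(dir: string, fileName: string, content: string): string {
  const scriptPath = path.join(dir, fileName);
  fs.writeFileSync(scriptPath, content, 'utf8');
  return scriptPath.replace(/\\/g, '/');
}

// Embed a file path into generated JavaScript source. JSON.stringify emits a
// quoted literal with backslashes escaped, so a path such as C:\tmp\n.txt is
// not misread as containing tab or newline escapes.
function readFileSnippet(filePath: string): string {
  return `require('fs').readFileSync(${JSON.stringify(filePath)}, 'utf8')`;
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'hook-script-'));
const countFile = path.join(dir, 'message-count.txt');
const scriptPath = createScript(
  dir,
  'after_agent.cjs',
  `console.log(${readFileSnippet(countFile)});\n`,
);

// The hook command then references a file instead of an inline `node -e` string.
const hookCommand = `node "${scriptPath}"`;
console.log(hookCommand);
```

Because the script body lives in a file, the only thing the shell has to parse is a single quoted path.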

Ignored Files

Ignored by pattern: .github/workflows/** (1)
- .github/workflows/chained_e2e.yml

Activity

The author, NTaylorMullen, created this pull request to reproduce and address Windows hook test flakiness.
The changes are intended to be validated by observing the CI run results.
This PR is related to issue #18665, "test: fix hook integration test flakiness on Windows CI".

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

github-actions · 2026-02-10T07:17:21Z

Size Change: -56 B (0%)

Total Size: 24.4 MB

Filename	Size	Change
`./bundle/gemini.js`	24.4 MB	-56 B (0%)
`./bundle/sandbox-macos-permissive-open.sb`	890 B	0 B
`./bundle/sandbox-macos-permissive-proxied.sb`	1.31 kB	0 B
`./bundle/sandbox-macos-restrictive-open.sb`	3.36 kB	0 B
`./bundle/sandbox-macos-restrictive-proxied.sb`	3.56 kB	0 B
`./bundle/sandbox-macos-strict-open.sb`	4.82 kB	0 B
`./bundle/sandbox-macos-strict-proxied.sb`	5.02 kB	0 B

compressed-size-action

gemini-code-assist

Code Review

This pull request is a solid refactoring effort aimed at improving the stability of integration tests, particularly on Windows. The changes primarily involve moving inline node -e commands into separate script files, which is an effective strategy to avoid shell quoting issues across different platforms. The new createScript and normalizePath helpers in the test rig and the increased timeouts for Windows CI are positive steps towards reducing test flakiness. The consistent addition of sequential: true to hook definitions in tests is also good practice to ensure deterministic execution and prevent race conditions. Overall, the changes are well-implemented and directly address the goal of improving test reliability.
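
To make the review's point about sequential: true and script-file commands concrete, here is a hedged sketch of what such a hook group might look like when built in a test. The CommandHook and HookGroup types, their field names other than sequential, and the sequentialGroup helper are all assumptions for illustration; consult the gemini-cli hooks documentation for the actual settings schema.

```typescript
// All type and field names here are illustrative assumptions, apart from
// `sequential`, which the review above mentions.
interface CommandHook {
  type: 'command';
  command: string;
}

interface HookGroup {
  sequential?: boolean;
  hooks: CommandHook[];
}

// Build a hook group whose commands run script files one after another.
function sequentialGroup(scriptPaths: string[]): HookGroup {
  return {
    sequential: true,
    hooks: scriptPaths.map((p) => ({
      type: 'command' as const,
      // A quoted path with forward slashes avoids shell-specific escaping.
      command: `node "${p.replace(/\\/g, '/')}"`,
    })),
  };
}

// Hypothetical script paths.
const group = sequentialGroup(['C:\\tmp\\first.cjs', 'C:\\tmp\\second.cjs']);
console.log(JSON.stringify(group, null, 2));
```

With a shape like this, a test that asserts on ordering does not depend on how quickly each hook process happens to start.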

integration-tests/hooks-system.test.ts

- Increase default timeout for TestRig.run and TestRig.runCommand to 10 minutes on Windows CI to handle slow environments.
- Replace inline 'node -e' hook commands with script files to avoid brittle quoting and escaping issues on Windows shells.
- Add 'TestRig.createScript' helper to simplify script creation in tests.
- Fix path escaping for hook output files in 'hooks-agent-flow.test.ts' using JSON.stringify.
- Ensure 'TestRig.setup' is called before performing file operations in tests.

- Refactored remaining hook tests in hooks-system.test.ts to use 'rig.createScript' and forward slashes for cross-platform path compatibility.
- Replaced 'node -e' usages with script files to avoid brittle quoting and escaping issues on Windows shells.

Part of #18665

- Enforce 'sequential: true' for all hook tests to prevent telemetry leaks and race conditions.
- Normalize all path assertions in hooks-system.test.ts using a new 'normalizePath' helper to handle Windows backslashes consistently.
- Update 'createScript' in test-rig to return normalized paths.
- Ensure 'PATH' is explicitly passed to node-pty spawn options to prevent 'posix_spawnp' errors in some environments.
- Clean up manual path replacements in tests in favor of the centralized helper.

Part of #18665

- Ensure 'SystemRoot', 'COMSPEC', 'windir', and 'PATHEXT' are passed to node-pty on Windows to prevent 'posix_spawnp' failures.
- Clean up test directories in 'TestRig.setup' to ensure a fresh state for retries and prevent telemetry log accumulation (fixing the 1, 2, 3 failure pattern).
- Fix path normalization in 'Hook Disabling' test to ensure disabled hooks are correctly matched on Windows.

Part of #18665
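
The environment handling described in these commit messages could look roughly like the sketch below. The variable names come from the messages; the buildPtyEnv function, its parameters, and the selection logic are assumptions for illustration and not the PR's actual code.

```typescript
// Variables named in the commit messages above.
const ALWAYS_PASSED = ['PATH'];
const WINDOWS_ONLY = ['SystemRoot', 'COMSPEC', 'windir', 'PATHEXT'];

// Hypothetical helper: copy the variables a spawned shell needs from `base`,
// then layer test-specific variables on top.
function buildPtyEnv(
  base: Record<string, string | undefined>,
  platform: string,
  extra: Record<string, string> = {},
): Record<string, string> {
  const names =
    platform === 'win32' ? [...ALWAYS_PASSED, ...WINDOWS_ONLY] : ALWAYS_PASSED;
  const env: Record<string, string> = {};
  for (const name of names) {
    const value = base[name];
    if (value !== undefined) env[name] = value;
  }
  // Test-specific variables take precedence over inherited ones.
  return { ...env, ...extra };
}

// Example with a made-up base environment.
const ptyEnv = buildPtyEnv(
  { PATH: 'C:\\Windows\\System32', SystemRoot: 'C:\\Windows', HOME: 'C:\\Users\\me' },
  'win32',
  { CI: 'true' },
);
console.log(Object.keys(ptyEnv)); // [ 'PATH', 'SystemRoot', 'CI' ]
```

In real use the base would typically be process.env. Node treats process.env keys case-insensitively on Windows, but a plain object copy does not, which is one reason to read each variable by name rather than spreading a copied object.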

gemini-code-assist bot reviewed Feb 10, 2026

github-advanced-security bot found potential problems Feb 11, 2026

integration-tests/hooks-system.test.ts (fixed)

NTaylorMullen added 26 commits February 13, 2026 16:03

repro: run only hooks-system.test.ts on windows

3643c88

repro: fast windows hook debugging workflow

afe0ad7

repro: add diagnostic logging for setup, cleanup, and pty spawn

b75b8b9

repro: truly disable other jobs and fix TS error

d89916c

fix(test-rig): only clean test directories on first setup call for a rig instance

dca9c9e

repro: enable push trigger for debugging

a3b4a0a

repro: retry rmdir, add more logging, and focus on failing tests

4946a5b

repro: allow vitest .only and focus on stderr blocking test

cb12e2f

repro: rich logging and focused tests

0017a72

repro: add more logging to HookRegistry and CoreToolScheduler

06f9479

repro: add logging to PolicyEngine and HookRunner conversion

0c04bc4

repro: add even more logging to HookRunner and TestRig

9b4e3e7

repro: test with exit code 101

edba8dd

fix(hooks): treat all non-zero exit codes except 1 as blocking

b180351

repro: fix unused variable build error

20bcd4e

repro: normalize hook names and use JSON for blocking test

b077cfe

fix(hooks): resolve Windows flakiness and improve reliability

80a0f04

fix(hooks): final verified fixes for Windows flakiness

80db53e

fix(hooks): truly final verified fixes for Windows flakiness

a68d08d

fix(hooks): final verified fixes for Windows flakiness (clean version)

88d6772

repro: re-enable diagnostic logging and focus failing hook tests

cbba40e

repro: fix syntax error and allow focused tests

009cdd9

NTaylorMullen added 24 commits February 13, 2026 16:03

repro: always parse JSON from hook output regardless of exit code

5b37108

repro: improve telemetry assertion to check stdout/stderr

2c300fb

fix(hooks): final verified fixes for Windows flakiness

90f3f67

repro: use rig.createScript for disabling tests and focus them

de151a4

repro: use unique strings for disabling tests and focus them

6ec2ebc

repro: use rig.createScript and telemetry for failing tests

b6bfdfa

repro: trigger run again

fe07abe

fix(hooks): final verified fixes for Windows flakiness (fully clean)

cef4dbe

fix(hooks): correctly order rig.setup in system tests

780a831

repro: use echo instead of node for failing tests and focus them

dcb35b2

repro: use node -e for failing tests and fix setup order

80e893b

repro: improve stability with node -e and flexible assertions

bae5388

repro: use node -e and shared setup to avoid EBUSY/PTY flakiness

608364b

repro: use simple echo hook and re-enable rich logging

458e41a

repro: use single rig.setup and node -e for stability

e6881af

fix(hooks): final verified fixes for Windows flakiness

7fb17c8

fix(hooks): increase timeout to 60s for Windows reliability

8f108b0

fix(hooks): use file-based hooks instead of node -e for Windows reliability

0d2be94

fix(hooks): normalize disabled hook paths for Windows compatibility

23590d2

fix(hooks): fix settings structure and use unique rig names for disabling tests

f406576

fix(hooks): force child_process PTY and fix settings structure in tests

45ecc6d

fix(hooks): force child_process PTY in getPty and use explicit hook names in tests

6e78dc7

fix(hooks): final verified fixes for Windows flakiness and PTY issues

f0fbc63

test(integration): simplify BeforeToolSelection responses to avoid Windows timeout

cbb09bb

NTaylorMullen force-pushed the ntm/repro-hook-flakiness branch from c873ed0 to cbb09bb on February 14, 2026 00:03

NTaylorMullen added 4 commits February 14, 2026 11:31

test(rig): improve cleanDir with exponential backoff and better logging for Windows

668b7b0

test(rig): refactor cleanDir to use Atomics.wait for reliable sync sleep and increase retries

2a7f936

fix(lint): resolve unused variables and unexpected console logs

533ea60

style: fix formatting in hooks-agent-flow.test.ts

0fb7b3d
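
The cleanDir commits in the list above mention exponential backoff and an Atomics.wait-based synchronous sleep. A minimal sketch of that combination, assuming a retry-on-any-error policy and made-up retry counts and delays, might look like this; the function names and defaults are hypothetical and the PR's actual implementation is not shown on this page.

```typescript
import * as fs from 'node:fs';

// Block the current thread for `ms` milliseconds without busy-waiting.
// Atomics.wait on a cell whose value never changes simply times out.
function sleepSync(ms: number): void {
  const cell = new Int32Array(new SharedArrayBuffer(4));
  Atomics.wait(cell, 0, 0, ms);
}

// Hypothetical cleanDir: retry removal with exponential backoff, since
// removal can fail transiently (for example with EBUSY) while another
// process still holds a handle inside the directory.
function cleanDir(dir: string, maxRetries = 5, baseDelayMs = 50): void {
  for (let attempt = 0; ; attempt++) {
    try {
      fs.rmSync(dir, { recursive: true, force: true });
      return;
    } catch (err) {
      // Give up after the last allowed retry and surface the error.
      if (attempt >= maxRetries) throw err;
      sleepSync(baseDelayMs * 2 ** attempt); // 50, 100, 200, ...
    }
  }
}
```

Node's fs.rmSync also accepts maxRetries and retryDelay options, which may be enough in simpler cases; a hand-rolled loop is mainly useful when extra logging or a custom backoff schedule is wanted.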

NTaylorMullen closed this Feb 15, 2026

NTaylorMullen wants to merge 58 commits into main from ntm/repro-hook-flakiness
