Skip to content

test: fix hook integration test flakiness on Windows CI#18665

Merged
NTaylorMullen merged 1 commit intomainfrom
ntm/fix-hook-test-flakiness
Feb 15, 2026
Merged

test: fix hook integration test flakiness on Windows CI#18665
NTaylorMullen merged 1 commit intomainfrom
ntm/fix-hook-test-flakiness

Conversation

@NTaylorMullen
Copy link
Collaborator

@NTaylorMullen NTaylorMullen commented Feb 9, 2026

test: robust fixes for windows hook flakiness

This PR addresses multiple sources of flakiness and failure in the hook system integration tests on Windows.

Fixes

  1. Telemetry Leakage / Accumulating Failures:

    • Issue: Test retries were reusing the same test directory (derived from globalSetup timestamp), causing readToolLogs to pick up telemetry from previous failed attempts. This led to assertions seeing 1, then 2, then 3 tool calls instead of 0.
    • Fix: Updated TestRig.setup to aggressively clean up the test directory before starting a test run.
  2. Process Spawning Failures (posix_spawnp):

    • Issue: Interactive tests using node-pty failed with posix_spawnp failed on Windows. This is often due to the PTY process missing critical environment variables like SystemRoot or COMSPEC.
    • Fix: Updated TestRig to explicitly include SystemRoot, COMSPEC, windir, and PATHEXT in the PTY environment options on Windows.
  3. Disabled Hook Matching:

    • Issue: The 'Hook Disabling' test failed because the disabled hook path in the settings (using backslashes on Windows) didn't match the normalized hook command path (using forward slashes) used by the runner.
    • Fix: Updated the test to use normalizePath for the disabled list configuration, ensuring consistent path matching.
  4. General Flakiness & Timeouts:

    • Fix: Increased Vitest timeout to 10 minutes for Windows.
    • Fix: Enforced sequential: true for all hook definitions in tests to prevent race conditions and ensure clean telemetry isolation.
    • Fix: Standardized on normalizePath helper for all path assertions to avoid fragile manual string replacement.

Validation

  • Created a reproduction branch ntm/repro-hook-flakiness with a trimmed-down CI workflow to verify these fixes specifically on Windows.
  • Local tests pass.

@NTaylorMullen NTaylorMullen requested a review from a team as a code owner February 9, 2026 19:37
@gemini-code-assist
Copy link
Contributor

Warning

Gemini encountered an error creating the summary. You can try again by commenting /gemini summary.

@github-actions
Copy link

github-actions bot commented Feb 9, 2026

Size Change: -73 B (0%)

Total Size: 24.4 MB

ℹ️ View Unchanged
Filename Size Change
./bundle/gemini.js 24.4 MB -73 B (0%)
./bundle/sandbox-macos-permissive-open.sb 890 B 0 B
./bundle/sandbox-macos-permissive-proxied.sb 1.31 kB 0 B
./bundle/sandbox-macos-restrictive-open.sb 3.36 kB 0 B
./bundle/sandbox-macos-restrictive-proxied.sb 3.56 kB 0 B
./bundle/sandbox-macos-strict-open.sb 4.82 kB 0 B
./bundle/sandbox-macos-strict-proxied.sb 5.02 kB 0 B

compressed-size-action

@NTaylorMullen NTaylorMullen force-pushed the ntm/fix-hook-test-flakiness branch from 3e6cdea to 33ef0ea Compare February 9, 2026 20:00
@gemini-cli gemini-cli bot added the status/need-issue Pull requests that need to have an associated issue. label Feb 9, 2026
@NTaylorMullen NTaylorMullen force-pushed the ntm/fix-hook-test-flakiness branch from 33ef0ea to f764fb4 Compare February 9, 2026 21:12
NTaylorMullen added a commit that referenced this pull request Feb 10, 2026
- Refactored remaining hook tests in hooks-system.test.ts to use 'rig.createScript' and forward slashes for cross-platform path compatibility.
- Replaced 'node -e' usages with script files to avoid brittle quoting and escaping issues on Windows shells.

Part of #18665
NTaylorMullen added a commit that referenced this pull request Feb 10, 2026
- Enforce 'sequential: true' for all hook tests to prevent telemetry leaks and race conditions.
- Normalize all path assertions in hooks-system.test.ts using a new 'normalizePath' helper to handle Windows backslashes consistently.
- Update 'createScript' in test-rig to return normalized paths.
- Ensure 'PATH' is explicitly passed to node-pty spawn options to prevent 'posix_spawnp' errors in some environments.
- Clean up manual path replacements in tests in favor of the centralized helper.

Part of #18665
NTaylorMullen added a commit that referenced this pull request Feb 10, 2026
- Ensure 'SystemRoot', 'COMSPEC', 'windir', and 'PATHEXT' are passed to node-pty on Windows to prevent 'posix_spawnp' failures.
- Clean up test directories in 'TestRig.setup' to ensure a fresh state for retries and prevent telemetry log accumulation (fixing the 1, 2, 3 failure pattern).
- Fix path normalization in 'Hook Disabling' test to ensure disabled hooks are correctly matched on Windows.

Part of #18665
NTaylorMullen added a commit that referenced this pull request Feb 10, 2026
- Refactored remaining hook tests in hooks-system.test.ts to use 'rig.createScript' and forward slashes for cross-platform path compatibility.
- Replaced 'node -e' usages with script files to avoid brittle quoting and escaping issues on Windows shells.

Part of #18665
NTaylorMullen added a commit that referenced this pull request Feb 10, 2026
- Enforce 'sequential: true' for all hook tests to prevent telemetry leaks and race conditions.
- Normalize all path assertions in hooks-system.test.ts using a new 'normalizePath' helper to handle Windows backslashes consistently.
- Update 'createScript' in test-rig to return normalized paths.
- Ensure 'PATH' is explicitly passed to node-pty spawn options to prevent 'posix_spawnp' errors in some environments.
- Clean up manual path replacements in tests in favor of the centralized helper.

Part of #18665
NTaylorMullen added a commit that referenced this pull request Feb 10, 2026
- Ensure 'SystemRoot', 'COMSPEC', 'windir', and 'PATHEXT' are passed to node-pty on Windows to prevent 'posix_spawnp' failures.
- Clean up test directories in 'TestRig.setup' to ensure a fresh state for retries and prevent telemetry log accumulation (fixing the 1, 2, 3 failure pattern).
- Fix path normalization in 'Hook Disabling' test to ensure disabled hooks are correctly matched on Windows.

Part of #18665
@NTaylorMullen NTaylorMullen force-pushed the ntm/fix-hook-test-flakiness branch from 672f57b to 3db7732 Compare February 10, 2026 07:38
NTaylorMullen added a commit that referenced this pull request Feb 14, 2026
- Refactored remaining hook tests in hooks-system.test.ts to use 'rig.createScript' and forward slashes for cross-platform path compatibility.
- Replaced 'node -e' usages with script files to avoid brittle quoting and escaping issues on Windows shells.

Part of #18665
NTaylorMullen added a commit that referenced this pull request Feb 14, 2026
- Enforce 'sequential: true' for all hook tests to prevent telemetry leaks and race conditions.
- Normalize all path assertions in hooks-system.test.ts using a new 'normalizePath' helper to handle Windows backslashes consistently.
- Update 'createScript' in test-rig to return normalized paths.
- Ensure 'PATH' is explicitly passed to node-pty spawn options to prevent 'posix_spawnp' errors in some environments.
- Clean up manual path replacements in tests in favor of the centralized helper.

Part of #18665
NTaylorMullen added a commit that referenced this pull request Feb 14, 2026
- Ensure 'SystemRoot', 'COMSPEC', 'windir', and 'PATHEXT' are passed to node-pty on Windows to prevent 'posix_spawnp' failures.
- Clean up test directories in 'TestRig.setup' to ensure a fresh state for retries and prevent telemetry log accumulation (fixing the 1, 2, 3 failure pattern).
- Fix path normalization in 'Hook Disabling' test to ensure disabled hooks are correctly matched on Windows.

Part of #18665
@NTaylorMullen NTaylorMullen force-pushed the ntm/fix-hook-test-flakiness branch from 51a061b to f28fa76 Compare February 15, 2026 18:54
@NTaylorMullen NTaylorMullen requested a review from a team as a code owner February 15, 2026 18:54
@NTaylorMullen NTaylorMullen force-pushed the ntm/fix-hook-test-flakiness branch from f28fa76 to 8a62402 Compare February 15, 2026 18:58
- Stabilized TestRig cleanup with synchronous exponential backoff using Atomics.wait
- Optimized BeforeToolSelection test by truncating mock response loop
- Improved hook output parsing to handle both stdout and stderr regardless of exit code
- Standardized hook settings structure in integration tests
- Enforced child_process PTY on Windows for CI stability
@NTaylorMullen NTaylorMullen force-pushed the ntm/fix-hook-test-flakiness branch from 8a62402 to a780ad2 Compare February 15, 2026 19:08
@NTaylorMullen NTaylorMullen added this pull request to the merge queue Feb 15, 2026
Merged via the queue into main with commit 884acda Feb 15, 2026
27 checks passed
@NTaylorMullen NTaylorMullen deleted the ntm/fix-hook-test-flakiness branch February 15, 2026 19:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

status/need-issue Pull requests that need to have an associated issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants