Closed
58 commits
9631dc8
test: fix hook integration test flakiness on Windows CI
NTaylorMullen Feb 9, 2026
519bf32
test: standardize hook system tests for windows compatibility
NTaylorMullen Feb 10, 2026
bd7904a
test: robust fixes for windows hook flakiness
NTaylorMullen Feb 10, 2026
3643c88
repro: run only hooks-system.test.ts on windows
NTaylorMullen Feb 10, 2026
f54fe58
test: fix windows environment and cleanup issues
NTaylorMullen Feb 10, 2026
afe0ad7
repro: fast windows hook debugging workflow
NTaylorMullen Feb 10, 2026
b75b8b9
repro: add diagnostic logging for setup, cleanup, and pty spawn
NTaylorMullen Feb 10, 2026
d89916c
repro: truly disable other jobs and fix TS error
NTaylorMullen Feb 10, 2026
dca9c9e
fix(test-rig): only clean test directories on first setup call for a …
NTaylorMullen Feb 10, 2026
a3b4a0a
repro: enable push trigger for debugging
NTaylorMullen Feb 10, 2026
4946a5b
repro: retry rmdir, add more logging, and focus on failing tests
NTaylorMullen Feb 10, 2026
cb12e2f
repro: allow vitest .only and focus on stderr blocking test
NTaylorMullen Feb 10, 2026
0017a72
repro: rich logging and focused tests
NTaylorMullen Feb 10, 2026
06f9479
repro: add more logging to HookRegistry and CoreToolScheduler
NTaylorMullen Feb 10, 2026
0c04bc4
repro: add logging to PolicyEngine and HookRunner conversion
NTaylorMullen Feb 10, 2026
9b4e3e7
repro: add even more logging to HookRunner and TestRig
NTaylorMullen Feb 10, 2026
edba8dd
repro: test with exit code 101
NTaylorMullen Feb 10, 2026
b180351
fix(hooks): treat all non-zero exit codes except 1 as blocking
NTaylorMullen Feb 10, 2026
20bcd4e
repro: fix unused variable build error
NTaylorMullen Feb 10, 2026
b077cfe
repro: normalize hook names and use JSON for blocking test
NTaylorMullen Feb 10, 2026
80a0f04
fix(hooks): resolve Windows flakiness and improve reliability
NTaylorMullen Feb 10, 2026
80db53e
fix(hooks): final verified fixes for Windows flakiness
NTaylorMullen Feb 10, 2026
a68d08d
fix(hooks): truly final verified fixes for Windows flakiness
NTaylorMullen Feb 10, 2026
88d6772
fix(hooks): final verified fixes for Windows flakiness (clean version)
NTaylorMullen Feb 10, 2026
cbba40e
repro: re-enable diagnostic logging and focus failing hook tests
NTaylorMullen Feb 10, 2026
009cdd9
repro: fix syntax error and allow focused tests
NTaylorMullen Feb 10, 2026
efab27e
repro: final clean verified fixes for Windows flakiness
NTaylorMullen Feb 10, 2026
1780234
repro: restore writeFileSync and fix hook blocking test
NTaylorMullen Feb 11, 2026
6655b0a
repro: use simple echo hook and re-enable rich logging
NTaylorMullen Feb 11, 2026
5369d65
repro: use simple node script for block test
NTaylorMullen Feb 11, 2026
5b37108
repro: always parse JSON from hook output regardless of exit code
NTaylorMullen Feb 11, 2026
2c300fb
repro: improve telemetry assertion to check stdout/stderr
NTaylorMullen Feb 11, 2026
90f3f67
fix(hooks): final verified fixes for Windows flakiness
NTaylorMullen Feb 11, 2026
de151a4
repro: use rig.createScript for disabling tests and focus them
NTaylorMullen Feb 11, 2026
6ec2ebc
repro: use unique strings for disabling tests and focus them
NTaylorMullen Feb 11, 2026
b6bfdfa
repro: use rig.createScript and telemetry for failing tests
NTaylorMullen Feb 11, 2026
fe07abe
repro: trigger run again
NTaylorMullen Feb 11, 2026
cef4dbe
fix(hooks): final verified fixes for Windows flakiness (fully clean)
NTaylorMullen Feb 11, 2026
780a831
fix(hooks): correctly order rig.setup in system tests
NTaylorMullen Feb 11, 2026
dcb35b2
repro: use echo instead of node for failing tests and focus them
NTaylorMullen Feb 11, 2026
80e893b
repro: use node -e for failing tests and fix setup order
NTaylorMullen Feb 11, 2026
bae5388
repro: improve stability with node -e and flexible assertions
NTaylorMullen Feb 11, 2026
608364b
repro: use node -e and shared setup to avoid EBUSY/PTY flakiness
NTaylorMullen Feb 11, 2026
458e41a
repro: use simple echo hook and re-enable rich logging
NTaylorMullen Feb 11, 2026
e6881af
repro: use single rig.setup and node -e for stability
NTaylorMullen Feb 11, 2026
7fb17c8
fix(hooks): final verified fixes for Windows flakiness
NTaylorMullen Feb 11, 2026
8f108b0
fix(hooks): increase timeout to 60s for Windows reliability
NTaylorMullen Feb 12, 2026
0d2be94
fix(hooks): use file-based hooks instead of node -e for Windows relia…
NTaylorMullen Feb 12, 2026
23590d2
fix(hooks): normalize disabled hook paths for Windows compatibility
NTaylorMullen Feb 12, 2026
f406576
fix(hooks): fix settings structure and use unique rig names for disab…
NTaylorMullen Feb 12, 2026
45ecc6d
fix(hooks): force child_process PTY and fix settings structure in tests
NTaylorMullen Feb 13, 2026
6e78dc7
fix(hooks): force child_process PTY in getPty and use explicit hook n…
NTaylorMullen Feb 13, 2026
f0fbc63
fix(hooks): final verified fixes for Windows flakiness and PTY issues
NTaylorMullen Feb 13, 2026
cbb09bb
test(integration): simplify BeforeToolSelection responses to avoid Wi…
NTaylorMullen Feb 13, 2026
668b7b0
test(rig): improve cleanDir with exponential backoff and better loggi…
NTaylorMullen Feb 14, 2026
2a7f936
test(rig): refactor cleanDir to use Atomics.wait for reliable sync sl…
NTaylorMullen Feb 14, 2026
533ea60
fix(lint): resolve unused variables and unexpected console logs
NTaylorMullen Feb 14, 2026
0fb7b3d
style: fix formatting in hooks-agent-flow.test.ts
NTaylorMullen Feb 15, 2026
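Note on commits 668b7b0 and 2a7f936: their titles describe `cleanDir` gaining exponential backoff and an `Atomics.wait`-based synchronous sleep. The test-rig source is not shown in this view, so the following is only a minimal sketch of that general technique, not the PR's implementation. The names `sleepSync` and `cleanDirWithRetry`, the set of retryable error codes, and the delay values are all assumptions made for illustration.

```typescript
import { existsSync, mkdtempSync, rmSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Block the current thread for roughly `ms` milliseconds without busy-waiting.
// Atomics.wait times out because nothing ever notifies this private buffer.
function sleepSync(ms: number): void {
  const cell = new Int32Array(new SharedArrayBuffer(4));
  Atomics.wait(cell, 0, 0, ms);
}

// Hypothetical helper: remove a directory, retrying with exponential backoff
// when the OS reports that a file is still in use (common on Windows).
function cleanDirWithRetry(
  dir: string,
  maxAttempts = 5,
  baseDelayMs = 50,
): void {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      rmSync(dir, { recursive: true, force: true });
      return;
    } catch (err) {
      const code = (err as NodeJS.ErrnoException).code;
      // Assumed set of transient errors; adjust to what CI actually reports.
      const retryable =
        code === 'EBUSY' || code === 'EPERM' || code === 'ENOTEMPTY';
      if (!retryable || attempt === maxAttempts) throw err;
      // Delays grow as 50, 100, 200, ... ms with the default base delay.
      sleepSync(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}

// Usage: create a scratch directory, then clean it up.
const scratch = mkdtempSync(join(tmpdir(), 'clean-dir-'));
writeFileSync(join(scratch, 'hook.cjs'), "console.log('ok');");
cleanDirWithRetry(scratch);
console.log(existsSync(scratch)); // false
```

A synchronous sleep keeps the cleanup usable from non-async setup code, at the cost of blocking the event loop while it waits, which is usually acceptable in test teardown.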
23 changes: 17 additions & 6 deletions .github/workflows/chained_e2e.yml
@@ -275,7 +275,7 @@ jobs:
  UV_THREADPOOL_SIZE: '32'
  NODE_ENV: 'test'
  shell: 'pwsh'
- run: 'npm run test:integration:sandbox:none'
+ run: 'npm run test:e2e'

  evals:
  name: 'Evals (ALWAYS_PASSING)'
@@ -315,19 +315,30 @@ jobs:
  needs:
  - 'e2e_linux'
  - 'e2e_mac'
+ - 'e2e_windows'
  - 'evals'
  - 'merge_queue_skipper'
  runs-on: 'gemini-cli-ubuntu-16-core'
  steps:
  - name: 'Check E2E test results'
  run: |
- if [[ ${{ needs.e2e_linux.result }} != 'success' || \
- ${{ needs.e2e_mac.result }} != 'success' || \
- ${{ needs.evals.result }} != 'success' ]]; then
- echo "One or more E2E jobs failed."
+ if [[ ${{ needs.e2e_linux.result }} != 'success' ]]; then
+ echo "Linux E2E job failed."
  exit 1
  fi
+ if [[ ${{ needs.e2e_mac.result }} != 'success' ]]; then
+ echo "macOS E2E job failed."
+ exit 1
+ fi
+ if [[ ${{ needs.e2e_windows.result }} != 'success' ]]; then
+ echo "Windows E2E job failed."
+ exit 1
+ fi
+ if [[ ${{ needs.evals.result }} != 'success' ]]; then
+ echo "Evals job failed."
+ exit 1
+ fi
- echo "All required E2E jobs passed!"
+ echo "All E2E jobs passed!"

  set_workflow_status:
  runs-on: 'gemini-cli-ubuntu-16-core'
61 changes: 45 additions & 16 deletions integration-tests/hooks-agent-flow.test.ts
@@ -5,7 +5,7 @@
  */

  import { describe, it, expect, beforeEach, afterEach } from 'vitest';
- import { TestRig } from './test-helper.js';
+ import { TestRig, normalizePath } from './test-helper.js';
  import { join } from 'node:path';
  import { writeFileSync } from 'node:fs';

Expand Down Expand Up @@ -113,10 +113,9 @@ describe('Hooks Agent Flow', () => {
}
`;

const scriptPath = join(rig.testDir!, 'after_agent_verify.cjs');
writeFileSync(scriptPath, hookScript);
const scriptPath = rig.createScript('after_agent_verify.cjs', hookScript);

await rig.setup('should receive prompt and response in AfterAgent hook', {
rig.setup('should receive prompt and response in AfterAgent hook', {
settings: {
hooksConfig: {
enabled: true,
@@ -127,7 +126,7 @@ describe('Hooks Agent Flow', () => {
  hooks: [
  {
  type: 'command',
- command: `node "${scriptPath}"`,
+ command: normalizePath(`node "${scriptPath}"`)!,
  timeout: 5000,
  },
  ],
@@ -157,7 +156,7 @@ describe('Hooks Agent Flow', () => {
  });

  it('should process clearContext in AfterAgent hook output', async () => {
- await rig.setup('should process clearContext in AfterAgent hook output', {
+ rig.setup('should process clearContext in AfterAgent hook output', {
  fakeResponsesPath: join(
  import.meta.dirname,
  'hooks-system.after-agent.responses',
@@ -171,18 +170,32 @@ describe('Hooks Agent Flow', () => {
  const input = JSON.parse(fs.readFileSync(0, 'utf-8'));
  const messageCount = input.llm_request?.contents?.length || 0;
  let counts = [];
- try { counts = JSON.parse(fs.readFileSync('${messageCountFile}', 'utf-8')); } catch (e) {}
+ try { counts = JSON.parse(fs.readFileSync(${JSON.stringify(messageCountFile)}, 'utf-8')); } catch (e) {}
  counts.push(messageCount);
- fs.writeFileSync('${messageCountFile}', JSON.stringify(counts));
+ fs.writeFileSync(${JSON.stringify(messageCountFile)}, JSON.stringify(counts));
  console.log(JSON.stringify({ decision: 'allow' }));
  `;
- const beforeModelScriptPath = join(
- rig.testDir!,
+ const beforeModelScriptPath = rig.createScript(
  'before_model_counter.cjs',
+ beforeModelScript,
  );
- writeFileSync(beforeModelScriptPath, beforeModelScript);

- await rig.setup('should process clearContext in AfterAgent hook output', {
+ const afterAgentScript = `
+ console.log(JSON.stringify({
+ decision: 'block',
+ reason: 'Security policy triggered',
+ hookSpecificOutput: {
+ hookEventName: 'AfterAgent',
+ clearContext: true
+ }
+ }));
+ `;
+ const afterAgentScriptPath = rig.createScript(
+ 'after_agent_clear.cjs',
+ afterAgentScript,
+ );
+
+ rig.setup('should process clearContext in AfterAgent hook output', {
  settings: {
  hooks: {
  enabled: true,
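Note on the hunk above: it replaces `'${messageCountFile}'` with `${JSON.stringify(messageCountFile)}` inside the generated hook script. A likely motivation (an assumption, since the diff does not state it) is that a Windows path contains backslashes, and splicing it between quotes lets the generated source read them as escape sequences. The sketch below illustrates that general hazard; the path value and the `evalLiteral` helper are hypothetical and exist only for the demonstration.

```typescript
// Evaluate a piece of JavaScript source that should be a string literal,
// the same way a generated script would interpret it when it runs.
function evalLiteral(sourceLiteral: string): string {
  return new Function(`return ${sourceLiteral};`)() as string;
}

// Hypothetical Windows-style path (the value here is an assumption).
const messageCountFile = 'C:\\Users\\runner\\temp\\counts.json';

// Naive splice: the generated source contains single backslashes, so
// sequences such as \r and \t are interpreted as escapes.
const naiveLiteral = `'${messageCountFile}'`;

// JSON.stringify emits a quoted literal with every backslash escaped.
const safeLiteral = JSON.stringify(messageCountFile);

console.log(evalLiteral(naiveLiteral) === messageCountFile); // false
console.log(evalLiteral(safeLiteral) === messageCountFile); // true
```

The same approach also protects against quotes inside the path, because `JSON.stringify` escapes double quotes in its output.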
@@ -191,7 +204,7 @@ describe('Hooks Agent Flow', () => {
  hooks: [
  {
  type: 'command',
- command: `node "${beforeModelScriptPath}"`,
+ command: normalizePath(`node "${beforeModelScriptPath}"`)!,
  timeout: 5000,
  },
  ],
@@ -202,7 +215,7 @@ describe('Hooks Agent Flow', () => {
  hooks: [
  {
  type: 'command',
- command: `node -e "console.log(JSON.stringify({decision: 'block', reason: 'Security policy triggered', hookSpecificOutput: {hookEventName: 'AfterAgent', clearContext: true}}))"`,
+ command: normalizePath(`node "${afterAgentScriptPath}"`)!,
  timeout: 5000,
  },
  ],
@@ -244,6 +257,22 @@ describe('Hooks Agent Flow', () => {
  import.meta.dirname,
  'hooks-agent-flow-multistep.responses',
  ),
+ },
+ );
+
+ // Create script files for hooks
+ const baPath = rig.createScript(
+ 'ba_fired.cjs',
+ "console.log('BeforeAgent Fired');",
+ );
+ const aaPath = rig.createScript(
+ 'aa_fired.cjs',
+ "console.log('AfterAgent Fired');",
+ );
+
+ await rig.setup(
+ 'should fire BeforeAgent and AfterAgent exactly once per turn despite tool calls',
+ {
  settings: {
  hooksConfig: {
  enabled: true,
@@ -254,7 +283,7 @@ describe('Hooks Agent Flow', () => {
  hooks: [
  {
  type: 'command',
- command: `node -e "console.log('BeforeAgent Fired')"`,
+ command: normalizePath(`node "${baPath}"`)!,
  timeout: 5000,
  },
  ],
@@ -265,7 +294,7 @@ describe('Hooks Agent Flow', () => {
  hooks: [
  {
  type: 'command',
- command: `node -e "console.log('AfterAgent Fired')"`,
+ command: normalizePath(`node "${aaPath}"`)!,
  timeout: 5000,
  },
  ],
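Note on the `node -e "..."` to `node "<file>"` changes in this file: inline scripts have to survive shell quoting, which differs between POSIX shells and Windows shells, whereas a script file only needs its path quoted. The sketch below shows the file-based pattern in isolation. It is not the test rig's code: `createScript` here is a hypothetical stand-in for `rig.createScript` (whose implementation is not visible in this diff), and the temp-directory layout is an assumption.

```typescript
import { execFileSync } from 'node:child_process';
import { mkdtempSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical stand-in for rig.createScript: write the script body to a
// file inside `dir` and return the absolute path.
function createScript(dir: string, fileName: string, content: string): string {
  const scriptPath = join(dir, fileName);
  writeFileSync(scriptPath, content);
  return scriptPath;
}

const dir = mkdtempSync(join(tmpdir(), 'hook-script-'));
const hookPath = createScript(
  dir,
  'after_agent_clear.cjs',
  `console.log(JSON.stringify({
    decision: 'block',
    reason: 'Security policy triggered',
  }));`,
);

// Run the file with the current Node binary. No shell is involved, so the
// quotes and braces in the script body never pass through shell parsing.
const stdout = execFileSync(process.execPath, [hookPath], {
  encoding: 'utf-8',
});
const output = JSON.parse(stdout) as { decision: string; reason: string };
console.log(output.decision); // block
```

The script body can use any mix of single and double quotes, since it is written to disk verbatim rather than embedded in a command string.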
35 changes: 1 addition & 34 deletions integration-tests/hooks-system.before-tool-selection.responses

Large diffs are not rendered by default.
