diff --git a/.claude/skills/skill-writer.md b/.claude/skills/skill-writer/SKILL.md
similarity index 96%
rename from .claude/skills/skill-writer.md
rename to .claude/skills/skill-writer/SKILL.md
index f804ade4..61da4ded 100644
--- a/.claude/skills/skill-writer.md
+++ b/.claude/skills/skill-writer/SKILL.md
@@ -1,6 +1,7 @@
-# Skill Writer
-
-Generate valid Warden skill definitions from natural language descriptions.
+---
+name: Skill Writer
+description: Generate valid Warden skill definitions from natural language descriptions
+---
 
 ## Usage
 
diff --git a/.claude/skills/testing-guidelines/SKILL.md b/.claude/skills/testing-guidelines/SKILL.md
new file mode 100644
index 00000000..57a94384
--- /dev/null
+++ b/.claude/skills/testing-guidelines/SKILL.md
@@ -0,0 +1,128 @@
+---
+name: testing-guidelines
+description: Guide for writing tests. Use when adding new functionality, fixing bugs, or when tests are needed. Emphasizes integration tests, real-world fixtures, and regression coverage.
+---
+
+# Testing Guidelines
+
+Follow these principles when writing tests for this codebase.
+
+## Core Principles
+
+### 1. Mock External Services, Use Real Fixtures
+
+**ALWAYS** mock third-party network services. **ALWAYS** use fixtures based on real-world data.
+
+- Fixtures must be scrubbed of PII (use dummy data like `foo@example.com`, `user-123`)
+- Capture real API responses, then sanitize them
+- Never make actual network calls in tests
+
+### 2. Prefer Integration Tests Over Unit Tests
+
+Focus on **end-to-end style tests** that validate inputs and outputs, not implementation details.
+
+- Test the public interface, not internal methods
+- Unit tests are valuable for edge cases in pure functions, but integration tests are the priority
+- If refactoring breaks tests but behavior is unchanged, the tests were too coupled to implementation
+
+### 3. Minimize Edge Case Testing
+
+Don't test every variant of a problem.
+
+- Cover the **common path** thoroughly
+- Skip exhaustive input permutations
+- Skip unlikely edge cases that add maintenance burden without value
+- One representative test per category of input is usually sufficient
+
+### 4. Always Add Regression Tests for Bugs
+
+When a bug is identified, **ALWAYS** add a test that would have caught it.
+
+- The test should fail before the fix and pass after
+- Name it descriptively to document the bug
+- This prevents the same bug from recurring
+
+### 5. Cover Every User Entry Point
+
+**ALWAYS** have at least one basic test for each customer/user entry point.
+
+- CLI commands, API endpoints, public functions
+- Test the common/happy path first
+- This proves the entry point works at all
+
+### 6. Tests Validate Before Manual QA
+
+Tests are how we validate **ANY** functionality works before manual testing.
+
+- Write tests first or alongside code, not as an afterthought
+- If you can't test it, reconsider the design
+- Passing tests should give confidence to ship
+
+## Technical Guidelines
+
+### File Organization
+
+- Test files use `*.test.ts` extension
+- Co-locate tests with source: `foo.ts` → `foo.test.ts`
+
+### Test Isolation
+
+Every test must:
+- Run independently without affecting other tests
+- Use temporary directories for file operations
+- Clean up resources in `afterEach` hooks
+
+```typescript
+import { describe, it, expect, beforeEach, afterEach } from 'vitest';
+import { mkdirSync, rmSync, writeFileSync } from 'node:fs';
+import { join } from 'node:path';
+import { tmpdir } from 'node:os';
+
+describe('my feature', () => {
+  let tempDir: string;
+
+  beforeEach(() => {
+    tempDir = join(tmpdir(), `warden-test-${Date.now()}`);
+    mkdirSync(tempDir, { recursive: true });
+  });
+
+  afterEach(() => {
+    rmSync(tempDir, { recursive: true, force: true });
+  });
+
+  it('does something with files', () => {
+    writeFileSync(join(tempDir, 'test.ts'), 'content');
+    // ... test code
+  });
+});
+```
+
+### Pure Function Tests
+
+For pure functions without side effects, no special setup is needed:
+
+```typescript
+import { describe, it, expect } from 'vitest';
+import { matchGlob } from './matcher.js';
+
+describe('matchGlob', () => {
+  it('matches exact paths', () => {
+    expect(matchGlob('src/index.ts', 'src/index.ts')).toBe(true);
+  });
+});
+```
+
+## Running Tests
+
+```bash
+pnpm test              # Run all tests in watch mode
+pnpm test:run          # Run all tests once
+```
+
+## Checklist Before Submitting
+
+- [ ] New entry points have at least one happy-path test
+- [ ] Bug fixes include a regression test
+- [ ] External services are mocked with sanitized fixtures
+- [ ] Tests validate behavior, not implementation
+- [ ] No shared state between tests
diff --git a/AGENTS.md b/AGENTS.md
index 2bc71c30..b2a6edf9 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -56,10 +56,12 @@ export { SkillReport, runSkill } from "./types/index.js";
 
 ## Testing
 
-When adding new behavior or modifying existing functionality, review `TESTING.md` to determine if tests are needed. Key points:
+**Always reference `/testing-guidelines` when writing tests.** Key principles:
 
-- Test error cases users will actually hit
-- Add regression tests for bugs
+- Mock external services, use sanitized real-world fixtures
+- Prefer integration tests over unit tests
+- Always add regression tests for bugs
+- Cover every user entry point with at least a happy-path test
 - Co-locate tests with source (`foo.ts` → `foo.test.ts`)
 
 ## Task Management
diff --git a/TESTING.md b/TESTING.md
deleted file mode 100644
index 8a24a333..00000000
--- a/TESTING.md
+++ /dev/null
@@ -1,129 +0,0 @@
-# Testing Guidelines
-
-## Philosophy
-
-Tests should be **focused** and **isolated**. Every test must:
-
-- Run independently without affecting other tests or local state
-- Use temporary directories for file operations
-- Clean up resources in `afterEach` hooks
-
-## Running Tests
-
-```bash
-pnpm test              # Run all tests in watch mode
-pnpm test:run          # Run all tests once
-```
-
-## Test Organization
-
-### File Naming
-
-- Test files use `*.test.ts` extension
-- Co-locate tests with source: `foo.ts` → `foo.test.ts`
-
-### Directory Structure
-
-```
-src/
-├── cli/
-│   ├── args.ts
-│   ├── args.test.ts
-│   ├── files.ts
-│   └── files.test.ts
-├── triggers/
-│   ├── matcher.ts
-│   └── matcher.test.ts
-├── output/
-│   ├── renderer.ts
-│   └── renderer.test.ts
-└── skills/
-    ├── loader.ts
-    └── loader.test.ts
-```
-
-## Test Isolation
-
-### File System Tests
-
-Always use temporary directories for file operations:
-
-```typescript
-import { describe, it, expect, beforeEach, afterEach } from 'vitest';
-import { mkdirSync, rmSync, writeFileSync } from 'node:fs';
-import { join } from 'node:path';
-import { tmpdir } from 'node:os';
-
-describe('my feature', () => {
-  let tempDir: string;
-
-  beforeEach(() => {
-    tempDir = join(tmpdir(), `warden-test-${Date.now()}`);
-    mkdirSync(tempDir, { recursive: true });
-  });
-
-  afterEach(() => {
-    rmSync(tempDir, { recursive: true, force: true });
-  });
-
-  it('does something with files', () => {
-    writeFileSync(join(tempDir, 'test.ts'), 'content');
-    // ... test code
-  });
-});
-```
-
-### Pure Function Tests
-
-For pure functions without side effects, no special setup is needed:
-
-```typescript
-import { describe, it, expect } from 'vitest';
-import { matchGlob, shouldFail } from './matcher.js';
-
-describe('matchGlob', () => {
-  it('matches exact paths', () => {
-    expect(matchGlob('src/index.ts', 'src/index.ts')).toBe(true);
-  });
-});
-```
-
-## Coverage Depth
-
-Test core behavior and catch regressions—not every possible edge case. Prioritize:
-
-- Happy paths and common usage patterns
-- Error cases users will actually hit
-- Past bugs (regression tests)
-
-Skip:
-
-- Exhaustive input permutations
-- Unlikely edge cases that add maintenance burden without value
-- Implementation details that may change
-
-## Writing Good Tests
-
-### Do
-
-- Test behavior, not implementation
-- Use descriptive test names that explain the scenario
-- Test error cases users will realistically encounter
-- Group related tests with nested `describe()` blocks
-- Verify cleanup happens (no leftover files)
-
-### Don't
-
-- Share state between tests (each test should be independent)
-- Depend on test execution order
-- Leave unrestored mocks or spies
-- Use hardcoded paths (use temp directories)
-
-## Coverage Goals
-
-Cover core functionality:
-
-- CLI argument parsing and file handling
-- Trigger matching logic
-- Skill loading and execution
-- Output rendering
diff --git a/warden.toml b/warden.toml
index f8e5ca76..efdaaf43 100644
--- a/warden.toml
+++ b/warden.toml
@@ -11,3 +11,9 @@ name = "find-bugs"
 event = "pull_request"
 actions = ["opened", "synchronize", "reopened"]
 skill = "find-bugs"
+
+[[triggers]]
+name = "testing-guidelines"
+event = "pull_request"
+actions = ["opened", "synchronize", "reopened"]
+skill = "testing-guidelines"