diff --git a/.claude/skills/skill-writer.md b/.claude/skills/skill-writer/SKILL.md similarity index 96% rename from .claude/skills/skill-writer.md rename to .claude/skills/skill-writer/SKILL.md index f804ade4..61da4ded 100644 --- a/.claude/skills/skill-writer.md +++ b/.claude/skills/skill-writer/SKILL.md @@ -1,6 +1,7 @@ -# Skill Writer - -Generate valid Warden skill definitions from natural language descriptions. +--- +name: Skill Writer +description: Generate valid Warden skill definitions from natural language descriptions +--- ## Usage diff --git a/.claude/skills/testing-guidelines/SKILL.md b/.claude/skills/testing-guidelines/SKILL.md new file mode 100644 index 00000000..57a94384 --- /dev/null +++ b/.claude/skills/testing-guidelines/SKILL.md @@ -0,0 +1,128 @@ +--- +name: testing-guidelines +description: Guide for writing tests. Use when adding new functionality, fixing bugs, or when tests are needed. Emphasizes integration tests, real-world fixtures, and regression coverage. +--- + +# Testing Guidelines + +Follow these principles when writing tests for this codebase. + +## Core Principles + +### 1. Mock External Services, Use Real Fixtures + +**ALWAYS** mock third-party network services. **ALWAYS** use fixtures based on real-world data. + +- Fixtures must be scrubbed of PII (use dummy data like `foo@example.com`, `user-123`) +- Capture real API responses, then sanitize them +- Never make actual network calls in tests + +### 2. Prefer Integration Tests Over Unit Tests + +Focus on **end-to-end style tests** that validate inputs and outputs, not implementation details. + +- Test the public interface, not internal methods +- Unit tests are valuable for edge cases in pure functions, but integration tests are the priority +- If refactoring breaks tests but behavior is unchanged, the tests were too coupled to implementation + +### 3. Minimize Edge Case Testing + +Don't test every variant of a problem. + +- Cover the **common path** thoroughly +- Skip exhaustive input permutations +- Skip unlikely edge cases that add maintenance burden without value +- One representative test per category of input is usually sufficient + +### 4. Always Add Regression Tests for Bugs + +When a bug is identified, **ALWAYS** add a test that would have caught it. + +- The test should fail before the fix and pass after +- Name it descriptively to document the bug +- This prevents the same bug from recurring + +### 5. Cover Every User Entry Point + +**ALWAYS** have at least one basic test for each customer/user entry point. + +- CLI commands, API endpoints, public functions +- Test the common/happy path first +- This proves the entry point works at all + +### 6. Tests Validate Before Manual QA + +Tests are how we validate **ANY** functionality works before manual testing. + +- Write tests first or alongside code, not as an afterthought +- If you can't test it, reconsider the design +- Passing tests should give confidence to ship + +## Technical Guidelines + +### File Organization + +- Test files use `*.test.ts` extension +- Co-locate tests with source: `foo.ts` → `foo.test.ts` + +### Test Isolation + +Every test must: +- Run independently without affecting other tests +- Use temporary directories for file operations +- Clean up resources in `afterEach` hooks + +```typescript +import { describe, it, expect, beforeEach, afterEach } from 'vitest'; +import { mkdirSync, rmSync, writeFileSync } from 'node:fs'; +import { join } from 'node:path'; +import { tmpdir } from 'node:os'; + +describe('my feature', () => { + let tempDir: string; + + beforeEach(() => { + tempDir = join(tmpdir(), `warden-test-${Date.now()}`); + mkdirSync(tempDir, { recursive: true }); + }); + + afterEach(() => { + rmSync(tempDir, { recursive: true, force: true }); + }); + + it('does something with files', () => { + writeFileSync(join(tempDir, 'test.ts'), 'content'); + // ... test code + }); +}); +``` + +### Pure Function Tests + +For pure functions without side effects, no special setup is needed: + +```typescript +import { describe, it, expect } from 'vitest'; +import { matchGlob } from './matcher.js'; + +describe('matchGlob', () => { + it('matches exact paths', () => { + expect(matchGlob('src/index.ts', 'src/index.ts')).toBe(true); + }); +}); +``` + +## Running Tests + +```bash +pnpm test # Run all tests in watch mode +pnpm test:run # Run all tests once +``` + +## Checklist Before Submitting + +- [ ] New entry points have at least one happy-path test +- [ ] Bug fixes include a regression test +- [ ] External services are mocked with sanitized fixtures +- [ ] Tests validate behavior, not implementation +- [ ] No shared state between tests diff --git a/AGENTS.md b/AGENTS.md index 2bc71c30..b2a6edf9 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -56,10 +56,12 @@ export { SkillReport, runSkill } from "./types/index.js"; ## Testing -When adding new behavior or modifying existing functionality, review `TESTING.md` to determine if tests are needed. Key points: +**Always reference `/testing-guidelines` when writing tests.** Key principles: -- Test error cases users will actually hit -- Add regression tests for bugs +- Mock external services, use sanitized real-world fixtures +- Prefer integration tests over unit tests +- Always add regression tests for bugs +- Cover every user entry point with at least a happy-path test - Co-locate tests with source (`foo.ts` → `foo.test.ts`) ## Task Management diff --git a/TESTING.md b/TESTING.md deleted file mode 100644 index 8a24a333..00000000 --- a/TESTING.md +++ /dev/null @@ -1,129 +0,0 @@ -# Testing Guidelines - -## Philosophy - -Tests should be **focused** and **isolated**. Every test must: - -- Run independently without affecting other tests or local state -- Use temporary directories for file operations -- Clean up resources in `afterEach` hooks - -## Running Tests - -```bash -pnpm test # Run all tests in watch mode -pnpm test:run # Run all tests once -``` - -## Test Organization - -### File Naming - -- Test files use `*.test.ts` extension -- Co-locate tests with source: `foo.ts` → `foo.test.ts` - -### Directory Structure - -``` -src/ -├── cli/ -│ ├── args.ts -│ ├── args.test.ts -│ ├── files.ts -│ └── files.test.ts -├── triggers/ -│ ├── matcher.ts -│ └── matcher.test.ts -├── output/ -│ ├── renderer.ts -│ └── renderer.test.ts -└── skills/ - ├── loader.ts - └── loader.test.ts -``` - -## Test Isolation - -### File System Tests - -Always use temporary directories for file operations: - -```typescript -import { describe, it, expect, beforeEach, afterEach } from 'vitest'; -import { mkdirSync, rmSync, writeFileSync } from 'node:fs'; -import { join } from 'node:path'; -import { tmpdir } from 'node:os'; - -describe('my feature', () => { - let tempDir: string; - - beforeEach(() => { - tempDir = join(tmpdir(), `warden-test-${Date.now()}`); - mkdirSync(tempDir, { recursive: true }); - }); - - afterEach(() => { - rmSync(tempDir, { recursive: true, force: true }); - }); - - it('does something with files', () => { - writeFileSync(join(tempDir, 'test.ts'), 'content'); - // ... test code - }); -}); -``` - -### Pure Function Tests - -For pure functions without side effects, no special setup is needed: - -```typescript -import { describe, it, expect } from 'vitest'; -import { matchGlob, shouldFail } from './matcher.js'; - -describe('matchGlob', () => { - it('matches exact paths', () => { - expect(matchGlob('src/index.ts', 'src/index.ts')).toBe(true); - }); -}); -``` - -## Coverage Depth - -Test core behavior and catch regressions—not every possible edge case. Prioritize: - -- Happy paths and common usage patterns -- Error cases users will actually hit -- Past bugs (regression tests) - -Skip: - -- Exhaustive input permutations -- Unlikely edge cases that add maintenance burden without value -- Implementation details that may change - -## Writing Good Tests - -### Do - -- Test behavior, not implementation -- Use descriptive test names that explain the scenario -- Test error cases users will realistically encounter -- Group related tests with nested `describe()` blocks -- Verify cleanup happens (no leftover files) - -### Don't - -- Share state between tests (each test should be independent) -- Depend on test execution order -- Leave unrestored mocks or spies -- Use hardcoded paths (use temp directories) - -## Coverage Goals - -Cover core functionality: - -- CLI argument parsing and file handling -- Trigger matching logic -- Skill loading and execution -- Output rendering diff --git a/warden.toml b/warden.toml index f8e5ca76..efdaaf43 100644 --- a/warden.toml +++ b/warden.toml @@ -11,3 +11,9 @@ name = "find-bugs" event = "pull_request" actions = ["opened", "synchronize", "reopened"] skill = "find-bugs" + +[[triggers]] +name = "testing-guidelines" +event = "pull_request" +actions = ["opened", "synchronize", "reopened"] +skill = "testing-guidelines"