Local-first hybrid indexing and retrieval for source repositories.
PairOfCleats builds deterministic index artifacts for code and prose, then runs mixed sparse + dense retrieval with strict contracts around artifacts, schemas, and cache identity.
Hard requirements:
- Node.js >=24.13.0 (.nvmrc is 24.13.0)
- npm (normal dependency install; scripts enabled)
Important install requirement:
- Source-checkout installs are expected to include dev dependencies so required patches can be applied.
- npm ci --omit=dev / production-only installs can fail in this repo when required patches/*.patch files are present.
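The Node.js requirement above can be checked before installing. The following is a minimal sketch, not part of PairOfCleats itself: the helper names `parseVersion` and `meetsMinimum` are hypothetical, and the minimum is copied from the requirement stated in this README.

```javascript
// Minimal sketch: compare a Node.js version string against a required minimum.
// MIN_NODE mirrors the ">=24.13.0" requirement; helper names are hypothetical.
const MIN_NODE = "24.13.0";

function parseVersion(version) {
  // Accepts "v24.13.0" or "24.13.0"; drops any pre-release suffix after "-".
  return version.replace(/^v/, "").split("-")[0].split(".").map(Number);
}

function meetsMinimum(version, minimum = MIN_NODE) {
  const have = parseVersion(version);
  const need = parseVersion(minimum);
  for (let i = 0; i < 3; i += 1) {
    const a = have[i] ?? 0;
    const b = need[i] ?? 0;
    if (a !== b) return a > b; // first differing component decides
  }
  return true; // all components equal
}

// process.version is provided by Node.js, for example "v24.13.0".
console.log(
  meetsMinimum(process.version)
    ? "Node.js version OK"
    : `Node.js >=${MIN_NODE} required, found ${process.version}`
);
```

Running this with the Node.js binary selected by .nvmrc should print the OK message.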
Optional capabilities:
- Python 3 (for Python-related tooling/tests and optional AST paths)
- sqlite-vec extension (faster ANN path when available)
- LMDB / LanceDB / HNSW backends (selected by policy and capability)
- PDF/DOCX extraction dependencies (capability-gated document extraction flows)
- CLI: pairofcleats <command>
- HTTP API: pairofcleats service api
- Indexer service worker: pairofcleats service indexer
- MCP server mode via tooling scripts (npm run mcp-server)
Primary CLI surface:
- setup
- bootstrap
- index build
- index watch
- index validate
- search
- lmdb build
Install:
npm install
Guided setup (recommended):
pairofcleats setup
Non-interactive bootstrap:
pairofcleats bootstrap
Build and validate:
pairofcleats index build --mode all --quality balanced
pairofcleats index validate
Search:
pairofcleats search -- "where is query cache invalidated?" --mode code
pairofcleats search -- "release matrix and packaging" --mode prose --explain --json
Start API server:
pairofcleats service api
PairOfCleats is a two-plane system:
- Build plane: deterministic artifact production
- Retrieval plane: query planning, candidate generation, scoring, and output shaping
Core data model:
- Repo identity -> cache root -> build root -> per-mode index roots
- Modes: code, prose, extracted-prose, records
- Contract-first artifacts with manifest-first loading
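The identity chain above (repo identity -> cache root -> build root -> per-mode index roots) can be sketched as plain path construction. This is an illustration only: the function names are hypothetical, and the directory names follow the default cache layout documented in this README.

```javascript
// Sketch of the repo -> cache root -> build root -> per-mode index root chain.
// Function names are hypothetical; directory names follow the documented layout.
const MODES = ["code", "prose", "extracted-prose", "records"];

function buildRoot(cacheRoot, repoId, buildId) {
  return [cacheRoot, "repos", repoId, "builds", buildId].join("/");
}

function indexRoots(cacheRoot, repoId, buildId) {
  const root = buildRoot(cacheRoot, repoId, buildId);
  // One index directory per mode, e.g. <buildRoot>/index-code.
  return Object.fromEntries(MODES.map((mode) => [mode, `${root}/index-${mode}`]));
}

function currentPointerPath(cacheRoot, repoId) {
  // builds/current.json sits next to the build directories.
  return [cacheRoot, "repos", repoId, "builds", "current.json"].join("/");
}

console.log(indexRoots("/cache", "repo-1", "build-1"));
console.log(currentPointerPath("/cache", "repo-1"));
```

Keeping every mode under one build root means a single build ID identifies a consistent set of per-mode indexes.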
High-level flow:
Repo files
-> discovery + mode classification
-> chunking + metadata + postings + relations
-> artifact pieces + manifest + build_state
-> optional sqlite/ann materialization
-> builds/current.json promotion
Query
-> parse + plan + intent
-> candidate prefilter
-> sparse rank (BM25 / sqlite-fts)
-> dense rank (ann providers)
-> fusion + boosts + explain
-> stable output (human or json)
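The fusion step in the query flow combines the sparse and dense rankings into one list. A common way to do this is reciprocal rank fusion (RRF), where each document scores the sum of 1 / (k + rank) over the lists it appears in. The sketch below is a generic RRF implementation, not the project's code: the function name, the constant k = 60, and the lexicographic tie-break on document ID are all assumptions chosen for illustration.

```javascript
// Generic reciprocal rank fusion sketch (not the project's implementation).
// score(doc) = sum over lists of 1 / (k + rank), with 1-based rank.
// k = 60 and the ID-based tie-break are assumptions for this example.
function rrfFuse(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => {
      if (b.score !== a.score) return b.score - a.score; // higher score first
      return a.id < b.id ? -1 : a.id > b.id ? 1 : 0; // deterministic tie-break
    });
}

// Hypothetical chunk IDs from a sparse ranker and a dense ranker.
const sparse = ["a.js:1", "b.js:4", "c.md:2"];
const dense = ["b.js:4", "d.js:9", "a.js:1"];
console.log(rrfFuse([sparse, dense]).map((hit) => hit.id));
```

Because ties are broken on a stable key rather than on input order, the same pair of rankings always yields the same fused order.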
- Runtime envelope:
- config resolution + policy normalization
- concurrency and capability resolution
- Discovery and classification:
- ignore rules + file caps
- deterministic mode assignment
- Foreground indexing:
- chunk extraction and metadata
- sparse artifacts (postings/chargrams/filter index)
- per-mode artifact writing with manifest entries
- Background enrichment:
- tree-sitter/lint/risk/embeddings (policy-gated)
- optional ANN materialization paths
- Promotion:
- validation gate
- builds/current.json update only after successful build
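The promotion rule at the end of this list (update builds/current.json only after a successful build) can be modeled as a small gate. The sketch below is illustrative only: the pointer shape `{ buildId }` and the build result shape `{ buildId, validation: { ok, errors } }` are assumptions, since the actual schema of builds/current.json is not described here.

```javascript
// Sketch of a validation-gated promotion. Both object shapes are hypothetical:
//   pointer:     { buildId }
//   buildResult: { buildId, validation: { ok, errors } }
function promoteIfValid(currentPointer, buildResult) {
  if (!buildResult.validation.ok) {
    // Failed validation: keep the old pointer so readers stay on the previous build.
    return { pointer: currentPointer, promoted: false };
  }
  return { pointer: { buildId: buildResult.buildId }, promoted: true };
}

const before = { buildId: "build-001" };
const failed = promoteIfValid(before, {
  buildId: "build-002",
  validation: { ok: false, errors: ["missing manifest entry"] },
});
const passed = promoteIfValid(before, {
  buildId: "build-003",
  validation: { ok: true, errors: [] },
});
console.log(failed.pointer.buildId, passed.pointer.buildId);
```

The point of the gate is that a broken or partial build never becomes visible to the retrieval plane.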
- Query parse and routing:
- query-plan construction
- mode-aware tokenization and routing
- Candidate generation:
- filter index and chargram prefilter for path/file constraints
- backend/provider availability checks
- Ranking:
- sparse ranking (bm25 or sqlite-fts)
- dense ranking (ann providers based on capability/policy)
- Fusion and output:
- RRF or blend policy
- deterministic tie-breaking
- optional --explain score breakdown and pipeline stats
- Cache behavior:
- query cache keys include retrieval-relevant knobs and index identity
- strict manifest-first index loading by default
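The cache behavior above says that query cache keys include retrieval-relevant knobs and index identity. One way to build such a key is to serialize those inputs with sorted object keys, so that equal inputs always produce the same string. The sketch below assumes hypothetical field names (`query`, `knobs`, `indexIdentity`) and JSON-serializable values; the real key format is not specified here.

```javascript
// Sketch of a deterministic cache key. Field names are hypothetical and
// values are assumed to be JSON-serializable.
function stableStringify(value) {
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    // Sort keys so property insertion order cannot change the output.
    const body = Object.keys(value)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(value[key])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value);
}

function queryCacheKey({ query, knobs, indexIdentity }) {
  return stableStringify({ query, knobs, indexIdentity });
}

const keyA = queryCacheKey({
  query: "where is query cache invalidated?",
  knobs: { mode: "code", explain: false },
  indexIdentity: { buildId: "build-003" },
});
console.log(keyA);
```

Including the index identity means results cached against one build are not reused after a different build is promoted.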
Default cache layout is outside the repository:
- <cacheRoot>/repos/<repoId>/builds/<buildId>/index-code
- <cacheRoot>/repos/<repoId>/builds/<buildId>/index-prose
- <cacheRoot>/repos/<repoId>/builds/<buildId>/index-extracted-prose
- <cacheRoot>/repos/<repoId>/builds/<buildId>/index-records
- <cacheRoot>/repos/<repoId>/builds/current.json
Set custom cache root in .pairofcleats.json:
{
  "cache": {
    "root": "C:/absolute/path/to/cache"
  }
}
Core syntax:
- "exact phrase"
- -term
- -"excluded phrase"
Mode flags:
- --mode code
- --mode prose
- --mode extracted-prose
- --mode records
- --mode all
Diagnostics:
- --explain for ranking/routing details
- --stats for pipeline timing and memory checkpoints
- --json for machine-readable output
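The core syntax listed above (quoted phrases, and a leading minus to exclude a term or phrase) can be illustrated with a small parser. This is a simplified sketch under stated assumptions, not the project's query parser: the function name and the returned field names are hypothetical, and escaping inside quotes is not handled.

```javascript
// Simplified sketch of the core query syntax: "phrase", -term, -"phrase".
// Not the project's parser; names and result shape are hypothetical.
function parseQuery(input) {
  // Optional leading minus, then either a quoted phrase or a bare token.
  const pattern = /(-?)(?:"([^"]*)"|(\S+))/g;
  const result = { terms: [], phrases: [], excludedTerms: [], excludedPhrases: [] };
  for (const match of input.matchAll(pattern)) {
    const [, minus, phrase, term] = match;
    const excluded = minus === "-";
    if (phrase !== undefined) {
      const target = excluded ? result.excludedPhrases : result.phrases;
      target.push(phrase);
    } else {
      const target = excluded ? result.excludedTerms : result.terms;
      target.push(term);
    }
  }
  return result;
}

console.log(parseQuery('cache "query plan" -legacy -"dead code"'));
```

A hyphen inside a token (for example `extracted-prose`) is kept as part of the term, because only a minus at the start of a token marks an exclusion.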
Run a lane:
node tests/run.js --lane ci-lite
node tests/run.js --lane ci
node tests/run.js --lane ci-long
node tests/run.js --lane gate
Run with parallel jobs and timing outputs:
node tests/run.js --lane ci-long --jobs 4 --log-times .testLogs/ci-long-testRunTimes.txt --timings-file .testLogs/ci-long-timings.json
List lanes/tags:
node tests/run.js --list-lanes
node tests/run.js --list-tags
Architecture and pipelines:
- docs/guides/architecture.md
- docs/guides/search.md
- docs/perf/retrieval-pipeline.md
- docs/perf/index-artifact-pipelines.md
Contracts and schemas:
- docs/contracts/indexing.md
- docs/contracts/search-contract.md
- docs/contracts/artifact-contract.md
- docs/contracts/search-cli.md
- docs/config/schema.json
- docs/config/contract.md
SQLite and ANN:
Setup, service, and integrations:
- docs/guides/setup.md
- docs/guides/service-mode.md
- docs/api/server.md
- docs/api/mcp-server.md
- docs/guides/mcp.md
- docs/guides/editor-integration.md
Advanced roadmap features and specs:
- docs/specs/index-refs-and-snapshots.md
- docs/specs/index-diffs.md
- docs/specs/federated-search.md
- docs/specs/workspace-config.md
- docs/specs/workspace-manifest.md
- docs/specs/progress-protocol-v2.md
- docs/specs/node-supervisor-protocol.md
Testing and reliability:
- docs/testing/test-runner-interface.md
- docs/testing/truth-table.md
- docs/testing/ci-capability-policy.md
License not yet specified in this repository.
