
PairOfCleats builds a hybrid semantic index for a repo (code + docs) and exposes a CLI/MCP server for fast, filterable search. It is designed for agent workflows, with artifacts stored outside the repo by default so they can be shared across runs, containers, and CI while keeping working trees clean.


doublemover/PairOfCleats


PairOfCleats

Local-first hybrid indexing and retrieval for source repositories.

PairOfCleats builds deterministic index artifacts for code and prose, then runs mixed sparse + dense retrieval with strict contracts around artifacts, schemas, and cache identity.

Runtime Requirements

Hard requirements:

  • Node.js >=24.13.0 (.nvmrc is 24.13.0)
  • npm (normal dependency install; scripts enabled)

Important install requirements:

  • Installs from a source checkout should include dev dependencies, which are needed to apply the required patches.
  • Production-only installs (for example, npm ci --omit=dev) can fail in this repo when required patches/*.patch files are present.

Optional capabilities:

  • Python 3 (for Python-related tooling/tests and optional AST paths)
  • sqlite-vec extension (faster ANN path when available)
  • LMDB / LanceDB / HNSW backends (selected by policy and capability)
  • PDF/DOCX extraction dependencies (capability-gated document extraction flows)

What It Provides

  • CLI: pairofcleats <command>
  • HTTP API: pairofcleats service api
  • Indexer service worker: pairofcleats service indexer
  • MCP server mode via tooling scripts (npm run mcp-server)

Primary CLI surface:

  • setup
  • bootstrap
  • index build
  • index watch
  • index validate
  • search
  • lmdb build

Quick Start

Install:

npm install

Guided setup (recommended):

pairofcleats setup

Non-interactive bootstrap:

pairofcleats bootstrap

Build and validate:

pairofcleats index build --mode all --quality balanced
pairofcleats index validate

Search:

pairofcleats search -- "where is query cache invalidated?" --mode code
pairofcleats search -- "release matrix and packaging" --mode prose --explain --json

Start API server:

pairofcleats service api

Mental Model

PairOfCleats is a two-plane system:

  • Build plane: deterministic artifact production
  • Retrieval plane: query planning, candidate generation, scoring, and output shaping

Core data model:

  • Repo identity -> cache root -> build root -> per-mode index roots
  • Modes: code, prose, extracted-prose, records
  • Contract-first artifacts with manifest-first loading
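The identity chain above maps onto the directory layout listed under Artifact and Cache Layout. A minimal TypeScript sketch of that mapping follows; indexRoot and currentPointerPath are hypothetical helper names for illustration, not PairOfCleats exports, and the repo and build ids are placeholders.

```typescript
import { join } from "node:path";

// Storage modes listed in the README (the CLI's "--mode all" is a selector, not a stored mode).
type Mode = "code" | "prose" | "extracted-prose" | "records";
const MODES: readonly Mode[] = ["code", "prose", "extracted-prose", "records"];

// <cacheRoot>/repos/<repoId>/builds/<buildId>/index-<mode>
function indexRoot(cacheRoot: string, repoId: string, buildId: string, mode: Mode): string {
  return join(cacheRoot, "repos", repoId, "builds", buildId, `index-${mode}`);
}

// <cacheRoot>/repos/<repoId>/builds/current.json
function currentPointerPath(cacheRoot: string, repoId: string): string {
  return join(cacheRoot, "repos", repoId, "builds", "current.json");
}

// Usage: list every per-mode index root for one build.
for (const mode of MODES) {
  console.log(indexRoot("/cache", "my-repo", "build-001", mode));
}
console.log(currentPointerPath("/cache", "my-repo"));
```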

High-level flow:

Repo files
  -> discovery + mode classification
  -> chunking + metadata + postings + relations
  -> artifact pieces + manifest + build_state
  -> optional sqlite/ann materialization
  -> builds/current.json promotion

Query
  -> parse + plan + intent
  -> candidate prefilter
  -> sparse rank (BM25 / sqlite-fts)
  -> dense rank (ann providers)
  -> fusion + boosts + explain
  -> stable output (human or json)
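The sparse rank stage scores candidates with BM25 (or sqlite-fts). For reference, here is a minimal TypeScript sketch of classic BM25 over pre-tokenized chunks. The bm25Scores name, the k1 = 1.2 and b = 0.75 defaults, and the Lucene-style non-negative IDF are assumptions for illustration, not details taken from PairOfCleats' scorer.

```typescript
interface Doc {
  id: number;
  tokens: string[];
}

// Score every doc that contains at least one query term; docs with no match are omitted.
function bm25Scores(docs: Doc[], queryTerms: string[], k1 = 1.2, b = 0.75): Map<number, number> {
  const n = docs.length;
  const avgLen = docs.reduce((sum, d) => sum + d.tokens.length, 0) / Math.max(n, 1);
  const uniqueTerms = [...new Set(queryTerms)];

  // Document frequency for each distinct query term.
  const df = new Map<string, number>();
  for (const term of uniqueTerms) {
    df.set(term, docs.filter((d) => d.tokens.includes(term)).length);
  }

  const scores = new Map<number, number>();
  for (const doc of docs) {
    let score = 0;
    for (const term of uniqueTerms) {
      const tf = doc.tokens.filter((t) => t === term).length;
      if (tf === 0) continue;
      const dfTerm = df.get(term)!;
      // Non-negative IDF variant: ln(1 + (N - df + 0.5) / (df + 0.5)).
      const idf = Math.log(1 + (n - dfTerm + 0.5) / (dfTerm + 0.5));
      // Term-frequency saturation (k1) with document-length normalization (b).
      const norm = tf + k1 * (1 - b + (b * doc.tokens.length) / avgLen);
      score += (idf * tf * (k1 + 1)) / norm;
    }
    if (score > 0) scores.set(doc.id, score);
  }
  return scores;
}

const scores = bm25Scores(
  [
    { id: 1, tokens: ["cache", "invalidate", "cache"] },
    { id: 2, tokens: ["cache", "planner"] },
    { id: 3, tokens: ["planner"] },
  ],
  ["cache"],
);
console.log(scores);
```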

Build Pipeline (Technical)

  1. Runtime envelope:
     • config resolution + policy normalization
     • concurrency and capability resolution
  2. Discovery and classification:
     • ignore rules + file caps
     • deterministic mode assignment
  3. Foreground indexing:
     • chunk extraction and metadata
     • sparse artifacts (postings/chargrams/filter index)
     • per-mode artifact writing with manifest entries
  4. Background enrichment:
     • tree-sitter/lint/risk/embeddings (policy-gated)
     • optional ANN materialization paths
  5. Promotion:
     • validation gate
     • builds/current.json is updated only after a successful build
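Promotion moves the builds/current.json pointer only after a build passes validation. One common way to make that pointer swap safe for concurrent readers is to write a temp file and rename it over the target. The TypeScript sketch below assumes that approach; the { buildId } pointer shape and the promoteBuild / readCurrentBuild names are hypothetical, since the actual current.json schema is not described here.

```typescript
import { mkdtempSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical pointer shape; the real current.json schema may differ.
interface CurrentPointer {
  buildId: string;
}

// Point builds/current.json at `buildId`, but only if the validation gate passes.
function promoteBuild(buildsDir: string, buildId: string, validate: (id: string) => boolean): boolean {
  if (!validate(buildId)) return false; // failed builds never become current
  const target = join(buildsDir, "current.json");
  const temp = `${target}.${process.pid}.tmp`;
  const pointer: CurrentPointer = { buildId };
  writeFileSync(temp, JSON.stringify(pointer, null, 2) + "\n", "utf8");
  // On POSIX filesystems rename() replaces the target in one step, so readers
  // see either the old pointer or the new one, never a partially written file.
  renameSync(temp, target);
  return true;
}

// Return the promoted build id, or null if the pointer is missing or unreadable.
function readCurrentBuild(buildsDir: string): string | null {
  try {
    const raw = readFileSync(join(buildsDir, "current.json"), "utf8");
    return (JSON.parse(raw) as CurrentPointer).buildId;
  } catch {
    return null;
  }
}

// Usage in a throwaway directory.
const dir = mkdtempSync(join(tmpdir(), "poc-builds-"));
promoteBuild(dir, "build-001", () => true);
promoteBuild(dir, "build-002", () => false); // rejected by the gate, pointer unchanged
console.log(readCurrentBuild(dir)); // build-001
```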

Retrieval Pipeline (Technical)

  1. Query parse and routing:
     • query-plan construction
     • mode-aware tokenization and routing
  2. Candidate generation:
     • filter index and chargram prefilter for path/file constraints
     • backend/provider availability checks
  3. Ranking:
     • sparse ranking (BM25 or sqlite-fts)
     • dense ranking (ann providers based on capability/policy)
  4. Fusion and output:
     • RRF or blend policy
     • deterministic tie-breaking
     • optional --explain score breakdown and pipeline stats
  5. Cache behavior:
     • query cache keys include retrieval-relevant knobs and index identity
     • strict manifest-first index loading by default
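Fusion combines the sparse and dense rankings. The TypeScript sketch below shows Reciprocal Rank Fusion (RRF) with a deterministic tie-break. It assumes k = 60 (a commonly used default) and ascending chunk-id order for ties; reciprocalRankFusion is a hypothetical name, and PairOfCleats' actual constants, blend policy, and tie-break keys may differ.

```typescript
interface Fused {
  id: string;
  score: number;
}

// Each ranked list contributes 1 / (k + rank) to every id it contains.
function reciprocalRankFusion(rankings: string[][], k = 60): Fused[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    // Higher score first; equal scores fall back to id order so output is stable across runs.
    .sort((a, b) => b.score - a.score || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
}

const sparse = ["a", "b", "c"];
const dense = ["b", "a", "d"];
console.log(reciprocalRankFusion([sparse, dense]).map((r) => r.id));
```

Here "a" and "b" receive identical fused scores, as do "c" and "d", so the id tie-break alone decides their order.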

Artifact and Cache Layout

Default cache layout is outside the repository:

  • <cacheRoot>/repos/<repoId>/builds/<buildId>/index-code
  • <cacheRoot>/repos/<repoId>/builds/<buildId>/index-prose
  • <cacheRoot>/repos/<repoId>/builds/<buildId>/index-extracted-prose
  • <cacheRoot>/repos/<repoId>/builds/<buildId>/index-records
  • <cacheRoot>/repos/<repoId>/builds/current.json

To use a custom cache root, set cache.root in .pairofcleats.json:

{
  "cache": {
    "root": "C:/absolute/path/to/cache"
  }
}
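Reading that setting can be sketched as follows. This assumes .pairofcleats.json sits at the repository root and that the caller supplies the default cache root; resolveCacheRoot is a hypothetical name, and the real config loader likely validates more fields than this.

```typescript
import { existsSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Only the field this sketch reads; other keys in the file are ignored.
interface PairOfCleatsConfig {
  cache?: { root?: unknown };
}

// Return cache.root from <repoRoot>/.pairofcleats.json, or the caller-supplied
// default when the file is absent or the field is not a non-empty string.
function resolveCacheRoot(repoRoot: string, defaultRoot: string): string {
  const configPath = join(repoRoot, ".pairofcleats.json");
  if (!existsSync(configPath)) return defaultRoot;
  const config = JSON.parse(readFileSync(configPath, "utf8")) as PairOfCleatsConfig;
  const root = config.cache?.root;
  return typeof root === "string" && root.length > 0 ? root : defaultRoot;
}

// Usage against a throwaway directory standing in for a repo checkout.
const repo = mkdtempSync(join(tmpdir(), "poc-repo-"));
console.log(resolveCacheRoot(repo, "/default/cache")); // /default/cache (no config file yet)
writeFileSync(
  join(repo, ".pairofcleats.json"),
  JSON.stringify({ cache: { root: "C:/absolute/path/to/cache" } }),
);
console.log(resolveCacheRoot(repo, "/default/cache")); // C:/absolute/path/to/cache
```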

Query Notes

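Core syntax:

  • "exact phrase" matches the quoted words as a phrase
  • -term excludes results containing term
  • -"excluded phrase" excludes results containing the phrase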
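A minimal tokenizer for this syntax might look like the TypeScript sketch below. The ParsedQuery shape and the parseQuery name are hypothetical; the real parser also feeds query planning, intent detection, and mode-aware tokenization, all of which this sketch omits.

```typescript
// Hypothetical parsed form of a query string.
interface ParsedQuery {
  terms: string[];
  phrases: string[];
  excludeTerms: string[];
  excludePhrases: string[];
}

function parseQuery(input: string): ParsedQuery {
  const parsed: ParsedQuery = { terms: [], phrases: [], excludeTerms: [], excludePhrases: [] };
  // Either an optional '-' plus a "quoted phrase", or an optional '-' plus a bare word.
  const pattern = /(-?)"([^"]*)"|(-?)(\S+)/g;
  for (const match of input.matchAll(pattern)) {
    const quoted = match[2];
    if (quoted !== undefined) {
      const phrase = quoted.trim();
      if (phrase.length === 0) continue; // ignore empty quotes
      if (match[1] === "-") parsed.excludePhrases.push(phrase);
      else parsed.phrases.push(phrase);
    } else {
      const word = match[4] ?? "";
      if (match[3] === "-") parsed.excludeTerms.push(word);
      else parsed.terms.push(word);
    }
  }
  return parsed;
}

const q = parseQuery('cache "exact phrase" -legacy -"excluded phrase"');
console.log(q);
```

An unterminated quote falls through to the bare-word branch, so it is kept as an ordinary term rather than silently dropped.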

Mode flags:

  • --mode code
  • --mode prose
  • --mode extracted-prose
  • --mode records
  • --mode all

Diagnostics:

  • --explain for ranking/routing details
  • --stats for pipeline timing and memory checkpoints
  • --json for machine-readable output

Testing and CI Lanes

Run a lane:

node tests/run.js --lane ci-lite
node tests/run.js --lane ci
node tests/run.js --lane ci-long
node tests/run.js --lane gate

Run with parallel jobs and timing outputs:

node tests/run.js --lane ci-long --jobs 4 --log-times .testLogs/ci-long-testRunTimes.txt --timings-file .testLogs/ci-long-timings.json

List lanes/tags:

node tests/run.js --list-lanes
node tests/run.js --list-tags

Learn More

Architecture and pipelines:

Contracts and schemas:

SQLite and ANN:

Setup, service, and integrations:

Advanced roadmap features and specs:

Testing and reliability:

License

License not yet specified in this repository.
