Natural-language search that works like mgrep. Fast, local, and built for coding agents.
Traditional code search (grep, ripgrep, IDE search) finds exact text matches. But when you're exploring a codebase, you often think in concepts: "where is authentication handled?" or "how does the rate limiter work?"
ggrep bridges this gap:
- Semantic search: Find code by meaning, not just string matching. Ask "where do transactions get created?" and get results even if the code uses create_txn or new_transaction.
- CPU-first: Runs entirely on CPU. No GPU required, no cloud APIs, no API keys.
- 100% local: All embeddings computed locally. Your code never leaves your machine.
- Language-aware chunking: Tree-sitter parses code by function/class boundaries, so each result is a complete, meaningful unit.
- Agent-ready: Native MCP server for Claude Code, Codex CLI, Gemini CLI, and OpenCode.
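To illustrate what chunking by function/class boundaries means, here is a minimal sketch. It is not ggrep's implementation: it swaps Tree-sitter for Python's built-in ast module so the example runs without extra dependencies, and the name chunk_by_definitions and the sample source are hypothetical.

```python
import ast

# Hypothetical sample file; only the top-level definitions become chunks.
SAMPLE = '''import os

def create_txn(amount):
    return {"amount": amount}

class Ledger:
    def add(self, txn):
        self.items.append(txn)
'''


def chunk_by_definitions(source: str) -> list[dict]:
    """Split Python source into one chunk per top-level function or class.

    Stand-in for a Tree-sitter based chunker: each chunk spans the full
    definition, so a search hit is a complete unit rather than a fragment.
    Decorator lines above a definition are not included in this sketch.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            start, end = node.lineno, node.end_lineno  # 1-based, inclusive
            chunks.append({
                "name": node.name,
                "start_line": start,
                "end_line": end,
                "text": "\n".join(lines[start - 1:end]),
            })
    return chunks


chunks = chunk_by_definitions(SAMPLE)
for chunk in chunks:
    print(chunk["name"], chunk["start_line"], chunk["end_line"])
```

Each chunk would then be embedded on its own, which is why a result can be returned as a whole function or class.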
Install from source:
git clone https://github.com/GoodFarming/goodgrep.git
cd goodgrep/Tools/ggrep
cargo build --release
The binary will be at target/release/ggrep. Add it to your PATH or run directly.
First-time setup (optional):
ggrep setup
Downloads embedding models (~500MB) and tree-sitter grammars upfront. If you skip this, models download automatically on first use.
Search a codebase:
cd /path/to/your/repo
ggrep "where is authentication handled?"
Your first search automatically indexes the repository. Each repository gets its own isolated index.
ggrep combines several techniques for high-quality semantic search:
- Smart Chunking: Tree-sitter parses code by function/class boundaries, ensuring each embedding captures a complete logical block. Markdown is chunked by headings. Mermaid diagrams are preprocessed for better recall.
- Hybrid Search: Dense embeddings (sentence-transformers) for broad semantic recall, plus ColBERT reranking for precision on top candidates.
- Snapshot Isolation: Queries always see a consistent view of the index, never partial state during updates.
- Background Daemon: File watcher detects changes and incrementally re-indexes. Keep ggrep serve running for instant searches.
- Per-Repository Isolation: Each repository gets its own index, identified by git remote URL or directory hash. Switching repos "just works".
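The two-stage idea behind the Hybrid Search item can be sketched as follows. This is a toy illustration under stated assumptions, not ggrep's code: the vectors are hand-made instead of model outputs, and the names cosine, maxsim, hybrid_search, recall_k, and top_k are hypothetical. Stage one ranks chunks by a single dense vector; stage two rescores only the survivors with ColBERT-style MaxSim (for each query token, take the best-matching chunk token, then sum).

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def maxsim(query_tokens, doc_tokens):
    """Late-interaction score: sum over query tokens of the best doc-token match."""
    return sum(max(cosine(q, d) for d in doc_tokens) for q in query_tokens)


def hybrid_search(query_vec, query_tokens, chunks, recall_k=3, top_k=2):
    # Stage 1: cheap dense retrieval keeps the recall_k closest chunks.
    candidates = sorted(
        chunks, key=lambda c: cosine(query_vec, c["dense"]), reverse=True
    )[:recall_k]
    # Stage 2: the more precise token-level score reorders only those candidates.
    reranked = sorted(
        candidates, key=lambda c: maxsim(query_tokens, c["tokens"]), reverse=True
    )
    return [c["id"] for c in reranked[:top_k]]


# Toy 2-dimensional data; real embeddings have hundreds of dimensions.
CHUNKS = [
    {"id": "A", "dense": [1.0, 0.0], "tokens": [[1.0, 0.0], [0.0, 1.0]]},
    {"id": "B", "dense": [0.9, 0.1], "tokens": [[1.0, 0.0], [1.0, 0.0]]},
    {"id": "C", "dense": [0.0, 1.0], "tokens": [[0.0, 1.0]]},
    {"id": "D", "dense": [0.8, 0.6], "tokens": [[0.6, 0.8]]},
]

results = hybrid_search([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], CHUNKS)
print(results)
```

In this toy data, chunk B ranks second on dense similarity but drops below D after reranking, because D's token vector partially matches both query tokens while B's tokens match only one.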
Supported languages:
TypeScript, TSX, JavaScript, Python, Go, Rust, C, C++, C#, Java, Kotlin, Scala, Ruby, PHP, Elixir, Haskell, OCaml, Julia, Zig, Lua, Odin, Objective-C, Verilog, HTML, CSS, XML, Markdown, JSON, YAML, TOML, Bash, Make, Starlark, HCL, Terraform, Diff, Regex
# Quick search (shorthand)
ggrep "how is the database connection pooled?"
# Full control with ggrep search
ggrep search "API rate limiting logic"
ggrep search --per-file 5 "error handling" # More results per file
ggrep search --compact "user validation" # File paths only
ggrep search --json "config parsing" # JSON output for scripting
Search modes (bias results toward different content types):
| Flag | Mode | Best for |
|---|---|---|
| -d | Discovery | Broad exploration across code, docs, and diagrams |
| -i | Implementation | Code-focused results |
| -p | Planning | Docs and diagrams |
| -b | Debug | Debugging and incident-related code |
Output control:
| Flag | Effect |
|---|---|
| -n, --no-snippet | File + line only |
| -s, --short-snippet | Short preview |
| -l, --long-snippet | Longer preview |
| -c, --content | Full chunk content |
| --compact | File paths only (deduplicated) |
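For scripting, the --json output can be consumed from another program. The sketch below is a hedged example rather than an official client: the helper names build_search_command and run_search are hypothetical, it assumes --json and --per-file can be combined, and because the JSON schema is not described here it returns the parsed value untouched.

```python
import json
import subprocess


def build_search_command(query, per_file=None):
    """Build the argument list for `ggrep search --json`.

    Only flags shown in this README are used; combining them is an assumption.
    """
    cmd = ["ggrep", "search", "--json"]
    if per_file is not None:
        cmd += ["--per-file", str(per_file)]
    cmd.append(query)
    return cmd


def run_search(query, per_file=None):
    """Run ggrep in the current directory and return the parsed JSON output.

    Raises subprocess.CalledProcessError if ggrep exits with a non-zero status.
    """
    proc = subprocess.run(
        build_search_command(query, per_file),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)


print(build_search_command("error handling", per_file=5))
```

Passing the query as a separate list element avoids shell quoting problems with natural-language questions.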
ggrep index # Index current directory
ggrep index --dry-run # Preview what would be indexed
ggrep index --reset # Delete and rebuild from scratch
ggrep serve # Start background daemon (file watching + fast searches)
ggrep stop # Stop daemon for current repo
ggrep stop-all # Stop all ggrep daemons
ggrep status # Show daemon and index status
ggrep health # Check system health
ggrep list # List all indexed repositories
ggrep doctor # Verify models and grammars
ggrep gc # Clean up old snapshots
ggrep compact # Merge index segments
ggrep includes a built-in MCP (Model Context Protocol) server for seamless integration with coding agents.
ggrep claude-install
Then open Claude Code (claude). The ggrep plugin auto-starts and provides semantic search.
ggrep codex-install
ggrep gemini-install
ggrep opencode-install
ggrep mcp
Exposes MCP tools:
- search: Semantic search (returns JSON matching ggrep search --json)
- ggrep_status: Index and daemon status
- ggrep_health: System health checks
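For readers wiring up their own client, the sketch below shows the general shape of an MCP tool call as a JSON-RPC 2.0 message. It is an illustration under assumptions: the tool name search comes from the list above, but the argument name query is hypothetical, and a real client must complete the MCP initialize handshake before sending tools/call.

```python
import json


def tool_call_request(request_id, tool, arguments):
    """Serialize an MCP `tools/call` request as a single JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# "query" is an assumed argument name; check the tool's input schema
# (reported by the server's tools/list response) for the real one.
message = tool_call_request(1, "search", {"query": "where is authentication handled?"})
print(message)
```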
ggrep uses ~/.ggrep/config.toml for global settings. All options can also be set via GGREP_* environment variables.
# Performance
default_batch_size = 48 # Embedding batch size (auto-reduces on OOM)
max_threads = 32 # Parallel processing threads
disable_gpu = false # Force CPU even when CUDA available
# Daemon
port = 4444 # TCP port for daemon
idle_timeout_secs = 1800 # Shutdown after 30 min idle
ggrep respects .gitignore and also reads .ggignore files:
# .ggignore example
dist/
*.min.js
test/fixtures/
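To show how patterns like the ones above decide which files are skipped, here is a deliberately simplified matcher. It is a sketch, not ggrep's ignore logic: real gitignore-style matching also handles negation (!), ** globs, and nested ignore files, none of which are covered here, and the function name is_ignored is hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_ignored(path, patterns):
    """Return True if a repo-relative POSIX path matches any ignore pattern.

    Simplified rules (an assumption, not the full gitignore spec):
    - "name/" matches a directory called name at any depth
    - "a/b/" matches that directory sequence from the repo root
    - "a/b.txt" (contains a slash) matches the whole path from the root
    - "*.ext" (no slash) matches any single path component
    """
    parts = PurePosixPath(path).parts
    dirs = parts[:-1]
    for raw in patterns:
        pattern = raw.strip()
        if not pattern or pattern.startswith("#"):
            continue  # skip blank lines and comments
        if pattern.endswith("/"):
            wanted = PurePosixPath(pattern.rstrip("/")).parts
            if len(wanted) == 1:
                if any(fnmatchcase(d, wanted[0]) for d in dirs):
                    return True
            elif len(dirs) >= len(wanted) and all(
                fnmatchcase(d, w) for d, w in zip(dirs, wanted)
            ):
                return True
        elif "/" in pattern:
            if fnmatchcase(path, pattern):
                return True
        elif any(fnmatchcase(part, pattern) for part in parts):
            return True
    return False


PATTERNS = ["# .ggignore example", "dist/", "*.min.js", "test/fixtures/"]
for candidate in ["dist/app.js", "src/app.min.js", "test/fixtures/data.json", "src/main.rs"]:
    print(candidate, is_ignored(candidate, PATTERNS))
```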
ggrep is in active development. The current release (Phase II) provides:
- Reliable snapshot isolation (queries never see partial index state)
- Crash-safe atomic updates
- Multi-daemon operation for different repositories
- Query admission control and timeouts
- Maintenance commands (gc, compact, audit, repair)
Coming in Phase III: Structured "slate" output for agents (file-grouped results with evidence), progressive disclosure, confidence-aware ranking, and MCP parity with CLI features.
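One common way to get both snapshot isolation and crash-safe atomic updates is to write each snapshot to its own file and then atomically swap a small pointer file. The sketch below shows that general technique; it is not a description of ggrep's on-disk format, and the names publish_snapshot, read_current, and the CURRENT pointer file are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def _write_durably(path: Path, text: str) -> None:
    """Write to a temp file in the same directory, fsync, then atomically rename.

    A full implementation would also fsync the directory; omitted for brevity.
    """
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_name, path)  # atomic: readers see the old or the new file


def publish_snapshot(index_dir: Path, snapshot_id: str, manifest: dict) -> None:
    """Store a new snapshot, then flip the CURRENT pointer to it."""
    index_dir.mkdir(parents=True, exist_ok=True)
    _write_durably(index_dir / f"{snapshot_id}.json", json.dumps(manifest))
    # The pointer moves only after the snapshot is fully written, so a crash
    # before this line leaves the previous snapshot in effect.
    _write_durably(index_dir / "CURRENT", snapshot_id)


def read_current(index_dir: Path) -> dict:
    """Load the snapshot that CURRENT points to."""
    snapshot_id = (index_dir / "CURRENT").read_text(encoding="utf-8")
    return json.loads((index_dir / f"{snapshot_id}.json").read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    publish_snapshot(root, "snap-1", {"files": 1})
    publish_snapshot(root, "snap-2", {"files": 2})
    print(read_current(root))
```

Because old snapshot files stay on disk until they are cleaned up, a query that resolved the pointer earlier can keep reading its snapshot while a newer one is published.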
goodgrep/
├── Tools/ggrep/ # Main ggrep source code and documentation
│ ├── src/ # Rust source
│ ├── Docs/ # Specs, plans, and research
│ └── tests/ # Test suites
├── Scripts/ggrep/ # Helper scripts
├── Datasets/ggrep/ # Evaluation test cases
└── README.md # This file
Requirements:
- Rust (nightly recommended for best performance)
- ~500MB disk space for models (downloaded on first use)
git clone https://github.com/GoodFarming/goodgrep.git
cd goodgrep/Tools/ggrep
# Standard build
cargo build --release
# Run tests
cargo test
# Install to cargo bin directory
cargo install --path .
Optional CUDA support (for GPU acceleration):
cargo build --release --features cuda
- Index feels stale? Run ggrep index to refresh.
- Weird results? Run ggrep doctor to verify models and grammars.
- Need a fresh start? Run ggrep index --reset or delete ~/.ggrep/.
- GPU OOM? Batch size auto-reduces, or set GGREP_DISABLE_GPU=1.
ggrep is inspired by osgrep and mgrep by MixedBread.
Licensed under the Apache License, Version 2.0. See LICENSE for details.