GoodFarming/goodgrep: Adaptive Semantic Search

Natural-language search that works like mgrep. Fast, local, and built for coding agents.

Why ggrep?

Traditional code search (grep, ripgrep, IDE search) finds exact text matches. But when you're exploring a codebase, you often think in concepts: "where is authentication handled?" or "how does the rate limiter work?"

ggrep bridges this gap:

Semantic search: Find code by meaning, not just string matching. Ask "where do transactions get created?" and get results even if the code uses create_txn or new_transaction.
CPU-first: Runs on CPU by default. No GPU required (CUDA acceleration is an optional build feature), no cloud APIs, no API keys.
100% local: All embeddings computed locally. Your code never leaves your machine.
Language-aware chunking: Tree-sitter parses code by function/class boundaries, so each result is a complete, meaningful unit.
Agent-ready: Native MCP server for Claude Code, Codex CLI, Gemini CLI, and OpenCode.
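Markdown files are split at headings rather than at tree-sitter function/class nodes. A minimal sketch of heading-based splitting, not ggrep's actual pipeline: it assumes ATX headings (`#` to `######` followed by a space), and `Chunk` and `chunk_markdown` are hypothetical names.

```rust
/// One heading-delimited section of a Markdown file (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct Chunk {
    /// Heading text without the leading `#` marks (empty for any preamble).
    pub heading: String,
    /// Lines belonging to this section, including the heading line itself.
    pub body: String,
}

/// Returns true for ATX headings: 1 to 6 `#` characters followed by a space.
fn is_heading(line: &str) -> bool {
    let hashes = line.chars().take_while(|&c| c == '#').count();
    (1..=6).contains(&hashes) && line[hashes..].starts_with(' ')
}

/// Splits Markdown into chunks that each start at a heading.
/// Simplification: fenced code blocks are not tracked, so a `# comment`
/// line inside one would also start a new chunk.
pub fn chunk_markdown(text: &str) -> Vec<Chunk> {
    let mut chunks = Vec::new();
    let mut heading = String::new();
    let mut body = String::new();
    for line in text.lines() {
        if is_heading(line) {
            // Close the previous section before starting a new one.
            if !body.trim().is_empty() {
                chunks.push(Chunk { heading: heading.clone(), body: body.clone() });
            }
            heading = line.trim_start_matches('#').trim().to_string();
            body.clear();
        }
        body.push_str(line);
        body.push('\n');
    }
    if !body.trim().is_empty() {
        chunks.push(Chunk { heading, body });
    }
    chunks
}
```

Each chunk keeps its heading line, so the heading text is embedded together with the section it introduces.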

Quick Start

Install from source:

git clone https://github.com/GoodFarming/goodgrep.git
cd goodgrep/Tools/ggrep
cargo build --release

The binary will be at target/release/ggrep. Add it to your PATH or run it directly.

First-time setup (optional):

ggrep setup

Downloads embedding models (~500MB) and tree-sitter grammars upfront. If you skip this, models download automatically on first use.

Search a codebase:

cd /path/to/your/repo
ggrep "where is authentication handled?"

Your first search automatically indexes the repository. Each repository gets its own isolated index.
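The README states that each index is identified by the git remote URL or by a directory hash. A sketch of how such a key could be derived, assuming the remote URL (if any) has already been looked up; `repo_index_id` and the `remote-`/`dir-` prefixes are hypothetical, and `DefaultHasher` stands in for whatever stable hash ggrep actually uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::Path;

/// Hashes a string to a 16-hex-digit tag.
/// NOTE: DefaultHasher output is not guaranteed stable across Rust releases;
/// a persistent on-disk ID should use a fixed algorithm (e.g. SHA-256).
fn hex_hash(s: &str) -> String {
    let mut h = DefaultHasher::new();
    s.hash(&mut h);
    format!("{:016x}", h.finish())
}

/// Hypothetical index key: prefer the remote URL when one exists,
/// otherwise fall back to hashing the repository's directory path.
pub fn repo_index_id(remote_url: Option<&str>, repo_root: &Path) -> String {
    match remote_url {
        Some(url) => {
            // Normalize trivial differences such as a trailing "/" or ".git".
            let url = url.trim_end_matches('/').trim_end_matches(".git");
            format!("remote-{}", hex_hash(url))
        }
        None => format!("dir-{}", hex_hash(&repo_root.to_string_lossy())),
    }
}
```

Hashing both forms keeps the key safe to use as a directory name regardless of what characters the URL or path contains.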

How It Works

ggrep combines several techniques for high-quality semantic search:

Smart Chunking: Tree-sitter parses code by function/class boundaries, ensuring each embedding captures a complete logical block. Markdown is chunked by headings. Mermaid diagrams are preprocessed for better recall.
Hybrid Search: Dense embeddings (sentence-transformers) for broad semantic recall, plus ColBERT reranking for precision on top candidates.
Snapshot Isolation: Queries always see a consistent view of the index, never partial state during updates.
Background Daemon: File watcher detects changes and incrementally re-indexes. Keep ggrep serve running for instant searches.
Per-Repository Isolation: Each repository gets its own index, identified by git remote URL or directory hash. Switching repos "just works".
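The two-stage search above (dense embeddings for recall, ColBERT for reranking) can be illustrated with toy vectors. Stage 1 ranks every chunk by cosine similarity of one pooled vector; stage 2 rescores only the top `k` with ColBERT's MaxSim (for each query token, the best dot product against the chunk's token vectors, summed). `Doc`, `search`, and the assumption that token vectors are L2-normalized are illustrative, not ggrep's internals.

```rust
/// A hypothetical indexed chunk: one pooled vector plus per-token vectors.
pub struct Doc {
    pub id: usize,
    pub dense: Vec<f32>,       // single sentence-embedding vector
    pub tokens: Vec<Vec<f32>>, // ColBERT-style per-token vectors (assumed normalized)
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let (na, nb) = (dot(a, a).sqrt(), dot(b, b).sqrt());
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot(a, b) / (na * nb) }
}

/// MaxSim: each query token takes its best-matching doc token; scores are summed.
fn maxsim(query_tokens: &[Vec<f32>], doc_tokens: &[Vec<f32>]) -> f32 {
    query_tokens
        .iter()
        .map(|q| doc_tokens.iter().map(|d| dot(q, d)).fold(f32::NEG_INFINITY, f32::max))
        .filter(|s| s.is_finite()) // skip the -inf produced by a doc with no tokens
        .sum()
}

/// Stage 1: keep the `k` docs with the highest dense cosine score.
/// Stage 2: reorder those candidates by MaxSim against the query tokens.
pub fn search(q_dense: &[f32], q_tokens: &[Vec<f32>], docs: &[Doc], k: usize) -> Vec<usize> {
    let mut cands: Vec<(&Doc, f32)> = docs.iter().map(|d| (d, cosine(q_dense, &d.dense))).collect();
    cands.sort_by(|a, b| b.1.total_cmp(&a.1));
    cands.truncate(k);
    let mut reranked: Vec<(usize, f32)> = cands
        .iter()
        .map(|(d, _)| (d.id, maxsim(q_tokens, &d.tokens)))
        .collect();
    reranked.sort_by(|a, b| b.1.total_cmp(&a.1));
    reranked.into_iter().map(|(id, _)| id).collect()
}
```

Rescoring only the top `k` keeps the more expensive token-level comparison off the bulk of the index, while still letting it reorder the final results.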

Supported Languages (37)

TypeScript, TSX, JavaScript, Python, Go, Rust, C, C++, C#, Java, Kotlin, Scala, Ruby, PHP, Elixir, Haskell, OCaml, Julia, Zig, Lua, Odin, Objective-C, Verilog, HTML, CSS, XML, Markdown, JSON, YAML, TOML, Bash, Make, Starlark, HCL, Terraform, Diff, Regex

Commands

Search

# Quick search (shorthand)
ggrep "how is the database connection pooled?"

# Full control with ggrep search
ggrep search "API rate limiting logic"
ggrep search --per-file 5 "error handling"      # More results per file
ggrep search --compact "user validation"         # File paths only
ggrep search --json "config parsing"             # JSON output for scripting
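For scripting, the `--json` output can be captured from another program. A minimal Rust sketch using `std::process::Command`; `ggrep_json_search` is a hypothetical helper that only builds the command. The JSON schema is not documented here, so parsing is left to the caller.

```rust
use std::process::Command;

/// Builds (but does not run) `ggrep search --json <query>` inside `repo`,
/// so results can be read from stdout as machine-readable JSON.
pub fn ggrep_json_search(query: &str, repo: &str) -> Command {
    let mut cmd = Command::new("ggrep");
    // The query is passed as one argument, so no shell quoting is involved.
    cmd.args(["search", "--json", query]).current_dir(repo);
    cmd
}
```

To run it, call `.output()` on the returned `Command` and read `stdout` (for example with `String::from_utf8_lossy`), then hand the text to a JSON parser.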

Search modes (bias results toward different content types):

Flag	Mode	Best for
`-d`	Discovery	Broad exploration across code, docs, and diagrams
`-i`	Implementation	Code-focused results
`-p`	Planning	Docs and diagrams
`-b`	Debug	Debugging and incident-related code

Output control:

Flag	Effect
`-n`, `--no-snippet`	File + line only
`-s`, `--short-snippet`	Short preview
`-l`, `--long-snippet`	Longer preview
`-c`, `--content`	Full chunk content
`--compact`	File paths only (deduplicated)

Indexing

ggrep index              # Index current directory
ggrep index --dry-run    # Preview what would be indexed
ggrep index --reset      # Delete and rebuild from scratch

Daemon

ggrep serve              # Start background daemon (file watching + fast searches)
ggrep stop               # Stop daemon for current repo
ggrep stop-all           # Stop all ggrep daemons
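Incremental re-indexing means only changed files need to be re-chunked and re-embedded. One way to decide what changed, assuming the index keeps a manifest mapping each path to a content hash (the README does not describe ggrep's actual bookkeeping; `Changes` and `diff_manifests` are hypothetical):

```rust
use std::collections::BTreeMap;

/// Result of comparing two manifests (path -> content hash).
#[derive(Debug, Default, PartialEq)]
pub struct Changes {
    pub added: Vec<String>,
    pub modified: Vec<String>,
    pub removed: Vec<String>,
}

/// Files in `added` or `modified` need re-embedding;
/// files in `removed` only need their chunks dropped from the index.
pub fn diff_manifests(old: &BTreeMap<String, u64>, new: &BTreeMap<String, u64>) -> Changes {
    let mut c = Changes::default();
    for (path, hash) in new {
        match old.get(path) {
            None => c.added.push(path.clone()),
            Some(h) if h != hash => c.modified.push(path.clone()),
            Some(_) => {} // unchanged: existing embeddings are reused
        }
    }
    for path in old.keys() {
        if !new.contains_key(path) {
            c.removed.push(path.clone());
        }
    }
    c
}
```

`BTreeMap` keeps iteration sorted by path, so the change lists come out in a deterministic order.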

Status and Maintenance

ggrep status             # Show daemon and index status
ggrep health             # Check system health
ggrep list               # List all indexed repositories
ggrep doctor             # Verify models and grammars
ggrep gc                 # Clean up old snapshots
ggrep compact            # Merge index segments

AI Agent Integration

ggrep includes a built-in MCP (Model Context Protocol) server that exposes its search to coding agents as tools.

Claude Code

ggrep claude-install

Then open Claude Code (claude). The ggrep plugin auto-starts and provides semantic search.

Codex CLI

ggrep codex-install

Gemini CLI

ggrep gemini-install

OpenCode

ggrep opencode-install

MCP Server (Manual)

ggrep mcp

Exposes MCP tools:

search: Semantic search (returns JSON matching ggrep search --json)
ggrep_status: Index and daemon status
ggrep_health: System health checks
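MCP clients exchange JSON-RPC 2.0 messages with the server and invoke a tool with a `tools/call` request carrying the tool's `name` and `arguments`. The sketch builds such a request for the `search` tool, assuming `ggrep mcp` uses the stdio transport (one JSON message per line) and that the tool's argument is named `query`; confirm the real schema with `tools/list`. A real client must also complete MCP's `initialize` handshake before calling tools. `search_request` is a hypothetical helper.

```rust
/// Escapes a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    for ch in s.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds a JSON-RPC `tools/call` request for the `search` tool.
/// ASSUMPTION: the argument key is "query".
pub fn search_request(id: u64, query: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"tools/call","params":{{"name":"search","arguments":{{"query":"{}"}}}}}}"#,
        id,
        json_escape(query)
    )
}
```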

Configuration

ggrep uses ~/.ggrep/config.toml for global settings. All options can also be set via GGREP_* environment variables.
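The troubleshooting tip `GGREP_DISABLE_GPU=1` suggests that each environment variable is the config key uppercased with a `GGREP_` prefix. A sketch of resolving one option under that assumption, with an assumed precedence of environment over file over default; `env_name` and `resolve` are hypothetical.

```rust
use std::collections::HashMap;

/// Maps a config.toml key to its assumed environment variable,
/// e.g. `disable_gpu` -> `GGREP_DISABLE_GPU`.
pub fn env_name(key: &str) -> String {
    format!("GGREP_{}", key.to_ascii_uppercase())
}

/// Resolves one option. ASSUMPTION: environment beats file, file beats default.
/// `env` is injected so lookups can be tested without touching the process
/// environment; in real use pass `|k| std::env::var(k).ok()`.
pub fn resolve<F>(key: &str, file: &HashMap<String, String>, default: &str, env: F) -> String
where
    F: Fn(&str) -> Option<String>,
{
    env(&env_name(key))
        .or_else(|| file.get(key).cloned())
        .unwrap_or_else(|| default.to_string())
}
```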

Key Options

# Performance
default_batch_size = 48      # Embedding batch size (auto-reduces on OOM)
max_threads = 32             # Parallel processing threads
disable_gpu = false          # Force CPU even when CUDA is available

# Daemon
port = 4444                  # TCP port for daemon
idle_timeout_secs = 1800     # Shutdown after 30 min idle
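`default_batch_size` auto-reduces on OOM. A common strategy is to halve the batch and retry the same slice; this sketch assumes that strategy, and `EmbedError`, `embed_all`, and the `embed_batch` callback are hypothetical stand-ins for the real model call.

```rust
/// Errors from a hypothetical embedding backend.
#[derive(Debug, PartialEq)]
pub enum EmbedError {
    OutOfMemory,
    Other(String),
}

/// Embeds `items` in batches. On OOM the batch size is halved and the same
/// slice is retried; it gives up only if a single item still does not fit.
pub fn embed_all<F>(items: &[String], batch_size: usize, mut embed_batch: F) -> Result<Vec<Vec<f32>>, EmbedError>
where
    F: FnMut(&[String]) -> Result<Vec<Vec<f32>>, EmbedError>,
{
    let mut batch_size = batch_size.max(1); // a zero size would never advance
    let mut out = Vec::with_capacity(items.len());
    let mut start = 0;
    while start < items.len() {
        let end = (start + batch_size).min(items.len());
        match embed_batch(&items[start..end]) {
            Ok(vecs) => {
                out.extend(vecs);
                start = end;
            }
            Err(EmbedError::OutOfMemory) if batch_size > 1 => batch_size /= 2,
            Err(e) => return Err(e),
        }
    }
    Ok(out)
}
```

The reduced size is kept for the remaining batches, so a machine that cannot fit 48 items stays at the size that worked instead of failing again on every batch.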

File Ignoring

ggrep respects .gitignore and also reads .ggignore files:

# .ggignore example
dist/
*.min.js
test/fixtures/
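Assuming .ggignore uses gitignore syntax, the example patterns behave differently: `dist/` matches a directory of that name at any depth, `*.min.js` matches file names, and `test/fixtures/` (which has a slash before the end) is anchored to the directory containing .ggignore. A simplified matcher for just these three shapes; `is_ignored` is hypothetical and is not ggrep's matcher.

```rust
/// Simplified matcher for the three pattern shapes in the example above.
/// NOT full gitignore semantics: no `**`, `!` negation, `?`, character
/// classes, or `*` in the middle of a pattern.
/// `path` is a file path relative to the .ggignore directory, using `/`.
pub fn is_ignored(path: &str, pattern: &str) -> bool {
    let parts: Vec<&str> = path.split('/').collect();
    let (file_name, dirs) = parts.split_last().expect("split yields at least one part");
    if let Some(dir) = pattern.strip_suffix('/') {
        if dir.contains('/') {
            // A slash before the end anchors the pattern to the .ggignore directory.
            path.starts_with(&format!("{dir}/"))
        } else {
            // A bare directory name matches at any depth.
            dirs.contains(&dir)
        }
    } else if let Some(suffix) = pattern.strip_prefix('*') {
        // `*.min.js`: the `*` matches any file-name prefix (never across `/`).
        file_name.ends_with(suffix)
    } else {
        *file_name == pattern
    }
}
```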

Project Status

ggrep is in active development. The current release (Phase II) provides:

Reliable snapshot isolation (queries never see partial index state)
Crash-safe atomic updates
Multi-daemon operation for different repositories
Query admission control and timeouts
Maintenance commands (gc, compact, audit, repair)
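Crash-safe atomic updates are commonly built by writing to a temporary file and renaming it over the original, so a crash leaves either the old or the new contents. The README does not say how ggrep implements this; the sketch shows that standard technique with a hypothetical `atomic_write` helper. On POSIX, full durability also requires fsyncing the parent directory after the rename, which the sketch omits.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Replaces `dest` so readers see either the old or the new bytes, never a mix.
pub fn atomic_write(dest: &Path, bytes: &[u8]) -> io::Result<()> {
    // The temp file sits next to `dest` so the rename stays on one filesystem.
    let mut tmp_name = dest
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "destination has no file name"))?
        .to_os_string();
    tmp_name.push(".tmp");
    let tmp = dest.with_file_name(tmp_name);

    {
        let mut f = File::create(&tmp)?;
        f.write_all(bytes)?;
        f.sync_all()?; // flush contents to disk before publishing them
    } // file handle is closed here, before the rename

    // Replaces an existing `dest`; on POSIX this replacement is atomic.
    fs::rename(&tmp, dest)
}
```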

Coming in Phase III: Structured "slate" output for agents (file-grouped results with evidence), progressive disclosure, confidence-aware ranking, and MCP parity with CLI features.

Repository Structure

goodgrep/
├── Tools/ggrep/           # Main ggrep source code and documentation
│   ├── src/               # Rust source
│   ├── Docs/              # Specs, plans, and research
│   └── tests/             # Test suites
├── Scripts/ggrep/         # Helper scripts
├── Datasets/ggrep/        # Evaluation test cases
└── README.md              # This file

Building from Source

Requirements:

Rust (nightly recommended for best performance)
~500MB disk space for models (downloaded on first use)

git clone https://github.com/GoodFarming/goodgrep.git
cd goodgrep/Tools/ggrep

# Standard build
cargo build --release

# Run tests
cargo test

# Install to cargo bin directory
cargo install --path .

Optional CUDA support (for GPU acceleration):

cargo build --release --features cuda

Troubleshooting

Index feels stale? Run ggrep index to refresh.
Weird results? Run ggrep doctor to verify models and grammars.
Need a fresh start? Run ggrep index --reset, or delete ~/.ggrep/ (this also removes config.toml).
GPU OOM? Batch size auto-reduces, or set GGREP_DISABLE_GPU=1.

Acknowledgments

ggrep is inspired by osgrep and mgrep by MixedBread.

License

Licensed under the Apache License, Version 2.0. See LICENSE for details.
