89 changes: 89 additions & 0 deletions docs/configuration-guide.md
@@ -158,6 +158,95 @@ builder.build_index("./indexes/my-notes", chunks)

`embedding_options` is persisted to the index `meta.json`, so subsequent `LeannSearcher` or `LeannChat` sessions automatically reuse the same provider settings (the embedding server manager forwards them to the provider for you).
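For example, a follow-up search session only needs the index path; the stored provider settings are read back from the metadata for you. A minimal sketch (assuming `LeannSearcher` accepts the index path as its first argument and that `top_k` is a supported search parameter):

```python
from leann.api import LeannSearcher

# No embedding_options needed here: the provider settings persisted in the
# index metadata are loaded and forwarded to the embedding server automatically.
searcher = LeannSearcher("./indexes/my-notes")
results = searcher.search("how do I configure providers?", top_k=5)
```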

## Optional Embedding Features

### Task-Specific Prompt Templates

Some embedding models are trained with task-specific prompts to differentiate between documents and queries. The most notable example is **Google's EmbeddingGemma**, which requires different prompts depending on the use case:

- **Indexing documents**: `"title: none | text: "`
- **Search queries**: `"task: search result | query: "`

LEANN supports automatic prompt prepending via the `--embedding-prompt-template` flag:

```bash
# Build index with EmbeddingGemma (via LM Studio or Ollama)
leann build my-docs \
--docs ./documents \
--embedding-mode openai \
--embedding-model text-embedding-embeddinggemma-300m-qat \
--embedding-api-base http://localhost:1234/v1 \
--embedding-prompt-template "title: none | text: " \
--force

# Search with query-specific prompt
leann search my-docs \
--query "What is quantum computing?" \
--embedding-prompt-template "task: search result | query: "
```

**Important Notes:**
- **Only use with compatible models**: EmbeddingGemma and similar task-specific models
- **NOT for regular models**: Adding prompts to models like `nomic-embed-text`, `text-embedding-3-small`, or `bge-base-en-v1.5` will corrupt embeddings
- **Template is saved**: Build-time templates are saved to `.meta.json` for reference
- **Flexible prompts**: You can use any prompt string, or leave it empty (`""`)

**Python API:**
```python
from leann.api import LeannBuilder

builder = LeannBuilder(
    embedding_mode="openai",
    embedding_model="text-embedding-embeddinggemma-300m-qat",
    embedding_options={
        "base_url": "http://localhost:1234/v1",
        "api_key": "lm-studio",
        # Prepended to every document chunk at build time
        "prompt_template": "title: none | text: ",
    },
)
builder.build_index("./indexes/my-docs", chunks)
```
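At search time, the query-side template can also be passed programmatically through `provider_options`, which takes priority over anything stored in the index metadata (this is the fallback chain added to `search()` in `api.py` below). A hedged sketch, assuming `LeannSearcher` exposes the same `search()` signature:

```python
from leann.api import LeannSearcher

searcher = LeannSearcher("./indexes/my-docs")

# "prompt_template" in provider_options overrides any template stored in the
# index metadata (it sits at the top of the fallback chain).
results = searcher.search(
    "What is quantum computing?",
    provider_options={"prompt_template": "task: search result | query: "},
)
```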

**References:**
- [HuggingFace Blog: EmbeddingGemma](https://huggingface.co/blog/embeddinggemma) - Technical details

### LM Studio Auto-Detection (Optional)

When using LM Studio with the OpenAI-compatible API, LEANN can optionally auto-detect model context lengths via the LM Studio SDK. This eliminates manual configuration of token limits.

**Prerequisites:**
```bash
# Install Node.js (if not already installed)
# Then install the LM Studio SDK globally
npm install -g @lmstudio/sdk
```

**How it works:**
1. LEANN detects LM Studio URLs (port `:1234`, or `lmstudio` in the URL)
2. Queries model metadata via a Node.js subprocess
3. Automatically unloads the model after the query (respecting your JIT auto-evict settings)
4. Falls back to the static registry if the SDK is unavailable (sketched below)
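A rough sketch of this detect-then-fall-back logic (illustrative only; the helper names, script name, and registry contents below are hypothetical, not LEANN's actual internals):

```python
import subprocess
from urllib.parse import urlparse

# Static fallback registry; the docs below cite 2048 for this model.
TOKEN_LIMIT_REGISTRY = {"text-embedding-nomic-embed-text-v1.5": 2048}

def is_lmstudio_url(base_url: str) -> bool:
    # Heuristic from step 1: default LM Studio port, or "lmstudio" in the URL.
    return urlparse(base_url).port == 1234 or "lmstudio" in base_url.lower()

def detect_context_length(base_url: str, model: str) -> int:
    if is_lmstudio_url(base_url):
        try:
            # Stand-in for the real lookup: run a Node.js script that queries
            # @lmstudio/sdk for the model's context length, then unloads it.
            out = subprocess.run(
                ["node", "lmstudio_ctx.js", model],  # script name is illustrative
                capture_output=True, text=True, check=True, timeout=30,
            )
            return int(out.stdout.strip())
        except Exception:
            pass  # Node.js or the SDK unavailable: fall back to the registry
    return TOKEN_LIMIT_REGISTRY.get(model, 2048)
```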

**No configuration needed.** It works automatically once the SDK is installed:

```bash
leann build my-docs \
--docs ./documents \
--embedding-mode openai \
--embedding-model text-embedding-nomic-embed-text-v1.5 \
--embedding-api-base http://localhost:1234/v1
# Context length auto-detected if SDK available
# Falls back to registry (2048) if not
```

**Benefits:**
- ✅ Automatic token limit detection
- ✅ Respects LM Studio JIT auto-evict settings
- ✅ No manual registry maintenance
- ✅ Graceful fallback if SDK unavailable

**Note:** This is completely optional. LEANN works perfectly well without the SDK, using the built-in token limit registry.

## Index Selection: Matching Your Scale

### HNSW (Hierarchical Navigable Small World)
48 changes: 48 additions & 0 deletions docs/faq.md
@@ -8,3 +8,51 @@ You can speed up the process by using a lightweight embedding model. Add this to
```bash
--embedding-model sentence-transformers/all-MiniLM-L6-v2
```
**Model sizes:** `all-MiniLM-L6-v2` (30M parameters), `facebook/contriever` (~100M parameters), `Qwen3-0.6B` (600M parameters)

## 2. When should I use prompt templates?

**Use prompt templates ONLY with task-specific embedding models** like Google's EmbeddingGemma. These models are specially trained to use different prompts for documents vs queries.

**DO NOT use with regular models** like `nomic-embed-text`, `text-embedding-3-small`, or `bge-base-en-v1.5`: adding prompts to these models will corrupt the embeddings.

**Example usage with EmbeddingGemma:**
```bash
# Build with document prompt
leann build my-docs --embedding-prompt-template "title: none | text: "

# Search with query prompt
leann search my-docs --query "your question" --embedding-prompt-template "task: search result | query: "
```

See the [Configuration Guide: Task-Specific Prompt Templates](configuration-guide.md#task-specific-prompt-templates) for detailed usage.

## 3. Why is LM Studio loading multiple copies of my model?

This was fixed in recent versions. LEANN now properly unloads models after querying metadata, respecting your LM Studio JIT auto-evict settings.

**If you still see duplicates:**
- Update to the latest LEANN version
- Restart LM Studio to clear loaded models
- Check that you have JIT auto-evict enabled in LM Studio settings

**How it works now:**
1. LEANN loads model temporarily to get context length
2. Immediately unloads after query
3. LM Studio JIT loads model on-demand for actual embeddings
4. Auto-evicts per your settings

## 4. Do I need Node.js and @lmstudio/sdk?

**No, it's completely optional.** LEANN works perfectly fine without them, using a built-in token limit registry.

**Benefits if you install it:**
- Automatic context length detection for LM Studio models
- No manual registry maintenance
- Always gets accurate token limits from the model itself

**To install (optional):**
```bash
npm install -g @lmstudio/sdk
```

See [Configuration Guide: LM Studio Auto-Detection](configuration-guide.md#lm-studio-auto-detection-optional) for details.
15 changes: 15 additions & 0 deletions packages/leann-core/src/leann/api.py
@@ -916,6 +916,7 @@ def search(
        metadata_filters: Optional[dict[str, dict[str, Union[str, int, float, bool, list]]]] = None,
        batch_size: int = 0,
        use_grep: bool = False,
        provider_options: Optional[dict[str, Any]] = None,
        **kwargs,
    ) -> list[SearchResult]:
        """
@@ -979,10 +980,24 @@

        start_time = time.time()

        # Extract query template from stored embedding_options with fallback chain:
        # 1. Check provider_options override (highest priority)
        # 2. Check query_prompt_template (new format)
        # 3. Check prompt_template (old format for backward compat)
        # 4. None (no template)
        query_template = None
        if provider_options and "prompt_template" in provider_options:
            query_template = provider_options["prompt_template"]
        elif "query_prompt_template" in self.embedding_options:
            query_template = self.embedding_options["query_prompt_template"]
        elif "prompt_template" in self.embedding_options:
            query_template = self.embedding_options["prompt_template"]

        query_embedding = self.backend_impl.compute_query_embedding(
            query,
            use_server_if_available=recompute_embeddings,
            zmq_port=zmq_port,
            query_template=query_template,
        )
        logger.info(f" Generated embedding shape: {query_embedding.shape}")
        embedding_time = time.time() - start_time
26 changes: 26 additions & 0 deletions packages/leann-core/src/leann/cli.py
@@ -144,6 +144,18 @@ def create_parser(self) -> argparse.ArgumentParser:
            default=None,
            help="API key for embedding service (defaults to OPENAI_API_KEY)",
        )
        build_parser.add_argument(
            "--embedding-prompt-template",
            type=str,
            default=None,
            help="Prompt template to prepend to all texts for embedding (e.g., 'query: ' for search)",
        )
        build_parser.add_argument(
            "--query-prompt-template",
            type=str,
            default=None,
            help="Prompt template for queries (different from build template for task-specific models)",
        )
        build_parser.add_argument(
            "--force", "-f", action="store_true", help="Force rebuild existing index"
        )
@@ -260,6 +272,12 @@ def create_parser(self) -> argparse.ArgumentParser:
action="store_true",
help="Display file paths and metadata in search results",
)
search_parser.add_argument(
"--embedding-prompt-template",
type=str,
default=None,
help="Prompt template to prepend to query for embedding (e.g., 'query: ' for search)",
)

# Ask command
ask_parser = subparsers.add_parser("ask", help="Ask questions")
@@ -1398,6 +1416,14 @@ async def build_index(self, args):
        resolved_embedding_key = resolve_openai_api_key(args.embedding_api_key)
        if resolved_embedding_key:
            embedding_options["api_key"] = resolved_embedding_key
        if args.query_prompt_template:
            # New format: separate templates
            if args.embedding_prompt_template:
                embedding_options["build_prompt_template"] = args.embedding_prompt_template
            embedding_options["query_prompt_template"] = args.query_prompt_template
        elif args.embedding_prompt_template:
            # Old format: single template (backward compat)
            embedding_options["prompt_template"] = args.embedding_prompt_template

        builder = LeannBuilder(
            backend_name=args.backend_name,