Semantic search for Swedish KPIs from Kolada, using vector embeddings.
Requires the uv package manager.
uv sync # Install dependencies
uv run python -m kpisearch.sync # Download KPIs + build embeddings
uv run python -m kpisearch.search build-all # Build embeddings for all models
uv run uvicorn kpisearch.app:app # Start server

Open http://localhost:8000 in your browser.
The sync command is the easiest way to get started. It fetches KPIs from the Kolada API, detects changes using content hashes, and only recomputes embeddings for what changed.
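The change-detection idea described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names, the `id`/`title`/`description` field names, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
import json


def content_hash(kpi: dict) -> str:
    """Return a stable SHA-256 digest of the text that feeds the embeddings."""
    # Assumption: only the title and description influence the embeddings.
    payload = json.dumps(
        {"title": kpi.get("title", ""), "description": kpi.get("description", "")},
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_kpis(old_hashes: dict[str, str], fetched: list[dict]) -> tuple[list[str], list[str]]:
    """Return (IDs that need new embeddings, IDs that disappeared upstream)."""
    new_hashes = {kpi["id"]: content_hash(kpi) for kpi in fetched}
    # A KPI is "changed" if it is new or its hash differs from the stored one.
    changed = [kid for kid, h in new_hashes.items() if old_hashes.get(kid) != h]
    removed = [kid for kid in old_hashes if kid not in new_hashes]
    return changed, removed
```

Only the IDs in `changed` would need their embeddings recomputed; everything else can be reused from the previous run.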
uv run python -m kpisearch.sync

For a full rebuild instead (e.g. after switching models):
uv run python -m kpisearch.download_kpis # Re-download all KPIs
uv run python -m kpisearch.search build # Rebuild embeddings for current model
uv run python -m kpisearch.search build-all # Rebuild embeddings for all models

The frontend offers three search methods (toggleable via checkboxes):
| Method | How it works |
|---|---|
| Semantisk | Pure vector similarity search |
| Hybrid | Semantic search with additive keyword boost for title matches |
| Kolada API | Proxied title search via Kolada's own API |
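As a rough illustration of what "pure vector similarity search" means, the sketch below ranks toy vectors by cosine similarity. It is written in plain Python for readability and is not the project's implementation; the function names and the dictionary layout are hypothetical, while `limit` and `min_score` mirror the parameter names this README lists for semantic search.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(query_vec, kpi_vecs, limit=10, min_score=0.4):
    """Return up to `limit` (kpi_id, score) pairs with score >= min_score, best first."""
    scored = [(kid, cosine(query_vec, vec)) for kid, vec in kpi_vecs.items()]
    hits = [(kid, score) for kid, score in scored if score >= min_score]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits[:limit]
```

In practice the vectors would come from one of the embedding models rather than being written by hand, and a vectorized library would be the more usual choice for thousands of KPIs.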
The hybrid search scores each KPI in three stages:

- Semantic similarity: The query is embedded and compared against pre-computed title and description embeddings using cosine similarity. The two scores are combined with a configurable weight (default 60% title, 40% description):

      semantic = title_weight * title_sim + (1 - title_weight) * desc_sim

- Keyword boost: Each query word is checked for literal substring presence in the KPI title (case-insensitive). The boost is proportional to the fraction of query words matched, scaled by a brevity factor that favors shorter titles (where the query covers more of the title):

      match_ratio = matched_words / total_words
      brevity = clamp(query_char_len / title_char_len, 0, 1)
      keyword_boost = match_ratio * (1 + 0.5 * brevity)

- Standard KPI bonus: KPIs with IDs starting with `N` (Kolada's standard/national indicators) get a 15% multiplicative boost to the combined score:

      score = (semantic + 0.25 * keyword_boost) * (1 + 0.15 * is_standard)
The top-k results are returned sorted by final score.
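The three stages above can be put together in a short sketch. The constants (0.6, 0.5, 0.25, 0.15) come from the formulas in this README; the function names, the whitespace tokenization of the query, and measuring lengths with `len()` on the raw strings are assumptions made for illustration and may differ from the actual code.

```python
def clamp(x: float, lo: float, hi: float) -> float:
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def keyword_boost(query: str, title: str) -> float:
    """Fraction of query words found in the title, scaled by a brevity factor."""
    words = query.lower().split()  # assumption: split on whitespace
    if not words or not title:
        return 0.0
    title_lc = title.lower()
    matched = sum(1 for word in words if word in title_lc)  # literal substring match
    match_ratio = matched / len(words)
    brevity = clamp(len(query) / len(title), 0.0, 1.0)
    return match_ratio * (1 + 0.5 * brevity)


def hybrid_score(query, kpi_id, title, title_sim, desc_sim, title_weight=0.6):
    """Combine semantic similarity, keyword boost, and the standard-KPI bonus."""
    semantic = title_weight * title_sim + (1 - title_weight) * desc_sim
    is_standard = 1 if kpi_id.startswith("N") else 0
    return (semantic + 0.25 * keyword_boost(query, title)) * (1 + 0.15 * is_standard)
```

Because the keyword boost is additive and the standard-KPI bonus is multiplicative, an exact title match on an `N` indicator benefits from both, while a KPI with no keyword match is ranked on semantic similarity alone (times the bonus, if it applies).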
Semantic search. Parameters: q (required), limit (default 10), min_score (default 0.4), title_weight (0-1).
Hybrid search. Parameters: q (required), limit (default 10), title_weight (0-1).
Proxied Kolada API search. Parameters: q (required), limit (default 15).
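For calling the API from Python, a small helper can assemble the query string from the parameters listed above. This is a sketch only: `build_search_url` is a hypothetical helper, it targets the `/api/search` path that appears in this README's curl example, and which of the three search methods that path maps to is not stated here.

```python
from urllib.parse import urlencode


def build_search_url(base_url, q, limit=10, min_score=None, title_weight=None):
    """Build a search URL; optional parameters are omitted when left as None."""
    params = {"q": q, "limit": limit}
    if min_score is not None:
        params["min_score"] = min_score
    if title_weight is not None:
        if not 0 <= title_weight <= 1:
            raise ValueError("title_weight must be between 0 and 1")
        params["title_weight"] = title_weight
    # urlencode percent-encodes non-ASCII characters such as å, ä, ö as UTF-8.
    return f"{base_url.rstrip('/')}/api/search?{urlencode(params)}"
```

The resulting URL can be fetched with `urllib.request.urlopen` or any HTTP client; the response format is not documented in this section, so inspect it before relying on specific fields.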
curl "http://localhost:8000/api/search?q=skolresultat&limit=5"

Three embedding models, switchable at runtime via the admin panel:
| Model | Notes |
|---|---|
| KBLab/sentence-bert-swedish-cased | Swedish-specific (default) |
| intfloat/multilingual-e5-small | High-quality multilingual |
| sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | Lightweight multilingual |
Admin panel at /admin, protected with HTTP Basic Auth.
Default password: change_this_now_really! (forced change on first login).
uv run python -m kpisearch.auth set <new-password> # Set password
uv run python -m kpisearch.auth reset # Reset to default

uv run uvicorn kpisearch.app:app --reload # Dev server with auto-reload
uv run ruff check kpisearch/ # Lint
uv run ruff format kpisearch/ # Format
uv run ty check # Type check