
oengwall/kpisearch


KPI Search

Semantic search for Swedish KPIs from Kolada, using vector embeddings.

Quick Start

Requires the uv package manager.

uv sync                                       # Install dependencies
uv run python -m kpisearch.sync               # Download KPIs + build embeddings
uv run python -m kpisearch.search build-all   # Build embeddings for all models
uv run uvicorn kpisearch.app:app              # Start server

Open http://localhost:8000 in your browser.

Data Pipeline

The sync command is the easiest way to get started. It fetches KPIs from the Kolada API, detects changes using content hashes, and recomputes embeddings only for the KPIs that changed.

uv run python -m kpisearch.sync
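The change detection described above can be sketched roughly as follows. This is an illustration only, not the code in kpisearch.sync: the function names, the `id`/`title`/`description` field names, and the choice of SHA-256 over a JSON payload are all assumptions.

```python
import hashlib
import json


def content_hash(kpi: dict) -> str:
    """Stable SHA-256 over the fields assumed to feed the embeddings."""
    # Assumption: only title and description affect the embeddings.
    payload = json.dumps(
        {"title": kpi.get("title", ""), "description": kpi.get("description", "")},
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_kpis(fetched: list[dict], stored_hashes: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (ids that need re-embedding, ids that disappeared upstream)."""
    fresh = {kpi["id"]: content_hash(kpi) for kpi in fetched}
    # New KPIs have no stored hash; edited KPIs have a different one.
    changed = [kpi_id for kpi_id, h in fresh.items() if stored_hashes.get(kpi_id) != h]
    removed = [kpi_id for kpi_id in stored_hashes if kpi_id not in fresh]
    return changed, removed
```

Only the IDs in `changed` would then be passed to the embedding step, which is what keeps an incremental sync cheaper than a full rebuild.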

For a full rebuild instead (e.g. after switching models):

uv run python -m kpisearch.download_kpis     # Re-download all KPIs
uv run python -m kpisearch.search build       # Rebuild embeddings for current model
uv run python -m kpisearch.search build-all   # Rebuild embeddings for all models

Search Methods

The frontend offers three search methods (toggleable via checkboxes):

| Method | How it works |
| --- | --- |
| Semantisk (Swedish for "semantic") | Pure vector similarity search |
| Hybrid | Semantic search with additive keyword boost for title matches |
| Kolada API | Proxied title search via Kolada's own API |

Hybrid Algorithm

The hybrid search scores each KPI in three stages:

  1. Semantic similarity — The query is embedded and compared against pre-computed title and description embeddings using cosine similarity. The two scores are combined with a configurable weight (default 60% title, 40% description):

    semantic = title_weight * title_sim + (1 - title_weight) * desc_sim
    
  2. Keyword boost — Each query word is checked for literal substring presence in the KPI title (case-insensitive). The boost is proportional to the fraction of query words matched, scaled by a brevity factor that favors shorter titles (where the query covers more of the title):

    match_ratio  = matched_words / total_words
    brevity      = clamp(query_char_len / title_char_len, 0, 1)
    keyword_boost = match_ratio * (1 + 0.5 * brevity)
    
  3. Standard KPI bonus — KPIs with IDs starting with N (Kolada's standard/national indicators) get a 15% multiplicative boost to the combined score:

    score = (semantic + 0.25 * keyword_boost) * (1 + 0.15 * is_standard)
    

The top-k results are returned sorted by final score.
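The three stages can be put together in a short Python sketch. It follows the formulas above but is not the project's implementation: the `Candidate` type and function names are hypothetical, the similarities are taken as precomputed inputs, and splitting the query on whitespace is an assumption about how "query words" are derived.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    kpi_id: str
    title: str
    title_sim: float  # cosine similarity between query and title embedding
    desc_sim: float   # cosine similarity between query and description embedding


def keyword_boost(query: str, title: str) -> float:
    """Stage 2: fraction of query words found in the title, scaled by brevity."""
    words = query.lower().split()  # assumption: whitespace tokenization
    if not words or not title:
        return 0.0
    title_lc = title.lower()
    matched = sum(1 for w in words if w in title_lc)  # literal substring check
    match_ratio = matched / len(words)
    brevity = min(max(len(query) / len(title), 0.0), 1.0)
    return match_ratio * (1 + 0.5 * brevity)


def hybrid_score(query: str, c: Candidate, title_weight: float = 0.6) -> float:
    """Stages 1-3 combined into the final score."""
    semantic = title_weight * c.title_sim + (1 - title_weight) * c.desc_sim
    is_standard = 1 if c.kpi_id.startswith("N") else 0
    return (semantic + 0.25 * keyword_boost(query, c.title)) * (1 + 0.15 * is_standard)


def rank(query: str, candidates: list[Candidate], k: int = 10) -> list[tuple[float, Candidate]]:
    """Return the top-k (score, candidate) pairs, best first."""
    scored = [(hybrid_score(query, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Because the keyword boost is additive and can reach 1.5, final scores can exceed 1.0, so they are best read as a ranking signal rather than a similarity.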

API

GET /api/search

Semantic search. Parameters: q (required), limit (default 10), min_score (default 0.4), title_weight (0-1).

GET /api/hybrid-search

Hybrid search. Parameters: q (required), limit (default 10), title_weight (0-1).

GET /api/kolada-search

Proxied Kolada API search. Parameters: q (required), limit (default 15).

curl "http://localhost:8000/api/search?q=skolresultat&limit=5"
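The same request can be made from Python with only the standard library. This is a minimal sketch: the helper names are hypothetical, and it assumes the endpoint returns JSON, which this README does not state explicitly.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"


def search_url(q, limit=10, min_score=0.4, title_weight=None, base_url=BASE_URL):
    """Build the /api/search URL with properly encoded query parameters."""
    params = {"q": q, "limit": limit, "min_score": min_score}
    if title_weight is not None:
        params["title_weight"] = title_weight  # documented range: 0-1
    return f"{base_url}/api/search?{urlencode(params)}"


def search(q, **kwargs):
    """Call the running server and decode the body (assumed to be JSON)."""
    with urlopen(search_url(q, **kwargs), timeout=10) as resp:
        return json.load(resp)


# Usage, with the server from Quick Start running:
#   results = search("skolresultat", limit=5)
```

Using `urlencode` matters for Swedish queries, since characters such as å, ä and ö must be percent-encoded in the URL.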

Models

Three embedding models, switchable at runtime via the admin panel:

| Model | Notes |
| --- | --- |
| KBLab/sentence-bert-swedish-cased | Swedish-specific (default) |
| intfloat/multilingual-e5-small | High-quality multilingual |
| sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | Lightweight multilingual |

Admin

Admin panel at /admin, protected with HTTP Basic Auth.

Default password: change_this_now_really! (forced change on first login).

uv run python -m kpisearch.auth set <new-password>   # Set password
uv run python -m kpisearch.auth reset                 # Reset to default
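For scripted access to the admin panel, an HTTP Basic Auth header can be built with the standard library. This sketch only constructs the request. The username `admin` is a placeholder assumption, since this README documents only the password, and the helper name is hypothetical.

```python
import base64
from urllib.request import Request


def admin_request(password, username="admin", base_url="http://localhost:8000"):
    """Build a request for /admin carrying a Basic Authorization header."""
    # Basic auth is base64("username:password"); the username is assumed here.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return Request(f"{base_url}/admin", headers={"Authorization": f"Basic {token}"})
```

Basic Auth sends credentials in an easily decoded form, so this is only reasonable over localhost or behind HTTPS.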

Development

uv run uvicorn kpisearch.app:app --reload   # Dev server with auto-reload
uv run ruff check kpisearch/                # Lint
uv run ruff format kpisearch/               # Format
uv run ty check                             # Type check
