High-performance ML experiment tracking with zero training overhead.
Part of KohakuHub - Self-hosted AI Infrastructure
pip install -e .from kohakuboard.client import Board
board = Board(name="my-experiment", config={"lr": 0.001, "batch_size": 32})
# Training loop
for epoch in range(10):
for data, target in train_loader:
loss = train_step(data, target)
board.step() # Once per optimizer step
board.log(loss=loss.item()) # Non-blocking, <0.1ms
# Alternative: move board.step() after board.log() for 0-indexed steps
# logs are stored under ./kohakuboard using KohakuVault column stores + SQLite metadata![]() |
![]() |
|---|
Join our community: https://discord.gg/xWYrkyvJ2s
- Zero Training Overhead - Non-blocking logging returns in <0.1ms
- Local-First - No server required during training, view results instantly
- High Throughput - 20,000+ metrics/second sustained
- Rich Data Types - Scalars, images, videos, tables, histograms
- WebGL Visualization - Handle 100K+ datapoints smoothly
- Self-Hosted - Your data stays on your infrastructure
Background Writer Process ensures training never waits:
Training Script Background Process
│ │
├─ board.log(loss=0.5) │
│ └─> Queue.put() │
│ (<0.1ms return!) │
│ ├─ Queue.get()
├─ Continue training... ├─ Batch write
│ └─ Flush to disk
Performance:
- Log call latency: <0.1ms
- Throughput: 20,000+ metrics/sec
- Queue capacity: 50,000 messages
- Memory overhead: ~100-200 MB
Unified API for all data types - no step inflation:
board.log(
loss=0.5, # Scalar
sample_img=Media(image), # Image
predictions=Table(results), # Table
gradients=Histogram(grads) # Histogram
)
# All logged at SAME step with 1 queue message!Supported Types:
- Scalars - Metrics, learning rates, accuracies
- Media - Images (PNG/JPG), videos (MP4), audio (WAV)
- Tables - Structured data with embedded images
- Histograms - Weight/gradient distributions with compression (99.8% size reduction)
Powered by KohakuVault - A high-performance storage library with dual interfaces over SQLite:
Three Specialized SQLite Implementations:
1. KohakuVault KVault 2. KohakuVault ColumnVault 3. Standard SQLite
(K-V Store) (Columnar Storage) (Relational)
├─ Media blobs ├─ Metrics ├─ Media metadata
├─ B+Tree index on K ├─ Histograms ├─ Tables
├─ Content-addressable ├─ Blob-based columnar └─ Step info
└─ .cache() for bulk ops └─ Dynamic chunk growth
Why KohakuVault?
- Zero dependencies - Single SQLite file, no external services
- Simple deployment - Just .db files, no infrastructure
- Dual-interface design - Dict-like for blobs, list-like for sequences
- High performance - Native speed with Pythonic API
- Memory efficient - Streaming support, dynamic chunk growth
- True SWMR - Multiple readers, single writer via SQLite WAL
Why Three Tiers?
- KVault: Optimized for blob storage with B+Tree index, content-addressable
- ColumnVault: Optimized for append-heavy time-series with columnar layout
- Standard SQLite: Optimized for structured metadata with ACID guarantees
WebGL-Based Charts powered by Plotly.js:
- Handle 100K+ datapoints smoothly
- Configurable smoothing (EMA, MA, Gaussian)
- X-axis selection (step, global_step, any metric)
- Multi-metric overlays
- Dark/light mode
- Responsive design
Rich Viewers:
- Histogram Navigator - Step-by-step distribution exploration
- Media Viewer - Image grids, video playback
- Table Viewer - Structured data with embedded images
- Dashboard - Customizable metric layouts
# Train locally
python train.py # Logs to ./kohakuboard/
# View results (no server required!)
kobo open ./kohakuboard --browser
# Optional server for team sharing (requires kohakuboard_server)
kobo-serve --port 48889No server setup, no configuration, no hassle.
pip install -e .from kohakuboard.client import Board
# Create board - automatically saves on program exit
board = Board(name="my-experiment", config={"lr": 0.001, "batch_size": 32})
# Training loop
for epoch in range(10):
for batch_idx, (data, target) in enumerate(train_loader):
loss = train_step(data, target)
# Increment step once per optimizer step (not per epoch!)
board.step()
# Log metrics (non-blocking, returns in <0.1ms)
board.log(loss=loss.item(), lr=optimizer.param_groups[0]['lr'])
# Log validation at end of epoch
val_loss = validate(model, val_loader)
board.log(**{"val/loss": val_loss})
# That's it! No .finish() needed - auto-cleanup via atexit# Local viewer (no server)
kobo open ./kohakuboard --browser
# Or launch the authenticated server (requires kohakuboard_server)
kobo-serve --port 48889
# Drop/copy board folders into the configured data dir to share runsfrom kohakuboard.client import Board, Histogram, Table, Media
import torch
# Create board with hyperparameters
board = Board(
name="cifar10-resnet18",
config={"lr": 0.001, "batch_size": 128, "epochs": 100, "optimizer": "AdamW"}
)
# Training loop
for epoch in range(100):
model.train()
for data, target in train_loader:
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
# Step once per optimizer step
board.step()
# Log scalars (non-blocking, <0.1ms)
board.log(loss=loss.item(), lr=optimizer.param_groups[0]['lr'])
# Validation
model.eval()
val_loss, correct, predictions_table = 0, 0, []
with torch.no_grad():
for batch_idx, (data, target) in enumerate(val_loader):
output = model(data)
val_loss += criterion(output, target).item()
pred = output.argmax(dim=1)
correct += (pred == target).sum().item()
# Sample predictions for table (first batch only)
if batch_idx == 0:
for i in range(min(8, len(data))):
predictions_table.append({
"image": Media(data[i].cpu().numpy()),
"true": class_names[target[i]],
"pred": class_names[pred[i]],
"correct": "✓" if pred[i] == target[i] else "✗"
})
# Log validation (scalars + table + histograms - all at same step!)
hist_data = {f"grad/{n}": Histogram(p.grad) for n, p in model.named_parameters() if p.grad is not None}
board.log(**{
"val/loss": val_loss / len(val_loader),
"val/accuracy": correct / len(val_loader.dataset),
"val/predictions": Table(predictions_table),
**hist_data
})
# No .finish() needed - automatic cleanup when script exitsMain Process (Training) Background Writer Process
│ │
├─ board.log(loss=0.5) │
│ └─> Queue.put() │
│ (returns instantly!) │
│ ├─ Queue.get()
│ ├─ Process batch
├─ Continue training... ├─ Write to storage
│ └─ Flush to disk
Key Features:
- Non-blocking:
log()returns in <0.1ms - Message Queue: 50,000 message capacity
- Writer Process: Background process drains queue
- Storage Layer: Three-tier SQLite architecture (KohakuVault KVault + ColumnVault + Standard SQLite)
- Graceful Shutdown: atexit hooks + signal handlers ensure no data loss
FastAPI Backend (Port 48889)
↓ Read-only connections
Board Files (./kohakuboard/)
├── {board_id}/
│ ├── metadata.json
│ ├── data/ ← SQL/columnar queries here
│ │ ├── metrics/ ← KohakuVault DB files
│ │ └── metadata.db ← SQLite database
│ └── media/
│ └── *.png, *.mp4
↓ REST API
Vue 3 Frontend (WebGL Charts)
Key Features:
- Zero-copy serving: Reads files directly (no database)
- Concurrent reads: Multiple connections supported
- Fast queries: Columnar storage for metrics
- Static serving: Media files served directly
./kohakuboard/
└── {board_id_timestamp}/
├── metadata.json # Board info, config, timestamps
├── data/ # Storage backend files
│ ├── metrics/ # (hybrid) KohakuVault columnar files
│ │ ├── train__loss.db
│ │ ├── val__accuracy.db
│ │ └── ...
│ ├── metadata.db # (hybrid) SQLite metadata
│ └── histograms/
│ ├── gradients_i32.db # int32 precision
│ └── params_u8.db # uint8 precision (compact)
├── media/ # Content-addressed storage
│ ├── {name}_{idx}_{step}_{sha256}.png
│ ├── {name}_{idx}_{step}_{sha256}.mp4
│ └── {name}_{idx}_{step}_{sha256}.wav
└── logs/
├── output.log # Captured stdout/stderr
└── writer.log # Writer process logs
{
"board_id": "20250129_150423_abc123",
"name": "cifar10-resnet18",
"config": {
"lr": 0.001,
"batch_size": 128,
"epochs": 100
},
"created_at": "2025-01-29T15:04:23",
"finished_at": "2025-01-29T18:32:45",
"status": "finished",
"version": "0.0.1"
}Both the training-side package (kohakuboard) and the optional server (kohakuboard_server) read the exact same directory layout. To move a run between machines:
- Copy the entire board folder (
{base_dir}/{project}/{board_id}) usingcp,rsync, or any file transfer tool. - Drop it into the destination data directory (the folder you pass to
kobo open ...or the directory configured viaKOHAKU_BOARD_DATA_DIR/--data-dironkobo-serve). - Restart the viewer or refresh the UI. The new run is immediately available.
No export/import step is required because metrics, metadata, tensors, and media already live in KohakuVault + SQLite files. The legacy kobo sync command still expects a DuckDB export and will fail on modern boards—use manual copy until the new sync API lands.
# Open local viewer (no server)
kobo open ./kohakuboard --browser
# Start authenticated server (kohakuboard_server package)
kobo-serve --port 48889 --host 0.0.0.0
# Manual sync (recommended today): copy the entire board folder into the server's data dir
# (kobo sync is still wired to the legacy DuckDB exporter and will error on modern boards)# All boards use KohakuVault + SQLite (no backend parameter needed)
board = Board(name="my-experiment", project="vision")board = Board(
name="experiment",
board_id="custom-id", # Auto-generated if not provided
config={"lr": 0.001}, # Hyperparameters
project="custom-project", # Sub-directory inside base_dir
base_dir="./my-boards", # Custom directory
capture_output=True, # Capture stdout/stderr
remote_url="https://board.example.com", # Optional future sync target
remote_token="...", # Token for remote sync (WIP)
sync_enabled=False, # Enable when remote endpoints are ready
memory_mode=False, # Keep data in RAM (requires sync to persist)
annotation="debug-run", # Suffix appended to run directory name
)Storage Architecture:
- KohakuVault KVault: Media blobs (K-V table with B+Tree index)
- KohakuVault ColumnVault: Metrics/histograms (blob-based columnar)
- Standard SQLite: Metadata (traditional relational tables)
with Board(name="experiment") as board:
board.log(loss=0.5)
# Automatic flush() and finish() on exitBoard(
name: str | None = None,
board_id: str | None = None,
config: dict | None = None,
project: str | None = None,
base_dir: str | Path | None = None,
capture_output: bool = True,
remote_url: str | None = None,
remote_token: str | None = None,
remote_project: str | None = None,
sync_enabled: bool = False,
sync_interval: int = 10,
memory_mode: bool = False,
*,
annotation: str | None = None,
)Methods:
board.step() - Increment global_step
for batch_idx, batch in enumerate(train_loader):
loss = train_step(batch)
board.step() # Increment ONCE per optimizer step
board.log(**{"train/loss": loss, "train/lr": scheduler.get_last_lr()[0]})board.log(**metrics) - Log data (non-blocking)
board.log(
loss=0.5,
accuracy=0.95,
learning_rate=0.001, # Scalars
sample=Media(image_array), # Images/video/audio
predictions=Table(rows), # Tables (optionally with Media)
grad_norm=Histogram(values), # Histograms
)
# Namespaces (creates tabs in UI)
board.log(**{
"train/loss": 0.5,
"val/accuracy": 0.95
})
# Tensor + KDE payloads (specialized viewers)
board.log(
attention_tensor=TensorLog(tensor),
density=KernelDensity(values, grid_size=256),
)board.flush() - Force flush (blocks until complete)
board.flush() # Wait for all pending writesboard.finish() - Manual cleanup (auto-called on exit)
board.finish() # Flush buffers, close connectionsfrom kobo.client.types import Media
# Images
board.log(
sample_img=Media(image_array), # numpy, PIL, torch tensor
prediction=Media(pred_img, caption="Predicted: cat")
)
# Video
board.log(
training_video=Media("output.mp4", media_type="video")
)
# Audio (if supported)
board.log(
audio_sample=Media("sample.wav", media_type="audio")
)from kobo.client.types import Table
# From list of dicts
results = Table([
{"name": "Alice", "score": 95, "pass": True},
{"name": "Bob", "score": 87, "pass": True},
])
board.log(student_results=results)
# Tables with embedded images
predictions = Table([
{"image": Media(img), "label": "cat", "confidence": 0.95},
{"image": Media(img2), "label": "dog", "confidence": 0.87},
])
board.log(val_predictions=predictions)from kobo.client.types import Histogram
# Log gradient distributions
board.log(
gradients=Histogram(param.grad),
weights=Histogram(param.data)
)
# Precompute for efficiency (optional)
hist = Histogram(gradients).compute_bins()
board.log(grad_distribution=hist)
# Compact precision (75% size reduction, ~1% accuracy loss)
hist = Histogram(gradients, precision="compact")
board.log(grad_distribution=hist)# Install
pip install -e .
# Train
python train.py
# View results
kobo open ./kohakuboard --browser# Run the authenticated server (still stabilizing)
kobo-serve --data-dir /var/kohakuboard --db sqlite:///kohakuboard.db
# Share boards by copying folders into /var/kohakuboard/<project>/
# Restart/reload the server to pick up new runsSee docs/kohakuboard/ for complete deployment guides.
| Feature | WandB | TensorBoard | MLflow | KohakuBoard |
|---|---|---|---|---|
| Latency | ~10ms | ~1ms | ~5ms | <0.1ms |
| Throughput | ~1K/sec | ~10K/sec | ~5K/sec | 20K+/sec |
| Offline | ❌ No | ✅ Yes | ✅ Yes | ✅ Yes |
| File-Based | ❌ No | ✅ Yes | ❌ No | ✅ Yes |
| Non-Blocking | ❌ No | ❌ No | ❌ No | ✅ Yes |
| Columnar Reads | ❌ No | ❌ No | ✅ Yes | ✅ Yes (KohakuVault ColumnVault) |
| WebGL Charts | ❌ No | ❌ No | ❌ No | ✅ Yes |
| 100K+ Points | Slow | Slow | Slow | Fast |
| Self-Hosted | Limited | ✅ Yes | ✅ Yes | ✅ Yes |
| Setup | Cloud | Local | Server | None |
- Getting Started - Installation and basic usage
- API Reference - Complete API documentation
- Architecture - System design and internals
- CLI Guide - Command-line tool usage
- Usage Manual - Day-to-day operations checklist
- Examples - Real-world usage patterns
See examples/ directory:
kohakuboard_basic.py- Simple scalar loggingkohakuboard_all_media_types.py- Images, videos, tableskohakuboard_cifar_training.py- Complete CIFAR-10 training examplekohakuboard_media_in_tables.py- Tables with embedded imageskohakuboard_histogram_logging.py- Gradient distribution tracking
Client Library:
- Non-blocking logging architecture
- Rich data types (scalars, media, tables, histograms)
- Three-tier SQLite architecture (KohakuVault KVault + ColumnVault + Standard SQLite)
- Graceful shutdown with queue draining
- Content-addressed media storage
Backend & UI:
- FastAPI REST API
- Vue 3 interface with dark/light mode
- WebGL charts (100K+ points)
- Histogram navigator
- Media/table viewers
- CLI tool (
kobo)
- Remote server mode with authentication
- Sync protocol for uploading local boards
- Project management (group related boards)
- Run comparison UI (side-by-side metrics)
- Real-time streaming (live updates while training)
Client Features:
- PyTorch Lightning integration
- Keras callback
- Hugging Face Trainer integration
- Custom callback system
Backend Features:
- Multi-board comparison API
- Advanced filtering (tags, date range)
- Export to CSV/JSON
- Aggregation queries
UI Features:
- Diff viewer (compare runs)
- Scatter plots (metric vs metric)
- Custom dashboards
- Annotations
- Search and filter
Infrastructure:
- Docker/Kubernetes deployment
- Cloud storage backends (S3, GCS)
- Multi-user authentication
KohakuBoard is a multi-component project with different licenses:
-
Client Library (
kohakuboard): Apache License 2.0- Free for commercial and non-commercial use
- Permissive license with minimal requirements
-
Web UI (
kohaku-board-ui): AGPL-3.0- Free to use and modify
- Source code disclosure required for network services
-
Server (
kohakuboard_server): Kohaku Software License 1.0- Free for non-commercial use
- Free for commercial use under revenue/duration limits
- Commercial licenses available for larger deployments
Commercial Licensing: For commercial licenses or exemptions, contact kohaku@kblueleaf.net
See LICENSE for complete details.
KohakuBoard is part of the KohakuHub ecosystem. We welcome contributions!
Before contributing:
- Read CONTRIBUTING.md for code style and guidelines
- Join our Discord for discussions
- Check open issues tagged with
kohakuboard
Areas we need help:
- 🎨 Frontend (chart improvements, UI/UX)
- 🔧 Backend (storage backends, performance)
- 📊 Client library (framework integrations)
- 📚 Documentation (tutorials, guides)
- 🧪 Testing (unit tests, benchmarks)
- Discord: https://discord.gg/xWYrkyvJ2s (Use #kohakuboard channel)
- Issues: https://github.com/KohakuBlueleaf/KohakuHub/issues (Tag with
kohakuboard) - Email: kohaku@kblueleaf.net
- KohakuVault - High-performance storage library with dual SQLite interfaces (KVault for blobs, ColumnVault for sequences)
- Plotly.js - WebGL charts
- Vue 3 - Modern UI framework
- FastAPI - Backend framework
Production Ready! Core features are stable and performant. Use in real training workflows and help us improve.

