Scope
- Shared infrastructure for adversarial interpretability across competitive games (chess, diplomacy, etc.).
- Common evals, visualization, data reports, model wrappers, and commands to run training jobs.
Non-goals
- Forcing similar model architectures or training code across all experiments.
- Over-abstracting before concrete use cases exist.
- Standardised analysis for all experiments; we just want some consistency in the final presentation (plots/tables).
Directory layout
- docs/
- environments/
  - chess_probe/
- libs/
  - evals/
  - visualization/
- configs/
  - examples/
- scripts/
Shared libraries
- evals/: engine-eval delta, Elo, deception metrics (precision/recall), cost tracking.
- visualization/: plotting helpers and experiment dashboards.
- probes/: soft-token and residual injection modules with small, clear APIs.
- engines/: thin wrappers for Stockfish/Lc0 or other evaluators.
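To give a concrete feel for the engines/–evals/ split, a thin Stockfish wrapper might look like the sketch below, with an engine-eval delta then being just the change in `eval_cp` across the model's move. The class and method names are illustrative assumptions, not the actual gamescope API; the sketch assumes python-chess and a local Stockfish binary.

```python
# Illustrative sketch only: class/method names are hypothetical, not the real
# libs/engines API. Assumes python-chess and a Stockfish binary on PATH.
import chess
import chess.engine


class StockfishWrapper:
    """Minimal UCI wrapper exposing a single centipawn evaluation call."""

    def __init__(self, binary_path: str = "stockfish", depth: int = 12):
        self._engine = chess.engine.SimpleEngine.popen_uci(binary_path)
        self._limit = chess.engine.Limit(depth=depth)

    def eval_cp(self, board: chess.Board) -> int:
        # Score from White's point of view; mate scores clamped to +/-10000 cp.
        info = self._engine.analyse(board, self._limit)
        return info["score"].white().score(mate_score=10_000)

    def close(self) -> None:
        self._engine.quit()
```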
Runners
- TRL PPO (single-turn for chess_probe) with probe-only optimization (sketched below).
- Verifier- or agent-tooling adapters for multi-step environments.
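Probe-only optimization boils down to freezing the base model and handing the optimizer only the probe parameters. A minimal PyTorch sketch follows; the soft-token shape and the `probe` variable are stand-ins for the modules under probes/, not their actual implementation.

```python
# Sketch of probe-only optimization: freeze the base model, optimize only the probe.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B-Base")
for param in model.parameters():
    param.requires_grad_(False)  # base weights stay fixed

# Eight learnable soft tokens to be injected into the prompt embeddings (hypothetical shape).
probe = torch.nn.Parameter(torch.zeros(8, model.config.hidden_size))
optimizer = torch.optim.AdamW([probe], lr=1e-4)

# Inside the TRL PPO loop, only `probe` receives gradient updates; the policy's
# trainable state is just the injected soft tokens.
```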
Results and experiment tracking
- Location: write all outputs under `results/<env>/<experiment_name>/<YYYYMMDD_HHMMSS>-<run_id>/`.
  - Example: `results/chess_probe/probe_ablation/20250115_142530-a1b2c3/`
- Contents inside a run directory:
  - `config.yaml` (or `.toml`): exact configuration used for the run (copied from `--config` or auto-dumped resolved config)
  - `metadata.json`: immutable run metadata
    - git commit, branch, dirty flag; user, host; Python/CUDA versions; random seeds
    - full invocation (command, args, `PYTHONPATH`), environment name, library versions (optionally `pip freeze`)
  - `logs/`: captured stdout/stderr/wandb
  - `plots/`: generated figures for quick inspection
  - `artifacts/`: model/probe checkpoints and large outputs (consider symlinks or pointer files if we need to store stuff elsewhere)
  - `samples/`: qualitative samples (games, traces, prompts/responses)
  - `metrics/`: summary metrics from the experiment
- Script conventions (strongly recommended):
  - `--config path/to/config.yaml` and `--experiment-name <slug>`
  - `--output-dir results/` (default) so scripts create the full run path automatically
  - `--notes "short freeform note"`, saved in `metadata.json`
  - On startup: create the run directory, copy the config, write `metadata.json` (a minimal sketch follows at the end of this section)
  - During training/eval: append metrics to `metrics.jsonl`, write plots and artifacts under the run directory
- Remote trackers: optionally mirror metrics to W&B or MLflow, but the filesystem record above is the source of truth for reproducibility.
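A minimal sketch of the startup and logging conventions above, assuming argparse-style flags and a hardcoded environment name; the real helpers live in `gamescope.libs.run_utils`, so treat this as a reference for the layout rather than the actual implementation.

```python
# Illustrative startup sequence for a script following the conventions above.
# Flag names match the conventions; everything else is a simplifying assumption.
import argparse
import json
import shutil
import subprocess
import sys
import uuid
from datetime import datetime
from pathlib import Path

parser = argparse.ArgumentParser()
parser.add_argument("--config", type=Path, required=True)
parser.add_argument("--experiment-name", required=True)
parser.add_argument("--output-dir", type=Path, default=Path("results"))
parser.add_argument("--notes", default="")
args = parser.parse_args()

# results/<env>/<experiment_name>/<YYYYMMDD_HHMMSS>-<run_id>/
env_name = "chess_probe"  # normally derived from the script or config
stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
run_dir = args.output_dir / env_name / args.experiment_name / f"{stamp}-{uuid.uuid4().hex[:6]}"
for sub in ("logs", "plots", "artifacts", "samples", "metrics"):
    (run_dir / sub).mkdir(parents=True, exist_ok=True)

# Copy the exact config and record immutable run metadata.
shutil.copy(args.config, run_dir / "config.yaml")
git_commit = subprocess.run(["git", "rev-parse", "HEAD"], capture_output=True, text=True).stdout.strip()
(run_dir / "metadata.json").write_text(
    json.dumps({"command": sys.argv, "notes": args.notes, "git_commit": git_commit}, indent=2)
)

# During training/eval: append one JSON object per logged step.
with (run_dir / "metrics.jsonl").open("a") as f:
    f.write(json.dumps({"step": 0, "loss": 0.0}) + "\n")
```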
Index and discovery
- An append-only index is maintained at `results/index/runs_index.jsonl` for fast discovery.
- New runs are auto-indexed:
  - On entry via `gamescope.libs.run_utils.capture_metadata()` or the `run_context(...)` context manager (writes a `start` event)
  - On exit via `gamescope.libs.run_utils.mark_status()` (writes an `end` event with exit reason)
- Artifact usage can be logged to surface interesting runs:
  - Call `gamescope.libs.run_utils.mark_artifact_used(path_to_artifact, reason="...")`
  - This writes `<run_dir>/artifacts/USED_BY.jsonl` and an `artifact_used` event in the index
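A usage sketch of these helpers; whether `run_context(...)` takes an experiment name and yields the run directory is not specified above, so those details are assumptions.

```python
# Usage sketch only: run_context's arguments and return value are assumptions;
# see gamescope.libs.run_utils for the real API.
from pathlib import Path

from gamescope.libs import run_utils

# Writes a `start` event to results/index/runs_index.jsonl on entry and an
# `end` event (with exit reason) on exit.
with run_utils.run_context("probe_ablation") as run_dir:
    checkpoint = Path(run_dir) / "artifacts" / "probe.pt"  # hypothetical artifact
    ...  # training / eval code writes outputs under run_dir

# Later, when that checkpoint is reused, log the usage so find_run.py can
# surface this run; this appends to <run_dir>/artifacts/USED_BY.jsonl and
# writes an `artifact_used` event to the index.
run_utils.mark_artifact_used(checkpoint, reason="init for follow-up sweep")
```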
CLI helpers
- List runs (non-junk by default, grouped by script, newest first; includes duration and usage counts):
  `uv run python scripts/find_run.py --results-root results`
- Backfill the index for existing runs:
  `uv run python scripts/reindex_runs.py --results-root results`
- Run any experiment from a YAML file; a fresh run directory is created and the full config is recorded:
  `uv run python scripts/config_runner.py --config configs/examples/my_eval.yaml`

YAML shape:

```yaml
command: environments/chess_probe/scripts/eval_qwen_bc.py
args:
  model_name_or_path: Qwen/Qwen3-8B-Base
  num_eval_data: 200
  results_dir: results/chess_probe
  save_jsonl: true
```

The runner injects `run_dir` for downstream scripts (available as `--run_dir` if supported, otherwise in the environment as `RUN_DIR`).
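On the consuming side, a downstream script might pick up the injected run directory as in the sketch below; preferring the `--run_dir` flag and falling back to the `RUN_DIR` environment variable follows the description above, and everything else is illustrative.

```python
# Sketch of how an eval/training script could consume the injected run_dir.
import argparse
import os
from pathlib import Path

parser = argparse.ArgumentParser()
parser.add_argument("--run_dir", type=Path, default=None)
args, _ = parser.parse_known_args()

# Prefer the explicit flag; otherwise fall back to the RUN_DIR env var set by the runner.
run_dir = args.run_dir or Path(os.environ["RUN_DIR"])
run_dir.mkdir(parents=True, exist_ok=True)
(run_dir / "metrics.jsonl").touch()  # outputs go under the injected run directory
```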
Add a new environment
- Create environments/<env_name>/ with a README.md describing assumptions and dependencies.
- Reuse libs/ components where possible and keep environment-specific logic out of libs/; if you need evals that would be relevant to multiple experiments, add them to libs/ instead.
- Provide example configs under configs/examples/ to run your experiments.
- Add/modify scripts under scripts/ to run your experiment and collect results.
Licensing
- Preserve third-party licenses and headers. See THIRD_PARTY_NOTICES.md.
Setup
- Install uv (Linux/macOS):
  `curl -LsSf https://astral.sh/uv/install.sh | sh`
- Run `uv sync`. This creates a local virtual environment (e.g., `.venv/`) and installs the base project dependencies.
- Run scripts using the synced environment:
  `uv run python scripts/your_script.py --help`
- If you define optional extras for your environment, include them at run time:
  `uv run --with '.[your_extra]' python scripts/your_script.py ...`
Notes
- `uv sync` is only needed after changing dependencies or on first setup. For ephemeral runs without a full sync, you may also use `uv run`, which will resolve and execute in a temporary environment.