Skip to content

[wip][clustering] visualization dashboard#170

Draft
mivanit wants to merge 705 commits intoclustering/mainfrom
feature/clustering-dashboard
Draft

[wip][clustering] visualization dashboard#170
mivanit wants to merge 705 commits intoclustering/mainfrom
feature/clustering-dashboard

Conversation

@mivanit
Copy link
Collaborator

@mivanit mivanit commented Sep 25, 2025

Description

wip, marked as "ready for review" for getting CI tests to run

see #198

Related Issue

#133

Motivation and Context

How Has This Been Tested?

Does this PR introduce a breaking change?

@mivanit mivanit marked this pull request as draft September 25, 2025 13:41
@mivanit mivanit mentioned this pull request Sep 30, 2025
28 tasks
@mivanit mivanit changed the base branch from main to feature/clustering October 6, 2025 09:30
mivanit and others added 27 commits October 6, 2025 14:02
@mivanit mivanit marked this pull request as ready for review October 24, 2025 16:48
@mivanit mivanit changed the title [clustering] visualization dashboard [wip][clustering] visualization dashboard Oct 24, 2025
see discussion at #227 (comment)

have tried to make this change as isolated as possible -- i think this was a useful feature and we may want to add it back at some point
old name didnt make sense, since it should be a path to a file with a `ClusteringRunConfig`

see #227 (comment)
@miv-goodfire
Copy link

miv-goodfire commented Oct 24, 2025

commenting because @mivanit has reduced perms on the repo so CI won't run. hopefully interacting will enable them to run?

EDIT: commenting wont work. committing to the PR branch will though

@mivanit mivanit marked this pull request as draft October 27, 2025 11:29
danbraunai-goodfire added a commit that referenced this pull request Dec 23, 2025
* clustering (#43) squash

Squashing #43 to a clean branch for clustering work

#43 (comment)


# Commits

* refactor to use MergeRunConfig everywhere

* Wip

* wip

* wip

* some info to yaml

* todos

* LFGgsgsgs

* wip

* refactor configs. new system with ability to choose sampling mechanism. code will be pushed later

* refactor makefile to use new configs

* add tests. code still not comitted yet

* format

* fix tests

* test configs

* TESTS PASSIN LFGGGG

* format

* pyright fixes

* fix some pyright issues

* parallelizing tests

* distributed tests in CI

* fix action

* try to make tests faster

* remove experiment with no canonical run

* try to debug issue with normalizing ensemble

works on my machine :(

* [important] remove old files

* move the merge pair samplers code to math folder

* fix import in tests

* default to cpu if no cuda in spd-cluster

* default to cpu if no cuda in spd-cluster

* wip wandb logging for spd-cluster refactor

* more wip wandb logging for spd-cluster refactor

* format?

* wandb log tensor info

* wandb log tensor info wip

* wandb log tensor info wip

* wandb log tensor info wip

* some figs on wandb

* format

* wip

* [temp] ignore config rename/deprecated warns

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* [hack] lack of canonical runs causing error

* wip

* wip, cleaning up s2

* wip

* nathan approved MDL cost computation

* swap to log2 in mdl

* log MDL cost to wandb, other minor fixes

* wip

* wip

* wip

* some logging stuff fixed. now tryna figure out why way too many merge hists are loaded

* [important] got run loading to work right!!!

* wip

* format

* [important] FEATURE COMPLETE??

* format

* minor changes to wandb stuff

* fix tests?

* format

* fix pyright errors, some warnings remain

* pyright passing!!!

* removed old ignores in pyproject.toml, fixed the resulting errors

* ?

* fix

* fix registry

* numprocess to auto for pytest with xdist

* move the TaskName definition

* remove pyright ignore

* move clustering stuff into clustering folders

* format

* fix type error

* no more dep graph

* remove outdated TODO.md

* remove old ruff exclude

* remove unused function

* remove `from __future__ import annotations`

* factor out component filtering, add tests for it

* tensor stats randomized for big matrix

* wip

* minor fix related to passing filtered labels

* histograms as wandb Images instead of plotly

* wip

* factor out some logging stuff, lots of minor changes

* wandb log semilog of merge pair cost

* oops, fix semilog

* log hists for stuff

* close figures to avoid memory leaks, add some logging to help profile

* wip, added some logging

* logging stuff

* pyright fixes

* make the merge history more lightweight

- get rid of stats dicts, we store everything in wandb anyway
- get rid of old sweep stuff, we keep it in config anyway
- some code used only in experiment notebooks is broken

* wip

* format

* intervals dict

* fix issue with zip file closed

* wip

* remove merge profiling code

* get rid of some old comments

* more explanation around popping

#43 (comment)

* components_in_pop_grp -> n_components_in_pop_grp

#43 (comment)

* remove ignore deprecated config warnings todo

see #43 (comment)

this will probably cause a lot of warnings to be logged unless canonical runs updated?

* some typing fixes

* fix pyright issues by pinning `transformers` package

see #139

* fix annoying mock test issue

* fix pyright issue

* remove call to deprecated plotting function

* remove dead test

* give sigmoid type

* update default ss decomp run

* wip

* give sigmoid type

* wip

* fix interface, store ComponentModel.module_paths

maybe it can be a property that gets the keys? and asserts
that keys in components and gates match?

* patched model -> target model when using config

* fix: no more patched_model, use target_model for config

* comments

* fix cuda_memory_used.py

* removed `spd/clustering/math/dev.py`

#43 (comment)

* `StatsKeys` -> `StatsKey`

#43 (comment)

* remove `plot_merge_history_costs`

#43 (comment)

* remove old js-embedding-vis dep

* Oli clustering refactor (#172)

# PR #172 Commit List

#172

## Initial Work by oli-clive-griffin

- [7c488fc](7c488fc) - wip
- [b506574](b506574) - wip
- [56f6b8a](56f6b8a) - remove srt
- [8682bec](8682bec) - wip
- [e3645c5](e3645c5) - wip
- [b246b31](b246b31) - wip
- [b44abe8](b44abe8) - it runs!

## Merge and Formatting

- [f83ed30](f83ed30) - Merge branch 'feature/clustering' into feature/clustering-refactor-v1
- [7700e6e](7700e6e) - format

## Continued Refactoring by oli-clive-griffin

- [3ea89ae](3ea89ae) - wip
- [bc37ae3](bc37ae3) - wip
- [e30d680](e30d680) - wip
- [d16cb96](d16cb96) - wip
- [231af5a](231af5a) - wip
- [988a9c8](988a9c8) - wip
- [7665879](7665879) - wip
- [b01130e](b01130e) - wip
- [067573a](067573a) - wip

## Integration and Fixes by mivanit

- [1311a54](1311a54) - Merge branch 'feature/oli-cluster' into feature/clustering-refactor-v1
- [28b6e01](28b6e01) - format fixes
- [b0304d9](b0304d9) - remove old s2 script step
- [ac10010](ac10010) - wip (oli-clive-griffin)
- [fce4007](fce4007) - pyright passing
- [238fa31](238fa31) - fix pih tests
- [94f3f92](94f3f92) - better path handling
- [2b4feee](2b4feee) - reorg of pipeline
- [901cebd](901cebd) - fixing tests
- [60ed9d9](60ed9d9) - wip (oli-clive-griffin)
- [ce01a15](ce01a15) - wip

## Storage Implementation

- [a04abba](a04abba) - Merge branch 'demo/cluster-storage' into feature/clustering-refactor-v1
- [b829f3e](b829f3e) - only need storage stuff
- [cc202ce](cc202ce) - wip
- [f2ecda1](f2ecda1) - wip storage
- [eb7a30d](eb7a30d) - move storage.py to pipeline
- [b4ae5de](b4ae5de) - docstring with tree
- [8970478](8970478) - make some stuff private in storage
- [55ba126](55ba126) - use ClusteringStorage everywhere
- [c85e357](c85e357) - Remove unused old path logic
- [7e66a3a](7e66a3a) - add back model_dump_with_properties
- [62dfcdf](62dfcdf) - fix tests
- [a0a3b2e](a0a3b2e) - Revert config change
- [1f95caf](1f95caf) - add storage tests
- [d860f01](d860f01) - simplify imports

## Final Cleanup and Testing

- [0b94ef1](0b94ef1) - format
- [e850e41](e850e41) - format and type check fixes
- [7e83333](7e83333) - re-add notebooks as tests
- [1a1e521](1a1e521) - fix configs
- [338b761](338b761) - allow toml config files
- [e809810](e809810) - wip
- [6633e69](6633e69) - wip
- [987e1f0](987e1f0) - better logging
- [0c9e52d](0c9e52d) - wip
- [18b424d](18b424d) - no default for n samples
- [9f7321d](9f7321d) - add a todo

## Final Refinements

- [dc86bae](dc86bae) - add back BatchedGroupMerge util methods
- [079b3f2](079b3f2) - add link in comment
- [9f75588](9f75588) - inline alive_mask as ~dead_components
- [73a3378](73a3378) - wip
- [46d4498](46d4498) - from_file() -> read() on RunConfig
- [227f6ba](227f6ba) - rename: RunConfig -> ClusteringRunConfig
- [b6869c3](b6869c3) - add back comments
- [22acf57](22acf57) - more comments in merge.py
- [e83bc6a](e83bc6a) - comment edit
- [67af4a6](67af4a6) - wip
- [f9c702e](f9c702e) - fix clustering failing by reverting to shelling out inst. of pool
- [7f1516a](7f1516a) - Revert "fix clustering failing by reverting to shelling out inst. of …
- [0db0379](0db0379) - revert to shelling out, fix logging
- [e935a78](e935a78) - delete old code
- [69f0bc8](69f0bc8) - [!!!] tests passing


# commit list
* wip

* wip

* remove srt

* wip

* wip

* wip

* it runs!

* format

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* format fixes

* remove old s2 script step

* wip

* pyright passing

* fix pih tests

* better path handling

* reorg of pipeline

* fixing tests

* wip

* wip

* only need storage stuff

* wip

* wip storage

* move storage.py to pipeline

* docstring with tree

* make some stuff private in storage

* use ClusteringStorage everywhere

* Remove unused old path logic

* add back model_dump_with_properties

* fix tests

* Revert config change

* add storage tests

* simplify imports

* format

* format and type check fixes

* re-add notebooks as tests

* fix configs

* allow toml config files

* wip

* wip

* better logging

* wip

* no default for n samples

* add a todo

* add back BatchedGroupMerge util methods

* add link in comment

* inline `alive_mask` as `~dead_components`

pretty sure claude changed this for no reason at some point

* wip

* from_file() -> read() on RunConfig

* rename: RunConfig -> ClusteringRunConfig

* add back comments

* more comments in merge.py

* comment edit

* wip

* fix clustering failing by reverting to shelling out inst. of pool

- originally we would shell out for the different clustering processes, and use a separate fd passing json
- then we switched to multiprocessing.pool
- this would cause model loading to fail -- not sure why
- this reverts it to the old style

there are still some old files laying around, theyll be removed

* Revert "fix clustering failing by reverting to shelling out inst. of pool"

This reverts commit f9c702e.

reverting temporarily

* revert to shelling out, fix logging

* delete old code

* [!!!] tests passing

---------

Co-authored-by: Oliver Clive-Griffin <o.clivegriffin@gmail.com>

* fixes/improvements to dist_utils

* type fixes?

* wip

* Sync everything from feature/clustering-dashboard except spd/clustering/dashboard/

* minimizing diff

* minimize pyprojec.toml diff

* minimizing diff, removed deps

* uv sync

* fixing state dict mappings

* test parallelization

* parallelize tests

* device getting utils

* add TaskName type

* uv sync (pytest-xdist dep)

* remove old junk from Makefile

* globally unique ports in tests to allow parallelization

* comments explaining port allocation in tests

* add distributed marker, rull all distributed tests on same worker

* Revert "add distributed marker, rull all distributed tests on same worker"

This reverts commit 3f55ffa.

* add distributed marker, rull all distributed tests on same worker

* refactor: use general_utils methods for getting device everywhere

* wip jaccard

* wip jaccard

* wip jaccard

* wip jaccard (plotting)

* found where to increase timeout

* wip jaccard

* make format

* fixes

* typing fixes

* claude doing a bunch of type hinting

* trying to get pyright passing?

this doesnt cause any issues locally :/

* pyright works both locally and in CI

* allow installing cpu-only torch in CI

* figure out CI disk usage by tests on main

* alternate strategy for install

basically:
- torch and torchvision in a `pytorch` dep group
- `pytorch` dep group, along with `dev`, is installed by default (uv sync behavior unchanged)
- `make install-ci` recipe which first manually installs torch stuff, then uv sync but NO `pytorch` dep group

* fixes to the last commit

* cleanup temp changes

* make in CI

* wip

* wip

* wip

* uv sync

* try to fix markupsafe?

* pin markup safe with explanation

* update lockfile??

* nope i think we need the index strategy

* ?

* markupsafe issue

* remove disk usage printing

* fix pyright issue

* dependency hell

* fix deps???

* oops, missing index strategy. moved to makefile

* re-lock

* make from /usr/bin/ ?

* dependency hell

* type checking hell

* Update spd/utils/general_utils.py

Co-authored-by: Dan Braun <dan.braun@goodfire.ai>

* wrap and fix Conv1D imports

* minimize diff cleanup

* try compile-bytecode for ci install

* dont compile bytecode actually

compare:

- no compile: https://github.com/goodfire-ai/spd/actions/runs/18316145235/job/52156789078
- yes compile: https://github.com/goodfire-ai/spd/actions/runs/18316319784/job/52157420005

approximately no speedup in tests or basedpyright run, but extra 10s added to install step (where the compilation happens)

* remove markupsafe constraint?

* switched to use get_obj_device

* remove device: torch.device type hints

see #186 (comment)

* remove "distributed" test marker

* fix another timeout

* replace get_module_device -> get_obj_device

* better comments on port uniqueness

* remove old markers

port uniqueness should resolve the issue, without causing slowdowns

* remove timeout TODO comments

* removed checks.yaml timeout todo, clustering tests pass in ~12min

* [diff-min] transformers version issue from #139 resolved

* fix comment

* wip jaccard

* pyright fixes to jaccard, wip

* Update docs about grad syncing with DDP

* Mention feature/memorization-experiments in README

* Fix train and eval metrics and hidden_act_recon (#189)

* Update canonical runs and change target model path (#197)

* Avoid using too many processes in tests

* fix wandb model paths to older runs

`goodfire/spd/runs/{id}` -> `goodfire/spd-pre-Sep-2025/runs/{id}`

see 5ba1d24

---------

Co-authored-by: Oliver Clive-Griffin <o.clivegriffin@gmail.com>
Co-authored-by: Dan Braun <dan.braun@goodfire.ai>

* timing cluster_ss.py

* Revert "timing cluster_ss.py"

This reverts commit 26c2957.

* [clustering] `cluster_ss.py` speedup ci (#199)

## Description
clustering tests were slow (see [`actions/runs/18406085786/job/52446229079`](https://github.com/goodfire-ai/spd/actions/runs/18406085786/job/52446229079), taking nearly 10 minutes [Fig 1].

In particular, `tests/clustering/test_clustering_experiments.py::test_cluster_ss_notebook` was taking ~500 seconds, nearly 10x the next slowest test. This test just runs `tests/clustering/scripts/cluster_ss.py` as a script. Turns out >90% of that time was spent on the dataset [Fig 2], so we switched it to be streaming. this required a minor refactor.

Note that after we fixed the above, `test_cluster_ss_notebook` was fast, but streaming was not enabled for `test_clustering_with_simplestories_config` since that tests it thru cli -- hence the downloading was just happening for the latter. so, we adjust the config and cli
to allow for enabling dataset streaming in see [17bfdc4](17bfdc4)


### Fig 1

```
============================= slowest 10 durations =============================
504.48s call     tests/clustering/test_clustering_experiments.py::test_cluster_ss_notebook
66.45s call     tests/clustering/test_clustering_experiments.py::test_clustering_with_resid_mlp1_config
48.46s call     tests/test_distributed.py::TestDistributedDeterminicity::test_distributed_determinicity
47.01s call     tests/clustering/test_clustering_experiments.py::test_clustering_with_simplestories_config
17.16s call     tests/clustering/test_clustering_experiments.py::test_cluster_resid_mlp_notebook
6.18s call     tests/metrics/test_alive_components_distributed.py::TestDistributedAliveComponentsTracker::test_distributed_alive_components
5.36s call     tests/test_gpt2.py::test_gpt_2_decomposition_happy_path
4.67s call     tests/test_wandb_run_loading.py::test_loading_from_wandb[tms_5-2-id-wandb:goodfire/spd/runs/dwalcejo-_from_run_info]
4.56s call     tests/test_wandb_run_loading.py::test_loading_from_wandb[resid_mlp2-wandb:goodfire/spd/runs/6vpsvdl5-_from_pretrained]
4.37s call     tests/test_wandb_run_loading.py::test_loading_from_wandb[tms_5-2-wandb:goodfire/spd/runs/4stmo6p5-_from_run_info]
```
> from https://github.com/goodfire-ai/spd/actions/runs/18406085786/job/52446229079


### Fig 2

```
 Timer records (s):
Load model                         15.36
Load data                         511.21
Load data batch                     0.05
Get component activations           5.11
Process activations                 0.08
Plot activations                    8.05
Compute merge iterations            0.51
compute distances                   0.18
plot distances                      0.01
```
> from https://github.com/goodfire-ai/spd/actions/runs/18407424091/job/52450587755


## Overall Impact
- total CI runtime reduced from 12 minutes to 4 minutes. time for specifically tests reduced from 11 min to 3 min.
  - before: https://github.com/goodfire-ai/spd/actions/runs/18406085786/job/52446229079
  - after: https://github.com/goodfire-ai/spd/actions/runs/18410271787/job/52460485800
- a few minor backwards compatible interface additions
  - allow specifying `dataset_streaming` for `"lm"` tasks
  - format specifier for saving figures (I thought svg might be a bit faster. left it in because seems useful generally)


## Related Issue

See #43 (comment)

## Does this PR introduce a breaking change?

No, but did modify a few interfaces.


# Commits:

* timing cluster_ss.py

nearly identical to 26c2957 but accidentally committed that to clustering/main

* better timing

* fix timing

* allow saving in formats besides pdf

* streaming dataset for notebook to avoid download?

* oops

* remove timers

* minimize diff

* remove custom CI timing step

* dataset streaming in config+cli for spd-cluster

test_cluster_ss_notebook was fast, but streaming was not enabled for test_clustering_with_simplestories_config
since that tests it thru cli -- hence the downloading was just happening for the latter. so, we adjust the config and cli
to allow for enabling dataset streaming

* wip

* cuda issues???

* [temp] telemetry for action

using https://github.com/catchpoint/workflow-telemetry-action

* minor fixes from claude

see #199 (comment)

* [temp] no docker container for workflow telemetry to work

* sudo in workflow for apt-get

* use --dist worksteal in CI pytest

* revert temp CI changes

* minor fixes. accidentally removed container in CI lol

* link to issue in script

we do the weird thing because of CUDA issues. we should remove it
at some point, and make sure that the CUDA worker catches it

#201 (comment)

* add num_nonsingleton_groups stat from PR170

see #170

* switch BaseModel to BaseConfig, get rid of old save/read logic (#209)

adapt `clustering/main` and the associated #198 to use the new `BaseConfig` introduced in #200 



* switch BaseModel to BaseConfig, get rid of old save/read logic

* fix typo

* fix pydantic validation issue

* use model_copy to avoid editing frozen dict

when updating ClusteringRunConfig from CLI

* remove deprecated config fields

* switch to new run

* wip sigmoid issues

* that worked...?

* dont assert positive coacts?

* get rid of long-running merge pair sampler on GPU test

well... somewhere in the range of 0.5 to 1s, but its the longest
running test out of all the standard tests

* [clustering] Refactor to two-stage process (#203)

* Have clustering be submitted via slurm jobs

* Functional run_clustering.py script

* Functional submitter (though workspace broken)

* Refactors

* Move pipeline_config.yaml to configs/

* Add non-used config entries to spd/clustering/configs/example.yaml

* --n_runs --> --n-runs

* Update run paths

* Add calc_distances script

* Fix run_pipeline wandb workspace

* Have MergeConfig inherit from BaseConfig

* Remove unused tensor_stats.py

* Improve script docs and output file naming

* Use existing random id generator which uses secrets

* run_clustering -> main

* Get tests passing (and remove some old tests)

* Remove REFACTOR.md

* Avoid loading run info twice

* Add comment to test_run_clustering_happy_path

* remove matplotlib related type ignore comments causing warns

likely related to [f9a2a97](f9a2a97) / #210

* re-add --dist-worksteal in CI

see https://github.com/goodfire-ai/spd/pull/203/files#r2425944229

* pdf_prefix -> figure_prefix (minimizing diff)

* Add calc_distances to pipeline

* Explicitly state distance plot file in pipeline

* move ClusteringRunConfig to separate file

run_clustering.py is very long, plus this helps us minimize the diff

* [temp] rename {merge,clustering}_run_config for easier diff

* re-add dataset_streaming option for fast CI tests

* remove spd/clustering/utils

it was empty

* add `make clean` recipe

* [wip] add back storage.py

* wip

* wip

* make format

* `Command` object continuing refactor, `make test` passing

* try to fix some tests

* working on fixing tests, some config issues, wip

* ExecutionStamp reworking, it will manage output dirs

* CI install fix -- python version used was 3.12, now we are on 3.13

* sigmoid typing fixes !!! unsure abt wht to use from CIOutputs

* catch exception when checking git repo is clean

* remove merge pair sampler GPU test

* allow errors when calling repo_is_clean

* wip debugging

* fix??

* try/except workspace new view creation

* fix create_clustering_workspace_view

* just dont pass a wandb project to tests

* config woes

* wip

* noneable slurm config

* fix cli

* wip

* move `Command` to own class

* storage refactor

* wip

* ensemble id keys must be unique if registered in db, pass None

* snapshot creation, cli stuff

* logging cleanup

* Remove Command and use shlex

* Remove Storage properties

---------

Co-authored-by: Michael Ivanitskiy <mivanits@umich.edu>

* [clustering] distance computation (#213)

add `matching_dist` and `matching_dist_vec` distance computations

* wip

* better debugging plots

* wip

* wip

* wip

* format

* pyright fixes

* wip

* wip

* [!!!] remove old testing code

* integrate distances with the rest of the pipeline

* wip

* wip

* wip

* wip

* dev recipe

* fix pyright warning

* wip

* wip

* wip

* wip

* make format

* cli control for calc_distances

`time` command doesnt work in CI or on andromeda

* rework configs, make tests simpler and faster

* fix config typo distances_method -> distances_methods

* [clustering] config refactor (#227)

## Core changes
- `ClusteringRunConfig.idx_in_ensemble` is removed entirely. index is now auto assigned
- ~~pipeline config can contain merge run config either inline or as path~~
  - ~~quite annoying when doing experiments to have to go and edit two files~~
  - no longer a feature, but path will be validated on config load. see discussion in #227 (comment)
- remove component popping
  - change brought over from #206 via commit [a1f1146](a1f1146)
- clustering runs will now actually use the run id for the wandb run id
  - spd decomp runs don't yet


## Housekeeping

- separate folders for pipeline configs and merge run configs
  - clearer to the user, and enables testing configs without trying to infer what type they are
- more validators on the configs
- some config tests, importantly validating all configs in the `spd/clustering/configs/` dir
  - often was previously the case that config schema or var names would change, old configs would go out of date, and this would not be immediately obvious
- tests for ensemble registry

## Key questions:

- [x] is this way of handling "incline config" vs "path to config" fine, or should I do it the more pydantic-y way?
  - only path to config, see #227 (comment)
- [x] we can in principle now get rid of `idx_in_ensemble` entirely, since the database read/write should handle uniqueness of indexes. I have kept it in for now, but happy to get rid of it
  - got rid of it, see [40df505](40df505)
- ~~when a config already exists at the expected path, should we compare strings or the loaded object? the former is probably faster since pydantic validations add overhead. also, hash collisions are vanishingly unlikely here~~ N/A, removed inline config functionality


Commits:

* allow specifying either config path or mrc cfg in pipeline cfg

* [wip] reorg configs

* added default `None` for slurm partition and job name prefix

* refactor configs, add config tests

* fix tests

* allow `None` or `-1` idx_in_ensemble

- idx_in_ensemble is None iff ensemble_id is None
- idx_in_ensemble == -1 will make register_clustering_run() auto-assign next avalible index
- added tests for ensemble registry

* whoops, wrong name on fixture

* fix idx passed in tests when not needed

* rename "mrc" -> "crc" in paths

I forgot its no longer called "MergeRunConfig"

* rename merge_run_config.py -> clustering_run_config.py

* fix pyright

* fix idx_in_ensemble being passed in tests

* rename cache dir 'merge_run_configs' -> 'clustering_run_configs'

* remove component popping
changes brought in from
PR #206
branch clustering/refactor-multi-batch
commit [9cbb52f](9cbb52f)

* dont pass batch size, change not brought in here

* fix history_path extension and storage usage

* dev pipeline

* better config validation tests

* set default base output dir

* wandb use run id for clustering, TODO for spd decomp

* basedpyright 1.32.0 causes issues, esp w/ wandb

https://github.com/goodfire-ai/spd/actions/runs/18719611602/job/53388090437

* remove idx_in_ensemble, always auto-assigned now

see #227 (comment)

* only allow passing clustering run config path, not inline

see discussion at #227 (comment)

have tried to make this change as isolated as possible -- i think this was a useful feature and we may want to add it back at some point

* rename run_clustering_config_path -> clustering_run_config_path

old name didnt make sense, since it should be a path to a file with a `ClusteringRunConfig`

see #227 (comment)

* deps

* uv lock

* fix merge?

* Add get_cluster_mapping.py and temp backward compatibility fixes

* Add more clustering configs

* spd-cluster -> spd-clustering for consistency

* Remove muutils

* Delete various clustering tests

* Misc clean

* some refactoring of slurm utils

* Refactor slurm utils

* Address AI comments on refactor

* Remove run_locally out of slurm.py

* Don't assume .env exists

* misc cleanups

* Fix test mocks

* Stop harvesting if datasaet is depleted

* Add clustering CLAUDE.md

---------

Co-authored-by: Oliver Clive-Griffin <o.clivegriffin@gmail.com>
Co-authored-by: Dan Braun <dan.braun@goodfire.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clustering anything to do with clustering exp/investigation A research investigation or experiment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants