Support clustering by mivanit · Pull Request #198 · goodfire-ai/spd

mivanit · 2025-10-10T12:11:18Z

NOTE: Commit before I (Dan) started putting this in a state for merging to main.

Description

Implements clustering of subcomponents based on minimum description length (MDL) clustering.
Refactor slurm scripts for consistency across spd runs, harvest, autointerp, and clustering

Related Issues

Does this PR cause a breaking change?

No

Squashing #43 to a clean branch for clustering work #43 (comment) # Commits * refactor to use MergeRunConfig everywhere * Wip * wip * wip * some info to yaml * todos * LFGgsgsgs * wip * refactor configs. new system with ability to choose sampling mechanism. code will be pushed later * refactor makefile to use new configs * add tests. code still not comitted yet * format * fix tests * test configs * TESTS PASSIN LFGGGG * format * pyright fixes * fix some pyright issues * parallelizing tests * distributed tests in CI * fix action * try to make tests faster * remove experiment with no canonical run * try to debug issue with normalizing ensemble works on my machine :( * [important] remove old files * move the merge pair samplers code to math folder * fix import in tests * default to cpu if no cuda in spd-cluster * default to cpu if no cuda in spd-cluster * wip wandb logging for spd-cluster refactor * more wip wandb logging for spd-cluster refactor * format? * wandb log tensor info * wandb log tensor info wip * wandb log tensor info wip * wandb log tensor info wip * some figs on wandb * format * wip * [temp] ignore config rename/deprecated warns * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * [hack] lack of canonical runs causing error * wip * wip, cleaning up s2 * wip * nathan approved MDL cost computation * swap to log2 in mdl * log MDL cost to wandb, other minor fixes * wip * wip * wip * some logging stuff fixed. now tryna figure out why way too many merge hists are loaded * [important] got run loading to work right!!! * wip * format * [important] FEATURE COMPLETE?? * format * minor changes to wandb stuff * fix tests? * format * fix pyright errors, some warnings remain * pyright passing!!! * removed old ignores in pyproject.toml, fixed the resulting errors * ? * fix * fix registry * numprocess to auto for pytest with xdist * move the TaskName definition * remove pyright ignore * move clustering stuff into clustering folders * format * fix type error * no more dep graph * remove outdated TODO.md * remove old ruff exclude * remove unused function * remove `from __future__ import annotations` * factor out component filtering, add tests for it * tensor stats randomized for big matrix * wip * minor fix related to passing filtered labels * histograms as wandb Images instead of plotly * wip * factor out some logging stuff, lots of minor changes * wandb log semilog of merge pair cost * oops, fix semilog * log hists for stuff * close figures to avoid memory leaks, add some logging to help profile * wip, added some logging * logging stuff * pyright fixes * make the merge history more lightweight - get rid of stats dicts, we store everything in wandb anyway - get rid of old sweep stuff, we keep it in config anyway - some code used only in experiment notebooks is broken * wip * format * intervals dict * fix issue with zip file closed * wip * remove merge profiling code * get rid of some old comments * more explanation around popping #43 (comment) * components_in_pop_grp -> n_components_in_pop_grp #43 (comment) * remove ignore deprecated config warnings todo see #43 (comment) this will probably cause a lot of warnings to be logged unless canonical runs updated? * some typing fixes * fix pyright issues by pinning `transformers` package see #139 * fix annoying mock test issue * fix pyright issue * remove call to deprecated plotting function * remove dead test * give sigmoid type * update default ss decomp run * wip * give sigmoid type * wip * fix interface, store ComponentModel.module_paths maybe it can be a property that gets the keys? and asserts that keys in components and gates match? * patched model -> target model when using config * fix: no more patched_model, use target_model for config * comments * fix cuda_memory_used.py * removed `spd/clustering/math/dev.py` #43 (comment) * `StatsKeys` -> `StatsKey` #43 (comment) * remove `plot_merge_history_costs` #43 (comment) * remove old js-embedding-vis dep * Oli clustering refactor (#172) # PR #172 Commit List #172 ## Initial Work by oli-clive-griffin - [7c488fc](7c488fc) - wip - [b506574](b506574) - wip - [56f6b8a](56f6b8a) - remove srt - [8682bec](8682bec) - wip - [e3645c5](e3645c5) - wip - [b246b31](b246b31) - wip - [b44abe8](b44abe8) - it runs! ## Merge and Formatting - [f83ed30](f83ed30) - Merge branch 'feature/clustering' into feature/clustering-refactor-v1 - [7700e6e](7700e6e) - format ## Continued Refactoring by oli-clive-griffin - [3ea89ae](3ea89ae) - wip - [bc37ae3](bc37ae3) - wip - [e30d680](e30d680) - wip - [d16cb96](d16cb96) - wip - [231af5a](231af5a) - wip - [988a9c8](988a9c8) - wip - [7665879](7665879) - wip - [b01130e](b01130e) - wip - [067573a](067573a) - wip ## Integration and Fixes by mivanit - [1311a54](1311a54) - Merge branch 'feature/oli-cluster' into feature/clustering-refactor-v1 - [28b6e01](28b6e01) - format fixes - [b0304d9](b0304d9) - remove old s2 script step - [ac10010](ac10010) - wip (oli-clive-griffin) - [fce4007](fce4007) - pyright passing - [238fa31](238fa31) - fix pih tests - [94f3f92](94f3f92) - better path handling - [2b4feee](2b4feee) - reorg of pipeline - [901cebd](901cebd) - fixing tests - [60ed9d9](60ed9d9) - wip (oli-clive-griffin) - [ce01a15](ce01a15) - wip ## Storage Implementation - [a04abba](a04abba) - Merge branch 'demo/cluster-storage' into feature/clustering-refactor-v1 - [b829f3e](b829f3e) - only need storage stuff - [cc202ce](cc202ce) - wip - [f2ecda1](f2ecda1) - wip storage - [eb7a30d](eb7a30d) - move storage.py to pipeline - [b4ae5de](b4ae5de) - docstring with tree - [8970478](8970478) - make some stuff private in storage - [55ba126](55ba126) - use ClusteringStorage everywhere - [c85e357](c85e357) - Remove unused old path logic - [7e66a3a](7e66a3a) - add back model_dump_with_properties - [62dfcdf](62dfcdf) - fix tests - [a0a3b2e](a0a3b2e) - Revert config change - [1f95caf](1f95caf) - add storage tests - [d860f01](d860f01) - simplify imports ## Final Cleanup and Testing - [0b94ef1](0b94ef1) - format - [e850e41](e850e41) - format and type check fixes - [7e83333](7e83333) - re-add notebooks as tests - [1a1e521](1a1e521) - fix configs - [338b761](338b761) - allow toml config files - [e809810](e809810) - wip - [6633e69](6633e69) - wip - [987e1f0](987e1f0) - better logging - [0c9e52d](0c9e52d) - wip - [18b424d](18b424d) - no default for n samples - [9f7321d](9f7321d) - add a todo ## Final Refinements - [dc86bae](dc86bae) - add back BatchedGroupMerge util methods - [079b3f2](079b3f2) - add link in comment - [9f75588](9f75588) - inline alive_mask as ~dead_components - [73a3378](73a3378) - wip - [46d4498](46d4498) - from_file() -> read() on RunConfig - [227f6ba](227f6ba) - rename: RunConfig -> ClusteringRunConfig - [b6869c3](b6869c3) - add back comments - [22acf57](22acf57) - more comments in merge.py - [e83bc6a](e83bc6a) - comment edit - [67af4a6](67af4a6) - wip - [f9c702e](f9c702e) - fix clustering failing by reverting to shelling out inst. of pool - [7f1516a](7f1516a) - Revert "fix clustering failing by reverting to shelling out inst. of … - [0db0379](0db0379) - revert to shelling out, fix logging - [e935a78](e935a78) - delete old code - [69f0bc8](69f0bc8) - [!!!] tests passing # commit list * wip * wip * remove srt * wip * wip * wip * it runs! * format * wip * wip * wip * wip * wip * wip * wip * wip * wip * format fixes * remove old s2 script step * wip * pyright passing * fix pih tests * better path handling * reorg of pipeline * fixing tests * wip * wip * only need storage stuff * wip * wip storage * move storage.py to pipeline * docstring with tree * make some stuff private in storage * use ClusteringStorage everywhere * Remove unused old path logic * add back model_dump_with_properties * fix tests * Revert config change * add storage tests * simplify imports * format * format and type check fixes * re-add notebooks as tests * fix configs * allow toml config files * wip * wip * better logging * wip * no default for n samples * add a todo * add back BatchedGroupMerge util methods * add link in comment * inline `alive_mask` as `~dead_components` pretty sure claude changed this for no reason at some point * wip * from_file() -> read() on RunConfig * rename: RunConfig -> ClusteringRunConfig * add back comments * more comments in merge.py * comment edit * wip * fix clustering failing by reverting to shelling out inst. of pool - originally we would shell out for the different clustering processes, and use a separate fd passing json - then we switched to multiprocessing.pool - this would cause model loading to fail -- not sure why - this reverts it to the old style there are still some old files laying around, theyll be removed * Revert "fix clustering failing by reverting to shelling out inst. of pool" This reverts commit f9c702e. reverting temporarily * revert to shelling out, fix logging * delete old code * [!!!] tests passing --------- Co-authored-by: Oliver Clive-Griffin <o.clivegriffin@gmail.com> * fixes/improvements to dist_utils * type fixes? * wip * Sync everything from feature/clustering-dashboard except spd/clustering/dashboard/ * minimizing diff * minimize pyprojec.toml diff * minimizing diff, removed deps * uv sync * fixing state dict mappings * test parallelization * parallelize tests * device getting utils * add TaskName type * uv sync (pytest-xdist dep) * remove old junk from Makefile * globally unique ports in tests to allow parallelization * comments explaining port allocation in tests * add distributed marker, rull all distributed tests on same worker * Revert "add distributed marker, rull all distributed tests on same worker" This reverts commit 3f55ffa. * add distributed marker, rull all distributed tests on same worker * refactor: use general_utils methods for getting device everywhere * wip jaccard * wip jaccard * wip jaccard * wip jaccard (plotting) * found where to increase timeout * wip jaccard * make format * fixes * typing fixes * claude doing a bunch of type hinting * trying to get pyright passing? this doesnt cause any issues locally :/ * pyright works both locally and in CI * allow installing cpu-only torch in CI * figure out CI disk usage by tests on main * alternate strategy for install basically: - torch and torchvision in a `pytorch` dep group - `pytorch` dep group, along with `dev`, is installed by default (uv sync behavior unchanged) - `make install-ci` recipe which first manually installs torch stuff, then uv sync but NO `pytorch` dep group * fixes to the last commit * cleanup temp changes * make in CI * wip * wip * wip * uv sync * try to fix markupsafe? * pin markup safe with explanation * update lockfile?? * nope i think we need the index strategy * ? * markupsafe issue * remove disk usage printing * fix pyright issue * dependency hell * fix deps??? * oops, missing index strategy. moved to makefile * re-lock * make from /usr/bin/ ? * dependency hell * type checking hell * Update spd/utils/general_utils.py Co-authored-by: Dan Braun <dan.braun@goodfire.ai> * wrap and fix Conv1D imports * minimize diff cleanup * try compile-bytecode for ci install * dont compile bytecode actually compare: - no compile: https://github.com/goodfire-ai/spd/actions/runs/18316145235/job/52156789078 - yes compile: https://github.com/goodfire-ai/spd/actions/runs/18316319784/job/52157420005 approximately no speedup in tests or basedpyright run, but extra 10s added to install step (where the compilation happens) * remove markupsafe constraint? * switched to use get_obj_device * remove device: torch.device type hints see #186 (comment) * remove "distributed" test marker * fix another timeout * replace get_module_device -> get_obj_device * better comments on port uniqueness * remove old markers port uniqueness should resolve the issue, without causing slowdowns * remove timeout TODO comments * removed checks.yaml timeout todo, clustering tests pass in ~12min * [diff-min] transformers version issue from #139 resolved * fix comment * wip jaccard * pyright fixes to jaccard, wip * Update docs about grad syncing with DDP * Mention feature/memorization-experiments in README * Fix train and eval metrics and hidden_act_recon (#189) * Update canonical runs and change target model path (#197) * Avoid using too many processes in tests * fix wandb model paths to older runs `goodfire/spd/runs/{id}` -> `goodfire/spd-pre-Sep-2025/runs/{id}` see 5ba1d24 --------- Co-authored-by: Oliver Clive-Griffin <o.clivegriffin@gmail.com> Co-authored-by: Dan Braun <dan.braun@goodfire.ai>

This reverts commit 26c2957.

## Description clustering tests were slow (see [`actions/runs/18406085786/job/52446229079`](https://github.com/goodfire-ai/spd/actions/runs/18406085786/job/52446229079), taking nearly 10 minutes [Fig 1]. In particular, `tests/clustering/test_clustering_experiments.py::test_cluster_ss_notebook` was taking ~500 seconds, nearly 10x the next slowest test. This test just runs `tests/clustering/scripts/cluster_ss.py` as a script. Turns out >90% of that time was spent on the dataset [Fig 2], so we switched it to be streaming. this required a minor refactor. Note that after we fixed the above, `test_cluster_ss_notebook` was fast, but streaming was not enabled for `test_clustering_with_simplestories_config` since that tests it thru cli -- hence the downloading was just happening for the latter. so, we adjust the config and cli to allow for enabling dataset streaming in see [17bfdc4](17bfdc4) ### Fig 1 ``` ============================= slowest 10 durations ============================= 504.48s call tests/clustering/test_clustering_experiments.py::test_cluster_ss_notebook 66.45s call tests/clustering/test_clustering_experiments.py::test_clustering_with_resid_mlp1_config 48.46s call tests/test_distributed.py::TestDistributedDeterminicity::test_distributed_determinicity 47.01s call tests/clustering/test_clustering_experiments.py::test_clustering_with_simplestories_config 17.16s call tests/clustering/test_clustering_experiments.py::test_cluster_resid_mlp_notebook 6.18s call tests/metrics/test_alive_components_distributed.py::TestDistributedAliveComponentsTracker::test_distributed_alive_components 5.36s call tests/test_gpt2.py::test_gpt_2_decomposition_happy_path 4.67s call tests/test_wandb_run_loading.py::test_loading_from_wandb[tms_5-2-id-wandb:goodfire/spd/runs/dwalcejo-_from_run_info] 4.56s call tests/test_wandb_run_loading.py::test_loading_from_wandb[resid_mlp2-wandb:goodfire/spd/runs/6vpsvdl5-_from_pretrained] 4.37s call tests/test_wandb_run_loading.py::test_loading_from_wandb[tms_5-2-wandb:goodfire/spd/runs/4stmo6p5-_from_run_info] ``` > from https://github.com/goodfire-ai/spd/actions/runs/18406085786/job/52446229079 ### Fig 2 ``` Timer records (s): Load model 15.36 Load data 511.21 Load data batch 0.05 Get component activations 5.11 Process activations 0.08 Plot activations 8.05 Compute merge iterations 0.51 compute distances 0.18 plot distances 0.01 ``` > from https://github.com/goodfire-ai/spd/actions/runs/18407424091/job/52450587755 ## Overall Impact - total CI runtime reduced from 12 minutes to 4 minutes. time for specifically tests reduced from 11 min to 3 min. - before: https://github.com/goodfire-ai/spd/actions/runs/18406085786/job/52446229079 - after: https://github.com/goodfire-ai/spd/actions/runs/18410271787/job/52460485800 - a few minor backwards compatible interface additions - allow specifying `dataset_streaming` for `"lm"` tasks - format specifier for saving figures (I thought svg might be a bit faster. left it in because seems useful generally) ## Related Issue See #43 (comment) ## Does this PR introduce a breaking change? No, but did modify a few interfaces. # Commits: * timing cluster_ss.py nearly identical to 26c2957 but accidentally committed that to clustering/main * better timing * fix timing * allow saving in formats besides pdf * streaming dataset for notebook to avoid download? * oops * remove timers * minimize diff * remove custom CI timing step * dataset streaming in config+cli for spd-cluster test_cluster_ss_notebook was fast, but streaming was not enabled for test_clustering_with_simplestories_config since that tests it thru cli -- hence the downloading was just happening for the latter. so, we adjust the config and cli to allow for enabling dataset streaming * wip * cuda issues??? * [temp] telemetry for action using https://github.com/catchpoint/workflow-telemetry-action * minor fixes from claude see #199 (comment) * [temp] no docker container for workflow telemetry to work * sudo in workflow for apt-get * use --dist worksteal in CI pytest * revert temp CI changes * minor fixes. accidentally removed container in CI lol * link to issue in script we do the weird thing because of CUDA issues. we should remove it at some point, and make sure that the CUDA worker catches it #201 (comment)

see #170

adapt `clustering/main` and the associated #198 to use the new `BaseConfig` introduced in #200 * switch BaseModel to BaseConfig, get rid of old save/read logic * fix typo * fix pydantic validation issue * use model_copy to avoid editing frozen dict when updating ClusteringRunConfig from CLI * remove deprecated config fields

well... somewhere in the range of 0.5 to 1s, but its the longest running test out of all the standard tests

* Have clustering be submitted via slurm jobs * Functional run_clustering.py script * Functional submitter (though workspace broken) * Refactors * Move pipeline_config.yaml to configs/ * Add non-used config entries to spd/clustering/configs/example.yaml * --n_runs --> --n-runs * Update run paths * Add calc_distances script * Fix run_pipeline wandb workspace * Have MergeConfig inherit from BaseConfig * Remove unused tensor_stats.py * Improve script docs and output file naming * Use existing random id generator which uses secrets * run_clustering -> main * Get tests passing (and remove some old tests) * Remove REFACTOR.md * Avoid loading run info twice * Add comment to test_run_clustering_happy_path * remove matplotlib related type ignore comments causing warns likely related to [f9a2a97](f9a2a97) / #210 * re-add --dist-worksteal in CI see https://github.com/goodfire-ai/spd/pull/203/files#r2425944229 * pdf_prefix -> figure_prefix (minimizing diff) * Add calc_distances to pipeline * Explicitly state distance plot file in pipeline * move ClusteringRunConfig to separate file run_clustering.py is very long, plus this helps us minimize the diff * [temp] rename {merge,clustering}_run_config for easier diff * re-add dataset_streaming option for fast CI tests * remove spd/clustering/utils it was empty * add `make clean` recipe * [wip] add back storage.py * wip * wip * make format * `Command` object continuing refactor, `make test` passing * try to fix some tests * working on fixing tests, some config issues, wip * ExecutionStamp reworking, it will manage output dirs * CI install fix -- python version used was 3.12, now we are on 3.13 * sigmoid typing fixes !!! unsure abt wht to use from CIOutputs * catch exception when checking git repo is clean * remove merge pair sampler GPU test * allow errors when calling repo_is_clean * wip debugging * fix?? * try/except workspace new view creation * fix create_clustering_workspace_view * just dont pass a wandb project to tests * config woes * wip * noneable slurm config * fix cli * wip * move `Command` to own class * storage refactor * wip * ensemble id keys must be unique if registered in db, pass None * snapshot creation, cli stuff * logging cleanup * Remove Command and use shlex * Remove Storage properties --------- Co-authored-by: Michael Ivanitskiy <mivanits@umich.edu>

add `matching_dist` and `matching_dist_vec` distance computations * wip * better debugging plots * wip * wip * wip * format * pyright fixes * wip * wip * [!!!] remove old testing code * integrate distances with the rest of the pipeline * wip * wip * wip * wip * dev recipe * fix pyright warning * wip * wip * wip * wip * make format * cli control for calc_distances `time` command doesnt work in CI or on andromeda * rework configs, make tests simpler and faster * fix config typo distances_method -> distances_methods

## Core changes - `ClusteringRunConfig.idx_in_ensemble` is removed entirely. index is now auto assigned - ~~pipeline config can contain merge run config either inline or as path~~ - ~~quite annoying when doing experiments to have to go and edit two files~~ - no longer a feature, but path will be validated on config load. see discussion in #227 (comment) - remove component popping - change brought over from #206 via commit [a1f1146](a1f1146) - clustering runs will now actually use the run id for the wandb run id - spd decomp runs don't yet ## Housekeeping - separate folders for pipeline configs and merge run configs - clearer to the user, and enables testing configs without trying to infer what type they are - more validators on the configs - some config tests, importantly validating all configs in the `spd/clustering/configs/` dir - often was previously the case that config schema or var names would change, old configs would go out of date, and this would not be immediately obvious - tests for ensemble registry ## Key questions: - [x] is this way of handling "incline config" vs "path to config" fine, or should I do it the more pydantic-y way? - only path to config, see #227 (comment) - [x] we can in principle now get rid of `idx_in_ensemble` entirely, since the database read/write should handle uniqueness of indexes. I have kept it in for now, but happy to get rid of it - got rid of it, see [40df505](40df505) - ~~when a config already exists at the expected path, should we compare strings or the loaded object? the former is probably faster since pydantic validations add overhead. also, hash collisions are vanishingly unlikely here~~ N/A, removed inline config functionality Commits: * allow specifying either config path or mrc cfg in pipeline cfg * [wip] reorg configs * added default `None` for slurm partition and job name prefix * refactor configs, add config tests * fix tests * allow `None` or `-1` idx_in_ensemble - idx_in_ensemble is None iff ensemble_id is None - idx_in_ensemble == -1 will make register_clustering_run() auto-assign next avalible index - added tests for ensemble registry * whoops, wrong name on fixture * fix idx passed in tests when not needed * rename "mrc" -> "crc" in paths I forgot its no longer called "MergeRunConfig" * rename merge_run_config.py -> clustering_run_config.py * fix pyright * fix idx_in_ensemble being passed in tests * rename cache dir 'merge_run_configs' -> 'clustering_run_configs' * remove component popping changes brought in from PR #206 branch clustering/refactor-multi-batch commit [9cbb52f](9cbb52f) * dont pass batch size, change not brought in here * fix history_path extension and storage usage * dev pipeline * better config validation tests * set default base output dir * wandb use run id for clustering, TODO for spd decomp * basedpyright 1.32.0 causes issues, esp w/ wandb https://github.com/goodfire-ai/spd/actions/runs/18719611602/job/53388090437 * remove idx_in_ensemble, always auto-assigned now see #227 (comment) * only allow passing clustering run config path, not inline see discussion at #227 (comment) have tried to make this change as isolated as possible -- i think this was a useful feature and we may want to add it back at some point * rename run_clustering_config_path -> clustering_run_config_path old name didnt make sense, since it should be a path to a file with a `ClusteringRunConfig` see #227 (comment)

danbraunai-goodfire · 2025-12-22T15:42:56Z

Old description, for posterity:

NOTE: Commit before I (Dan) started putting this in a state for merging to main.

(migrated from #43)

Description

This PR implements clustering of subcomponents based on minimum description length (MDL) clustering.

The primary interface to this package is via:

python spd/clustering/scripts/main.py --merge-config MERGE_CONFIG [--model-path MODEL_PATH] [--n-batches N_BATCHES] [--batch-size BATCH_SIZE]

This runs a pipeline which:

tokenizes and splits data from the dataset used by the specified model
runs an ensemble of the clustering algorithm, one run for each batch
unifies the histories to account for differences in active components between batches
computes and plots distances between the different merge trajectories, giving insight into which merges may be useful for downstream investigations

Related Issues

#13

Depends on:

TODOs:

Code style

array type hinting, see clustering #43 (comment)
get rid of redundant type hints
- clearer style.md would be useful for this, Claude sometimes omits/removes even the useful type hints
what to do with the old notebook-style scripts tests/clustering/scripts/cluster_{resid_mlp|ss}.py, see clustering #43 (comment)
- I find that I am frequently annoyed by these tests failing, which probably means they are good tests. perhaps they need to be refactored in some way though
- EDIT 2025-10-21: continue to find the script tests to catch issues not caught by any other code. mostly pertaining to CLI and configs
- EDIT 2025-11-11: keeping these for now, they are useful
MergeConfig cleanup, see clustering #43 (comment)
- in particular, the sampling methods implementation

Variable naming

"components" -> "subcomponents"
"n_components", "c_components" -> n_subcomponents
somehow unify naming of "groups", "components", "clusters"
"current_merge" -> "current_state"?
consistency between compute_merge_costs and compute_mdl_cost

Implementation

deciding on shelling out with custom file descriptors vs
using concurrent.futures for running multiple instances of s2_clustering.py in parallel
- the shelling out approach may cause issues with wandb because we both have a "summary" run (which is what computes distances between the different merges) and each run of s2_clustering has its own associated wandb run
there should be a way to call the pipeline without parallel processing for s2
- this should be the default when running with an ensemble size of 1?
allow running of specific stages from cli
I’d like an option to be able to run both:
1. A specific stage of the pipeline
2. All stages from a specific point
  It should be able to load the results in previous stages from a cache. The cli could have a use_cache: bool arg and maybe another cli arg or two to handle which stages it should run.

Testing

more unit tests
test loading and validating all configs
partition tests, so clustering tests run only when clustering code is changed

Methodology

Additions

migrate to slurm for clustering runs ([clustering] Refactor to two-stage process #203)
alternate distance metric (algorithm by @Laplace418) [clustering] distance computation #213
multi-batch clustering [clustering] multi-batch merging refactor #206
adding clustering runs to an ensemble
interface would look something like:
- pass existing ensemble id instead of a config to spd-cluster
- it reads the config and commit hash (errors if no commit hash) from ensemble config
- start a bunch of new clustering runs based on that info
proper rank computation?

Checks/Removals

float vs boolean coactivations: it may be best to stick with just boolean coactivations, but then we should be sure this is reasonable
- if we keep float activations, there is a bug with activation_mask[:, merge_pair[0]] + activation_mask[:, merge_pair[1]]
- merge_config.activation_threshold should probably default to 0 and remove the None if we get rid of float acts
get rid of component popping: we may or may not add this back in the future, but we should try to thin out the features of this PR (completed in [clustering] distance computation #213)
properly test mcmc sampling

mivanit and others added 3 commits October 10, 2025 06:09

timing cluster_ss.py

26c2957

Revert "timing cluster_ss.py"

55be0d8

This reverts commit 26c2957.

mivanit added the clustering anything to do with clustering label Oct 10, 2025

mivanit changed the title ~~main clustering~~ [clustering] main Oct 10, 2025

add num_nonsingleton_groups stat from PR170

13db0be

see #170

mivanit mentioned this pull request Oct 10, 2025

[wip][clustering] visualization dashboard #170

Draft

mivanit added 2 commits October 13, 2025 10:59

Merge branch 'main' into clustering/main

8e9ddd0

Merge branch 'main' into clustering/main

e79ccb8

mivanit mentioned this pull request Oct 13, 2025

switch BaseModel to BaseConfig, get rid of old save/read logic #209

Merged

mivanit and others added 18 commits October 13, 2025 09:33

switch to new run

3f07f42

Merge branch 'main' into clustering/main

37175cb

Merge branch 'main' into clustering/main

ed667c8

Merge branch 'main' into clustering/main

0cdd754

wip sigmoid issues

c06cffe

that worked...?

fddb323

dont assert positive coacts?

8a56e12

Merge branch 'main' into clustering/main

1f18690

get rid of long-running merge pair sampler on GPU test

d18470b

well... somewhere in the range of 0.5 to 1s, but its the longest running test out of all the standard tests

Merge branch 'main' into clustering/main

35b7423

Merge branch 'main' into clustering/main

06ab8cc

Merge branch 'main' into clustering/main

a24361a

Merge branch 'main' into clustering/main

6c70327

deps

1f0725c

mivanit mentioned this pull request Oct 27, 2025

New Interp App #231

Merged

mivanit and others added 11 commits October 27, 2025 11:22

Merge branch 'main' into clustering/main

d8cc0e7

Merge branch 'main' into clustering/main

315e953

uv lock

06deb8c

Merge branch 'main' into clustering/main

ccc960a

fix merge?

a67cbf0

Add get_cluster_mapping.py and temp backward compatibility fixes

e05f412

Merge branch 'main' into clustering/main

b029b3a

Add more clustering configs

91d6c7c

spd-cluster -> spd-clustering for consistency

2e56354

Remove muutils

e0f9b5f

Delete various clustering tests

565f478

danbraunai-goodfire added 9 commits December 23, 2025 09:16

Misc clean

1230557

some refactoring of slurm utils

ae029bf

Refactor slurm utils

83daa0a

Address AI comments on refactor

41f0669

Remove run_locally out of slurm.py

179114f

Don't assume .env exists

9ddecd7

misc cleanups

4018b00

Fix test mocks

7b5cb6d

Stop harvesting if datasaet is depleted

ce216db

danbraunai-goodfire changed the title ~~[clustering] main~~ Support clustering Dec 23, 2025

Add clustering CLAUDE.md

c2aea4e

danbraunai-goodfire merged commit b9884af into main Dec 23, 2025
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support clustering#198

Support clustering#198
danbraunai-goodfire merged 46 commits intomainfrom
clustering/main

mivanit commented Oct 10, 2025 •

edited by danbraunai-goodfire

Loading

Uh oh!

danbraunai-goodfire commented Dec 22, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mivanit commented Oct 10, 2025 • edited by danbraunai-goodfire Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Related Issues

Does this PR cause a breaking change?

Uh oh!

danbraunai-goodfire commented Dec 22, 2025

Description

Related Issues

Depends on:

TODOs:

Code style

Variable naming

Implementation

Testing

Methodology

Additions

Checks/Removals

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mivanit commented Oct 10, 2025 •

edited by danbraunai-goodfire

Loading