Conversation
Squashing #43 to a clean branch for clustering work #43 (comment) # Commits * refactor to use MergeRunConfig everywhere * Wip * wip * wip * some info to yaml * todos * LFGgsgsgs * wip * refactor configs. new system with ability to choose sampling mechanism. code will be pushed later * refactor makefile to use new configs * add tests. code still not comitted yet * format * fix tests * test configs * TESTS PASSIN LFGGGG * format * pyright fixes * fix some pyright issues * parallelizing tests * distributed tests in CI * fix action * try to make tests faster * remove experiment with no canonical run * try to debug issue with normalizing ensemble works on my machine :( * [important] remove old files * move the merge pair samplers code to math folder * fix import in tests * default to cpu if no cuda in spd-cluster * default to cpu if no cuda in spd-cluster * wip wandb logging for spd-cluster refactor * more wip wandb logging for spd-cluster refactor * format? * wandb log tensor info * wandb log tensor info wip * wandb log tensor info wip * wandb log tensor info wip * some figs on wandb * format * wip * [temp] ignore config rename/deprecated warns * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * [hack] lack of canonical runs causing error * wip * wip, cleaning up s2 * wip * nathan approved MDL cost computation * swap to log2 in mdl * log MDL cost to wandb, other minor fixes * wip * wip * wip * some logging stuff fixed. now tryna figure out why way too many merge hists are loaded * [important] got run loading to work right!!! * wip * format * [important] FEATURE COMPLETE?? * format * minor changes to wandb stuff * fix tests? * format * fix pyright errors, some warnings remain * pyright passing!!! * removed old ignores in pyproject.toml, fixed the resulting errors * ? * fix * fix registry * numprocess to auto for pytest with xdist * move the TaskName definition * remove pyright ignore * move clustering stuff into clustering folders * format * fix type error * no more dep graph * remove outdated TODO.md * remove old ruff exclude * remove unused function * remove `from __future__ import annotations` * factor out component filtering, add tests for it * tensor stats randomized for big matrix * wip * minor fix related to passing filtered labels * histograms as wandb Images instead of plotly * wip * factor out some logging stuff, lots of minor changes * wandb log semilog of merge pair cost * oops, fix semilog * log hists for stuff * close figures to avoid memory leaks, add some logging to help profile * wip, added some logging * logging stuff * pyright fixes * make the merge history more lightweight - get rid of stats dicts, we store everything in wandb anyway - get rid of old sweep stuff, we keep it in config anyway - some code used only in experiment notebooks is broken * wip * format * intervals dict * fix issue with zip file closed * wip * remove merge profiling code * get rid of some old comments * more explanation around popping #43 (comment) * components_in_pop_grp -> n_components_in_pop_grp #43 (comment) * remove ignore deprecated config warnings todo see #43 (comment) this will probably cause a lot of warnings to be logged unless canonical runs updated? * some typing fixes * fix pyright issues by pinning `transformers` package see #139 * fix annoying mock test issue * fix pyright issue * remove call to deprecated plotting function * remove dead test * give sigmoid type * update default ss decomp run * wip * give sigmoid type * wip * fix interface, store ComponentModel.module_paths maybe it can be a property that gets the keys? and asserts that keys in components and gates match? * patched model -> target model when using config * fix: no more patched_model, use target_model for config * comments * fix cuda_memory_used.py * removed `spd/clustering/math/dev.py` #43 (comment) * `StatsKeys` -> `StatsKey` #43 (comment) * remove `plot_merge_history_costs` #43 (comment) * remove old js-embedding-vis dep * Oli clustering refactor (#172) # PR #172 Commit List #172 ## Initial Work by oli-clive-griffin - [7c488fc](7c488fc) - wip - [b506574](b506574) - wip - [56f6b8a](56f6b8a) - remove srt - [8682bec](8682bec) - wip - [e3645c5](e3645c5) - wip - [b246b31](b246b31) - wip - [b44abe8](b44abe8) - it runs! ## Merge and Formatting - [f83ed30](f83ed30) - Merge branch 'feature/clustering' into feature/clustering-refactor-v1 - [7700e6e](7700e6e) - format ## Continued Refactoring by oli-clive-griffin - [3ea89ae](3ea89ae) - wip - [bc37ae3](bc37ae3) - wip - [e30d680](e30d680) - wip - [d16cb96](d16cb96) - wip - [231af5a](231af5a) - wip - [988a9c8](988a9c8) - wip - [7665879](7665879) - wip - [b01130e](b01130e) - wip - [067573a](067573a) - wip ## Integration and Fixes by mivanit - [1311a54](1311a54) - Merge branch 'feature/oli-cluster' into feature/clustering-refactor-v1 - [28b6e01](28b6e01) - format fixes - [b0304d9](b0304d9) - remove old s2 script step - [ac10010](ac10010) - wip (oli-clive-griffin) - [fce4007](fce4007) - pyright passing - [238fa31](238fa31) - fix pih tests - [94f3f92](94f3f92) - better path handling - [2b4feee](2b4feee) - reorg of pipeline - [901cebd](901cebd) - fixing tests - [60ed9d9](60ed9d9) - wip (oli-clive-griffin) - [ce01a15](ce01a15) - wip ## Storage Implementation - [a04abba](a04abba) - Merge branch 'demo/cluster-storage' into feature/clustering-refactor-v1 - [b829f3e](b829f3e) - only need storage stuff - [cc202ce](cc202ce) - wip - [f2ecda1](f2ecda1) - wip storage - [eb7a30d](eb7a30d) - move storage.py to pipeline - [b4ae5de](b4ae5de) - docstring with tree - [8970478](8970478) - make some stuff private in storage - [55ba126](55ba126) - use ClusteringStorage everywhere - [c85e357](c85e357) - Remove unused old path logic - [7e66a3a](7e66a3a) - add back model_dump_with_properties - [62dfcdf](62dfcdf) - fix tests - [a0a3b2e](a0a3b2e) - Revert config change - [1f95caf](1f95caf) - add storage tests - [d860f01](d860f01) - simplify imports ## Final Cleanup and Testing - [0b94ef1](0b94ef1) - format - [e850e41](e850e41) - format and type check fixes - [7e83333](7e83333) - re-add notebooks as tests - [1a1e521](1a1e521) - fix configs - [338b761](338b761) - allow toml config files - [e809810](e809810) - wip - [6633e69](6633e69) - wip - [987e1f0](987e1f0) - better logging - [0c9e52d](0c9e52d) - wip - [18b424d](18b424d) - no default for n samples - [9f7321d](9f7321d) - add a todo ## Final Refinements - [dc86bae](dc86bae) - add back BatchedGroupMerge util methods - [079b3f2](079b3f2) - add link in comment - [9f75588](9f75588) - inline alive_mask as ~dead_components - [73a3378](73a3378) - wip - [46d4498](46d4498) - from_file() -> read() on RunConfig - [227f6ba](227f6ba) - rename: RunConfig -> ClusteringRunConfig - [b6869c3](b6869c3) - add back comments - [22acf57](22acf57) - more comments in merge.py - [e83bc6a](e83bc6a) - comment edit - [67af4a6](67af4a6) - wip - [f9c702e](f9c702e) - fix clustering failing by reverting to shelling out inst. of pool - [7f1516a](7f1516a) - Revert "fix clustering failing by reverting to shelling out inst. of … - [0db0379](0db0379) - revert to shelling out, fix logging - [e935a78](e935a78) - delete old code - [69f0bc8](69f0bc8) - [!!!] tests passing # commit list * wip * wip * remove srt * wip * wip * wip * it runs! * format * wip * wip * wip * wip * wip * wip * wip * wip * wip * format fixes * remove old s2 script step * wip * pyright passing * fix pih tests * better path handling * reorg of pipeline * fixing tests * wip * wip * only need storage stuff * wip * wip storage * move storage.py to pipeline * docstring with tree * make some stuff private in storage * use ClusteringStorage everywhere * Remove unused old path logic * add back model_dump_with_properties * fix tests * Revert config change * add storage tests * simplify imports * format * format and type check fixes * re-add notebooks as tests * fix configs * allow toml config files * wip * wip * better logging * wip * no default for n samples * add a todo * add back BatchedGroupMerge util methods * add link in comment * inline `alive_mask` as `~dead_components` pretty sure claude changed this for no reason at some point * wip * from_file() -> read() on RunConfig * rename: RunConfig -> ClusteringRunConfig * add back comments * more comments in merge.py * comment edit * wip * fix clustering failing by reverting to shelling out inst. of pool - originally we would shell out for the different clustering processes, and use a separate fd passing json - then we switched to multiprocessing.pool - this would cause model loading to fail -- not sure why - this reverts it to the old style there are still some old files laying around, theyll be removed * Revert "fix clustering failing by reverting to shelling out inst. of pool" This reverts commit f9c702e. reverting temporarily * revert to shelling out, fix logging * delete old code * [!!!] tests passing --------- Co-authored-by: Oliver Clive-Griffin <o.clivegriffin@gmail.com> * fixes/improvements to dist_utils * type fixes? * wip * Sync everything from feature/clustering-dashboard except spd/clustering/dashboard/ * minimizing diff * minimize pyprojec.toml diff * minimizing diff, removed deps * uv sync * fixing state dict mappings * test parallelization * parallelize tests * device getting utils * add TaskName type * uv sync (pytest-xdist dep) * remove old junk from Makefile * globally unique ports in tests to allow parallelization * comments explaining port allocation in tests * add distributed marker, rull all distributed tests on same worker * Revert "add distributed marker, rull all distributed tests on same worker" This reverts commit 3f55ffa. * add distributed marker, rull all distributed tests on same worker * refactor: use general_utils methods for getting device everywhere * wip jaccard * wip jaccard * wip jaccard * wip jaccard (plotting) * found where to increase timeout * wip jaccard * make format * fixes * typing fixes * claude doing a bunch of type hinting * trying to get pyright passing? this doesnt cause any issues locally :/ * pyright works both locally and in CI * allow installing cpu-only torch in CI * figure out CI disk usage by tests on main * alternate strategy for install basically: - torch and torchvision in a `pytorch` dep group - `pytorch` dep group, along with `dev`, is installed by default (uv sync behavior unchanged) - `make install-ci` recipe which first manually installs torch stuff, then uv sync but NO `pytorch` dep group * fixes to the last commit * cleanup temp changes * make in CI * wip * wip * wip * uv sync * try to fix markupsafe? * pin markup safe with explanation * update lockfile?? * nope i think we need the index strategy * ? * markupsafe issue * remove disk usage printing * fix pyright issue * dependency hell * fix deps??? * oops, missing index strategy. moved to makefile * re-lock * make from /usr/bin/ ? * dependency hell * type checking hell * Update spd/utils/general_utils.py Co-authored-by: Dan Braun <dan.braun@goodfire.ai> * wrap and fix Conv1D imports * minimize diff cleanup * try compile-bytecode for ci install * dont compile bytecode actually compare: - no compile: https://github.com/goodfire-ai/spd/actions/runs/18316145235/job/52156789078 - yes compile: https://github.com/goodfire-ai/spd/actions/runs/18316319784/job/52157420005 approximately no speedup in tests or basedpyright run, but extra 10s added to install step (where the compilation happens) * remove markupsafe constraint? * switched to use get_obj_device * remove device: torch.device type hints see #186 (comment) * remove "distributed" test marker * fix another timeout * replace get_module_device -> get_obj_device * better comments on port uniqueness * remove old markers port uniqueness should resolve the issue, without causing slowdowns * remove timeout TODO comments * removed checks.yaml timeout todo, clustering tests pass in ~12min * [diff-min] transformers version issue from #139 resolved * fix comment * wip jaccard * pyright fixes to jaccard, wip * Update docs about grad syncing with DDP * Mention feature/memorization-experiments in README * Fix train and eval metrics and hidden_act_recon (#189) * Update canonical runs and change target model path (#197) * Avoid using too many processes in tests * fix wandb model paths to older runs `goodfire/spd/runs/{id}` -> `goodfire/spd-pre-Sep-2025/runs/{id}` see 5ba1d24 --------- Co-authored-by: Oliver Clive-Griffin <o.clivegriffin@gmail.com> Co-authored-by: Dan Braun <dan.braun@goodfire.ai>
This reverts commit 26c2957.
## Description clustering tests were slow (see [`actions/runs/18406085786/job/52446229079`](https://github.com/goodfire-ai/spd/actions/runs/18406085786/job/52446229079), taking nearly 10 minutes [Fig 1]. In particular, `tests/clustering/test_clustering_experiments.py::test_cluster_ss_notebook` was taking ~500 seconds, nearly 10x the next slowest test. This test just runs `tests/clustering/scripts/cluster_ss.py` as a script. Turns out >90% of that time was spent on the dataset [Fig 2], so we switched it to be streaming. this required a minor refactor. Note that after we fixed the above, `test_cluster_ss_notebook` was fast, but streaming was not enabled for `test_clustering_with_simplestories_config` since that tests it thru cli -- hence the downloading was just happening for the latter. so, we adjust the config and cli to allow for enabling dataset streaming in see [17bfdc4](17bfdc4) ### Fig 1 ``` ============================= slowest 10 durations ============================= 504.48s call tests/clustering/test_clustering_experiments.py::test_cluster_ss_notebook 66.45s call tests/clustering/test_clustering_experiments.py::test_clustering_with_resid_mlp1_config 48.46s call tests/test_distributed.py::TestDistributedDeterminicity::test_distributed_determinicity 47.01s call tests/clustering/test_clustering_experiments.py::test_clustering_with_simplestories_config 17.16s call tests/clustering/test_clustering_experiments.py::test_cluster_resid_mlp_notebook 6.18s call tests/metrics/test_alive_components_distributed.py::TestDistributedAliveComponentsTracker::test_distributed_alive_components 5.36s call tests/test_gpt2.py::test_gpt_2_decomposition_happy_path 4.67s call tests/test_wandb_run_loading.py::test_loading_from_wandb[tms_5-2-id-wandb:goodfire/spd/runs/dwalcejo-_from_run_info] 4.56s call tests/test_wandb_run_loading.py::test_loading_from_wandb[resid_mlp2-wandb:goodfire/spd/runs/6vpsvdl5-_from_pretrained] 4.37s call tests/test_wandb_run_loading.py::test_loading_from_wandb[tms_5-2-wandb:goodfire/spd/runs/4stmo6p5-_from_run_info] ``` > from https://github.com/goodfire-ai/spd/actions/runs/18406085786/job/52446229079 ### Fig 2 ``` Timer records (s): Load model 15.36 Load data 511.21 Load data batch 0.05 Get component activations 5.11 Process activations 0.08 Plot activations 8.05 Compute merge iterations 0.51 compute distances 0.18 plot distances 0.01 ``` > from https://github.com/goodfire-ai/spd/actions/runs/18407424091/job/52450587755 ## Overall Impact - total CI runtime reduced from 12 minutes to 4 minutes. time for specifically tests reduced from 11 min to 3 min. - before: https://github.com/goodfire-ai/spd/actions/runs/18406085786/job/52446229079 - after: https://github.com/goodfire-ai/spd/actions/runs/18410271787/job/52460485800 - a few minor backwards compatible interface additions - allow specifying `dataset_streaming` for `"lm"` tasks - format specifier for saving figures (I thought svg might be a bit faster. left it in because seems useful generally) ## Related Issue See #43 (comment) ## Does this PR introduce a breaking change? No, but did modify a few interfaces. # Commits: * timing cluster_ss.py nearly identical to 26c2957 but accidentally committed that to clustering/main * better timing * fix timing * allow saving in formats besides pdf * streaming dataset for notebook to avoid download? * oops * remove timers * minimize diff * remove custom CI timing step * dataset streaming in config+cli for spd-cluster test_cluster_ss_notebook was fast, but streaming was not enabled for test_clustering_with_simplestories_config since that tests it thru cli -- hence the downloading was just happening for the latter. so, we adjust the config and cli to allow for enabling dataset streaming * wip * cuda issues??? * [temp] telemetry for action using https://github.com/catchpoint/workflow-telemetry-action * minor fixes from claude see #199 (comment) * [temp] no docker container for workflow telemetry to work * sudo in workflow for apt-get * use --dist worksteal in CI pytest * revert temp CI changes * minor fixes. accidentally removed container in CI lol * link to issue in script we do the weird thing because of CUDA issues. we should remove it at some point, and make sure that the CUDA worker catches it #201 (comment)
adapt `clustering/main` and the associated #198 to use the new `BaseConfig` introduced in #200 * switch BaseModel to BaseConfig, get rid of old save/read logic * fix typo * fix pydantic validation issue * use model_copy to avoid editing frozen dict when updating ClusteringRunConfig from CLI * remove deprecated config fields
well... somewhere in the range of 0.5 to 1s, but its the longest running test out of all the standard tests
* Have clustering be submitted via slurm jobs * Functional run_clustering.py script * Functional submitter (though workspace broken) * Refactors * Move pipeline_config.yaml to configs/ * Add non-used config entries to spd/clustering/configs/example.yaml * --n_runs --> --n-runs * Update run paths * Add calc_distances script * Fix run_pipeline wandb workspace * Have MergeConfig inherit from BaseConfig * Remove unused tensor_stats.py * Improve script docs and output file naming * Use existing random id generator which uses secrets * run_clustering -> main * Get tests passing (and remove some old tests) * Remove REFACTOR.md * Avoid loading run info twice * Add comment to test_run_clustering_happy_path * remove matplotlib related type ignore comments causing warns likely related to [f9a2a97](f9a2a97) / #210 * re-add --dist-worksteal in CI see https://github.com/goodfire-ai/spd/pull/203/files#r2425944229 * pdf_prefix -> figure_prefix (minimizing diff) * Add calc_distances to pipeline * Explicitly state distance plot file in pipeline * move ClusteringRunConfig to separate file run_clustering.py is very long, plus this helps us minimize the diff * [temp] rename {merge,clustering}_run_config for easier diff * re-add dataset_streaming option for fast CI tests * remove spd/clustering/utils it was empty * add `make clean` recipe * [wip] add back storage.py * wip * wip * make format * `Command` object continuing refactor, `make test` passing * try to fix some tests * working on fixing tests, some config issues, wip * ExecutionStamp reworking, it will manage output dirs * CI install fix -- python version used was 3.12, now we are on 3.13 * sigmoid typing fixes !!! unsure abt wht to use from CIOutputs * catch exception when checking git repo is clean * remove merge pair sampler GPU test * allow errors when calling repo_is_clean * wip debugging * fix?? * try/except workspace new view creation * fix create_clustering_workspace_view * just dont pass a wandb project to tests * config woes * wip * noneable slurm config * fix cli * wip * move `Command` to own class * storage refactor * wip * ensemble id keys must be unique if registered in db, pass None * snapshot creation, cli stuff * logging cleanup * Remove Command and use shlex * Remove Storage properties --------- Co-authored-by: Michael Ivanitskiy <mivanits@umich.edu>
add `matching_dist` and `matching_dist_vec` distance computations * wip * better debugging plots * wip * wip * wip * format * pyright fixes * wip * wip * [!!!] remove old testing code * integrate distances with the rest of the pipeline * wip * wip * wip * wip * dev recipe * fix pyright warning * wip * wip * wip * wip * make format * cli control for calc_distances `time` command doesnt work in CI or on andromeda * rework configs, make tests simpler and faster * fix config typo distances_method -> distances_methods
## Core changes - `ClusteringRunConfig.idx_in_ensemble` is removed entirely. index is now auto assigned - ~~pipeline config can contain merge run config either inline or as path~~ - ~~quite annoying when doing experiments to have to go and edit two files~~ - no longer a feature, but path will be validated on config load. see discussion in #227 (comment) - remove component popping - change brought over from #206 via commit [a1f1146](a1f1146) - clustering runs will now actually use the run id for the wandb run id - spd decomp runs don't yet ## Housekeeping - separate folders for pipeline configs and merge run configs - clearer to the user, and enables testing configs without trying to infer what type they are - more validators on the configs - some config tests, importantly validating all configs in the `spd/clustering/configs/` dir - often was previously the case that config schema or var names would change, old configs would go out of date, and this would not be immediately obvious - tests for ensemble registry ## Key questions: - [x] is this way of handling "incline config" vs "path to config" fine, or should I do it the more pydantic-y way? - only path to config, see #227 (comment) - [x] we can in principle now get rid of `idx_in_ensemble` entirely, since the database read/write should handle uniqueness of indexes. I have kept it in for now, but happy to get rid of it - got rid of it, see [40df505](40df505) - ~~when a config already exists at the expected path, should we compare strings or the loaded object? the former is probably faster since pydantic validations add overhead. also, hash collisions are vanishingly unlikely here~~ N/A, removed inline config functionality Commits: * allow specifying either config path or mrc cfg in pipeline cfg * [wip] reorg configs * added default `None` for slurm partition and job name prefix * refactor configs, add config tests * fix tests * allow `None` or `-1` idx_in_ensemble - idx_in_ensemble is None iff ensemble_id is None - idx_in_ensemble == -1 will make register_clustering_run() auto-assign next avalible index - added tests for ensemble registry * whoops, wrong name on fixture * fix idx passed in tests when not needed * rename "mrc" -> "crc" in paths I forgot its no longer called "MergeRunConfig" * rename merge_run_config.py -> clustering_run_config.py * fix pyright * fix idx_in_ensemble being passed in tests * rename cache dir 'merge_run_configs' -> 'clustering_run_configs' * remove component popping changes brought in from PR #206 branch clustering/refactor-multi-batch commit [9cbb52f](9cbb52f) * dont pass batch size, change not brought in here * fix history_path extension and storage usage * dev pipeline * better config validation tests * set default base output dir * wandb use run id for clustering, TODO for spd decomp * basedpyright 1.32.0 causes issues, esp w/ wandb https://github.com/goodfire-ai/spd/actions/runs/18719611602/job/53388090437 * remove idx_in_ensemble, always auto-assigned now see #227 (comment) * only allow passing clustering run config path, not inline see discussion at #227 (comment) have tried to make this change as isolated as possible -- i think this was a useful feature and we may want to add it back at some point * rename run_clustering_config_path -> clustering_run_config_path old name didnt make sense, since it should be a path to a file with a `ClusteringRunConfig` see #227 (comment)
|
Old description, for posterity: NOTE: Commit before I (Dan) started putting this in a state for merging to main. (migrated from #43) DescriptionThis PR implements clustering of subcomponents based on minimum description length (MDL) clustering. The primary interface to this package is via: This runs a pipeline which:
Related IssuesDepends on:TODOs:Code style
Variable naming
Implementation
Testing
MethodologyAdditions
Checks/Removals
|
NOTE: Commit before I (Dan) started putting this in a state for merging to main.
Description
Related Issues
#13
Does this PR cause a breaking change?
No