
Update happy path tests to use default configs #290

Open
lee-goodfire wants to merge 46 commits into main from feature/default_tests

Conversation

@lee-goodfire
Contributor

lee-goodfire commented Dec 5, 2025

Description

Updates all 4 happy path decomposition tests to load default configs from the experiment registry instead of manually defining them.
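
The resulting pattern, as a minimal sketch (the helper names appear later in this PR; the import paths, experiment key, and exact override values below are assumptions):

# Minimal sketch of the new test setup. get_experiment_config_file_contents,
# apply_nested_updates, and Config are the helpers used in this PR; their
# import paths here are assumptions.
from spd.config import Config  # hypothetical path
from spd.registry import get_experiment_config_file_contents  # hypothetical path
from spd.utils import apply_nested_updates  # hypothetical path

# Load the default config for an experiment from the registry...
base_config_dict = get_experiment_config_file_contents("tms_5-2")  # key assumed

# ...then override only the test-specific parameters; dotted keys update
# nested fields (see "task_config.max_seq_len" elsewhere in this PR).
test_overrides = {
    "steps": 3,
    "batch_size": 2,
    "task_config.max_seq_len": 16,
}
config_dict = apply_nested_updates(base_config_dict, test_overrides)
config = Config(**config_dict)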

Related Issue

Closes #92

Motivation and Context

Tests were manually defining configs, which could diverge from default configs. This change ensures:

  1. Tests use the same configs as production
  2. Breaking changes to default configs will be caught by tests
  3. Single source of truth for experiment configurations
  4. Tests remain valid when default configs are updated

How Has This Been Tested?

Ran all four tests locally; they complete successfully.

Does this PR introduce a breaking change?

No breaking changes. Tests maintain the same behavior, just load configs differently. The IH config fix ensures it works with current code (deprecated fields were already non-functional).

leesharkey and others added 30 commits September 16, 2025 18:07
leesharkey and others added 11 commits October 28, 2025 14:00
The test was loading the resid_mlp2 config, which sets n_features=100
in the IdentityCIError metric, but creating a model with only 5 features.
This caused a validation error when the metric tried to verify the CI array shape.

Added test overrides to update the eval_metric_configs.IdentityCIError
to match the test model's n_features=5.
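
A hedged sketch of that override (the real schema of eval_metric_configs entries may differ):

# Hypothetical shape of the override; the field names for metric entries
# are assumptions, only IdentityCIError and n_features=5 come from this PR.
test_overrides = {
    "eval_metric_configs": [
        {"classname": "IdentityCIError", "n_features": 5},  # match the 5-feature test model
    ],
}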

- Load default configs from registry for all 4 happy path tests
- Override only test-specific parameters (C, steps, batch_size, etc.)
- Fix ResidMLP test metric config to match 5-feature test model
- Remove print statements and obvious comments per style guide

lee-goodfire changed the title from "Update happy path tests to use default configs" to "[TEST PR] Update happy path tests to use default configs" on Dec 5, 2025
leesharkey and others added 2 commits December 5, 2025 22:32
…tric_configs

The old loss coefficient fields (faithfulness_coeff, ci_recon_coeff, etc.)
were deprecated and removed during config validation, leaving an empty
loss_metric_configs list. This caused the total loss to be a plain tensor
with no gradient graph, resulting in a RuntimeError during the backward pass.

Updated to use the new loss_metric_configs format (sketched below) with:
- ImportanceMinimalityLoss (coeff: 1e-2, pnorm: 0.1)
- CIMaskedReconLoss (coeff: 1.0)
- StochasticReconLoss (coeff: 1.0)
- StochasticReconLayerwiseLoss (coeff: 1.0)

This matches the original coefficient values and follows the pattern
used in other config files (tms_5-2_config.yaml, resid_mlp1_config.yaml).
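
Expressed as a Python override, roughly (the field names classname/coeff/pnorm are assumptions; the class names and coefficient values come from this commit message):

# Sketch of the new loss_metric_configs format; the exact schema is an assumption.
loss_metric_configs = [
    {"classname": "ImportanceMinimalityLoss", "coeff": 1e-2, "pnorm": 0.1},
    {"classname": "CIMaskedReconLoss", "coeff": 1.0},
    {"classname": "StochasticReconLoss", "coeff": 1.0},
    {"classname": "StochasticReconLayerwiseLoss", "coeff": 1.0},
]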

Test now passes successfully.

Fixed multiple bugs in the GPT-2 test to work with the new registry-based config (a sketch of the resulting loading logic follows this list):
- Changed the model class check from a PreTrainedModel subclass check to hasattr(model_class, "from_pretrained")
- Added special handling for simple_stories_train models using from_run_info
- Fixed the tokenizer path to use config.tokenizer_name instead of config.pretrained_model_name
- Fixed module patterns to match the actual GPT2Simple structure (h.*.attn.q_proj instead of transformer.h.*.attn.c_attn)
- Disabled eval metrics that reference layers not in target_module_patterns
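
A rough sketch of that loading logic (from_run_info, GPT2Simple-style models, and the config fields are taken from this commit; the function shape and branch conditions are assumptions):

from transformers import AutoTokenizer

def load_target_model(model_class, config, run_info):
    # Hypothetical dispatch based on this commit's description; only the
    # names mentioned above are taken from the PR.
    if hasattr(model_class, "from_run_info"):  # simple_stories_train models
        model = model_class.from_run_info(run_info)  # wandb-hosted checkpoint
    elif hasattr(model_class, "from_pretrained"):
        model = model_class.from_pretrained(config.pretrained_model_name)
    else:
        raise TypeError("model class must provide a loader")
    tokenizer = AutoTokenizer.from_pretrained(config.tokenizer_name)
    return model, tokenizer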

)

# The test passes if optimize runs without errors
print("TMS SPD optimization completed successfully")
Contributor Author

Removed (along with others) due to "tests should pass silently" principle.

@lee-goodfire
Contributor Author

Okay, I'm pretty happy with this one.

This was almost completely one-shotted by the new Claude plugin I've put together (which Goodfire people can find here).

I've checked each line and to my eyes it looks good. This would pass my review as-is, but I appreciate that, e.g., Dan's eye will be sharper and catch more issues. I'm happy to pass this off for further review.

@lee-goodfire
Contributor Author

It's easiest to understand the pattern in the tms and resid mlp2 tests.

The other models, ih especially, are slightly messier, but I think the approach is still warranted.

Collaborator

danbraunai-goodfire left a comment


tyty. Good to merge if I'm very unlikely to question any of the changes made to address the comments.

PR description comments

  • I think it's long and not information-dense enough. You could probably get away with just a few-line description and a one-line Motivation.
  • You should put "Closes #X" to close the issue automatically when the PR closes.

Collaborator

Currently, we don't have a saved ih_transformer target model, unlike the other models. Ideally we'd make one if we want to support this experiment. I guess doing what this test does, making a new randomly initialized model and running SPD on that, seems reasonable. I'd make a note that it's doing that though, even just a "TODO: Use a real pretrained_model_path in the config instead of randomly initializing one".

"eval_metric_configs": [], # Disable eval metrics to avoid layer matching issues
}
config_dict = apply_nested_updates(base_config, test_overrides)
config = Config.model_validate(config_dict)
Collaborator

We've been using Config(**config_dict) elsewhere in the codebase. They both do the same thing. I'm pretty indifferent but would prefer consistency here.
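
For reference, the two equivalent forms (Config here is a pydantic-style model, so both construct and validate):

config = Config(**config_dict)                # form used elsewhere in the codebase
config = Config.model_validate(config_dict)   # equivalent pydantic v2 call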

Comment on lines 26 to 27
"train_log_freq": 50,
"n_examples_until_dead": 200, # train_log_freq * batch_size
Collaborator

nit: I'd probably just put 999 or something, since this is never going to be used anyway. Don't think we did this previously but we should.

"eval_batch_size": 1,
"train_log_freq": 50,
"n_examples_until_dead": 200, # train_log_freq * batch_size
"task_config.max_seq_len": 16,
Collaborator

nit: I'd go smaller cause why not

eval_data_split="test[100:200]",
),
)
base_config = get_experiment_config_file_contents("ss_gpt2_simple")
Collaborator

I'd prefer either base_config_dict or base_config: dict[str, Any]. Otherwise people will confuse it for a Config object.

Collaborator

Before this change, it actually tested ss_gpt2_config, which uses the transformers library gpt2 architecture and our custom model that was uploaded to huggingface.

Now it tests ss_gpt2_simple, which uses the simplestories gpt2_simple architecture and our custom model that was uploaded to wandb.

I think it's fine to keep this test, but I'd rename it to tests/test_ss_gpt2_simple.py.

But I do want to keep one test that has a pretrained_model_class from the transformers library, like the old one had. You may be able to test both configurations in this one test file with mark.parametrize or just a for loop. If doing that, the name of the file might be test_gpt2_configurations.py or something like that.

It would be nice to test the raw gpt2 too, i.e. the one in gpt2_config.yaml. But you should check the runtime of that; it's not worth it if it adds 5+ seconds to the slow tests.

Contributor Author

Renamed test_gpt2.py → test_gpt2_configs.py and added a parametrized test covering both:

  • ss_gpt2_simple (simple_stories_train GPT2Simple, wandb-hosted)
  • ss_gpt2 (transformers.GPT2LMHeadModel from HuggingFace)

So we now test both the new config and a transformers library model like the old test had.

Re: raw gpt2 config - I tested the runtime and it times out downloading the openwebtext dataset (8.8M examples), so I didn't include it.
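
A sketch of what that parametrization might look like (the two experiment keys come from this thread; the test name and body are assumptions):

import pytest

# Hypothetical parametrized test; get_experiment_config_file_contents is the
# registry helper used elsewhere in this PR.
@pytest.mark.parametrize("experiment_name", ["ss_gpt2_simple", "ss_gpt2"])
def test_gpt2_happy_path(experiment_name: str) -> None:  # name assumed
    base_config_dict = get_experiment_config_file_contents(experiment_name)
    ...  # apply test overrides, build Config, run optimize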

leesharkey and others added 3 commits December 18, 2025 16:10
- Rename base_config -> base_config_dict to avoid confusion with Config objects
- Use Config(**config_dict) instead of Config.model_validate() for consistency
- Change n_examples_until_dead to 999 (never used in tests anyway)
- Reduce max_seq_len in GPT2 test for faster execution
- Rename test_gpt2.py -> test_gpt2_configs.py
- Add parametrized test for both ss_gpt2_simple and ss_gpt2 configs
- Add TODO comment in test_ih_transformer.py about needing pretrained model

@lee-goodfire
Contributor Author

It's been a little while since you said:

"Good to merge if I'm very unlikely to question any of the changes made to address the comments."

So I'm requesting review again, just in case. I think you're unlikely to question any of the changes made to address the comments.

lee-goodfire changed the title from "[TEST PR] Update happy path tests to use default configs" to "Update happy path tests to use default configs" on Dec 18, 2025
@danbraunai-goodfire
Collaborator

@lee-goodfire nice, lgtm. Could you comment on the open threads that you've completed them and resolve them? Typically as the author I'll comment "done" after I've implemented something, and leave it for the reviewer to resolve them all. Not sure what the normal thing to do here is, but I guess something that doesn't require the author to manually check things they commented on.
