
[TEST PR] Adding PGD CE Diff metric #276

Open

leesharkey wants to merge 43 commits into main from feature/pgd_ce_diff

Conversation


leesharkey (Contributor) commented Nov 25, 2025

This PR adds a new metric, PGDCEDiff, that computes the cross-entropy (CE) difference between adversarially optimized PGD-masked model outputs and the target model's outputs.

Changes:

  • New file: spd/metrics/pgd_ce_diff.py - Implements the PGDCEDiff metric class
  • Updated: spd/configs.py - Adds PGDCEDiffConfig and registers it in EvalOnlyMetricConfigType
  • Updated: spd/eval.py - Adds metric initialization logic and imports
  • Updated: spd/metrics/__init__.py - Exports the new metric

Key Features:

  • Uses Projected Gradient Descent (PGD) to find adversarial masks that maximize CE loss against true labels
  • Returns metric key ce_difference_pgd_masked representing: CE(pgd_masked_output) - CE(target_output) (see the sketch after this list)
  • Follows the same architecture as existing PGD metrics (PGDReconLoss, PGDReconLayerwiseLoss)
  • Complements the CE difference metrics in CEandKLLosses with PGD-optimized masking
  • Gracefully handles non-LM tasks by detecting output dimensionality (only runs for 3D outputs)
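
As a rough sketch of the quantity reported (function and tensor names here are illustrative, not the actual spd API; the real metric class lives in spd/metrics/pgd_ce_diff.py):

import torch
import torch.nn.functional as F

def ce_difference_pgd_masked(pgd_masked_logits: torch.Tensor,
                             target_logits: torch.Tensor,
                             labels: torch.Tensor) -> torch.Tensor:
    """Illustrative only: CE(pgd_masked_output) - CE(target_output)."""
    # Only meaningful for LM-style (batch, seq_len, vocab) outputs.
    if pgd_masked_logits.ndim != 3:
        return torch.tensor(float("nan"))
    ce_masked = F.cross_entropy(pgd_masked_logits.flatten(0, 1), labels.flatten())
    ce_target = F.cross_entropy(target_logits.flatten(0, 1), labels.flatten())
    return ce_masked - ce_target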

Configuration:

Add to experiment configs with standard PGD parameters:

eval_metric_configs:
  - classname: "PGDCEDiff"
    init: "random"
    step_size: 0.01
    n_steps: 20
    mask_scope: "shared_across_batch"
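
For illustration, the corresponding PGDCEDiffConfig registered in spd/configs.py might look roughly like the following; the pydantic base class, exact types, and defaults here are assumptions inferred from the YAML fields above rather than the actual implementation:

from typing import Literal

from pydantic import BaseModel

class PGDCEDiffConfig(BaseModel):
    classname: Literal["PGDCEDiff"] = "PGDCEDiff"
    init: Literal["random"] = "random"        # mask initialization strategy
    step_size: float = 0.01                   # PGD step size
    n_steps: int = 20                         # number of PGD iterations
    mask_scope: str = "shared_across_batch"   # whether one mask is shared across the batch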

Related Issue

N/A - New feature implementation

Motivation and Context

During SPD training, we track various metrics to understand how different masking strategies affect model performance. We already have:

  • CE difference metrics for various masking strategies (CI-masked, stochastic, random, etc.) in CEandKLLosses
  • PGD reconstruction loss metrics using MSE and KL divergence

This PR fills the gap by adding a PGD-optimized CE difference metric, allowing us to measure how adversarially-chosen masks (optimized to maximize
CE loss) compare to the target model. This is useful for:

  • Understanding worst-case masking scenarios
  • Evaluating the robustness of learned component decompositions
  • Measuring maximum possible CE degradation under adversarial component selection

How Has This Been Tested?

  • ✅ Type checking passes (make type - basedpyright)
  • ✅ Linting passes (make format - ruff)
  • ✅ Pre-commit hooks pass
  • ✅ Code follows existing patterns from PGDReconLoss and CEandKLLosses
  • ✅ All assertions and error handling in place for fail-fast behavior
  • Tested on ResidualMLP experiment: Metric gracefully skips (returns 0) for non-LM tasks
  • Tested on SimpleStories LM experiment: Metric computes successfully with meaningful values (~20.88 CE difference)

Test Results:

  • ResidualMLP: eval/ce_kl/ce_difference_pgd_masked: nan (correctly skipped non-3D output)
  • SimpleStories LM: eval/ce_kl/ce_difference_pgd_masked: 20.881893157958984 (working correctly!)

Note on unit tests: Following the repository's testing philosophy for research code ("Integration tests often too much overhead for research
code. Interactive use catches issues at low cost"), no unit tests were added. This is consistent with other PGD metrics in the codebase which also
lack unit tests. The metric has been validated during actual experiment runs.

Does this PR introduce a breaking change?

No breaking changes. This is purely additive: a new metric that can be optionally configured in experiment YAML files. Existing experiments continue to work without any modifications.

leesharkey and others added 30 commits September 16, 2025 18:07
leesharkey and others added 10 commits October 28, 2025 14:00
Added two documentation files to help AI assistants work effectively with the SPD codebase:

- CLAUDE_COMPREHENSIVE.md: Complete reference guide covering development philosophy, coding standards, architecture patterns, workflows, and collaboration practices
- CLAUDE_CHECKLIST.md: Pre-submission checklist for verifying code changes meet SPD standards before committing

These documents ensure consistent code quality and help future AI assistants understand project conventions, reducing onboarding time and maintaining codebase consistency.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Added two checklist items to prevent future AI assistants from forgetting important steps:
- "Checked existing patterns" item to ensure new files follow existing conventions
- "Restarted checklist after any changes" with explicit STOP instruction to prevent incomplete verification

Also fixed references from "dev branch" to "main branch" throughout both documentation files, as the repository uses main as the primary development branch.

These changes address feedback from PR review process where these steps were accidentally omitted.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Implements a new metric that computes the cross-entropy difference between
adversarially-optimized PGD-masked model outputs and the target model.

The metric uses Projected Gradient Descent (PGD) to find adversarial masks
that maximize CE loss against true labels, then reports the CE difference
from the target model. This complements existing PGD metrics (MSE, KL) and
follows the same pattern as CEandKLLosses for computing CE differences.

The metric is exported in spd/metrics/__init__.py for use in experiments.
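
For context, a generic sketch of the PGD mask optimization described above; forward_with_mask, n_components, and all other names are hypothetical and do not reflect the actual spd/pgd_utils API:

import torch
import torch.nn.functional as F

def find_adversarial_mask(model, batch, labels, n_steps: int = 20, step_size: float = 0.01):
    """Illustrative PGD: gradient-ascend CE w.r.t. a component mask constrained to [0, 1]."""
    mask = torch.rand(model.n_components, requires_grad=True)  # init: "random"
    for _ in range(n_steps):
        logits = model.forward_with_mask(batch, mask)  # hypothetical masked forward pass
        loss = F.cross_entropy(logits.flatten(0, 1), labels.flatten())
        (grad,) = torch.autograd.grad(loss, mask)
        with torch.no_grad():
            mask += step_size * grad.sign()  # ascent step: maximize CE against true labels
            mask.clamp_(0.0, 1.0)            # project back onto the feasible set
    return mask.detach()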
leesharkey (Contributor, Author) commented:

@claude Please review this PR (specifically the spd/metrics/pgd_ce_diff.py file and surrounding changes). Ignore the CLAUDE_ files. Pretend they're not there. Once you're done and have come to a conclusion, cross-check your review against the info in the CLAUDE_COMPREHENSIVE.md and CLAUDE_CHECKLIST.md files.


claude bot commented Nov 25, 2025

Claude Code is working…

I'll analyze this and get back to you.


- Add PGDCEDiffConfig to configs.py
- Add to EvalOnlyMetricConfigType union
- Create test config with PGD CE diff metric and reduced steps
- Import PGDCEDiff and PGDCEDiffConfig
- Add case to init_metric for PGDCEDiff instantiation

Add check for 3D output shape (batch, seq_len, vocab) and return zero
if not applicable. This allows the metric to be included in configs
but skip computation for non-LM tasks like ResidualMLP.
danbraunai (Contributor) left a comment:


Misc comments:

  1. This PR has CLAUDE_CHECKLIST.md and CLAUDE_COMPREHENSIVE.md files in it; we'd want to remove those if merging.
  2. This PR implements the full-layer version of the loss. We'd want to confirm that we only want this and don't also want the subset and layerwise versions of it.
  3. Not sure if we actually want this as a Metric that we can use to train on. If we don't want it as a metric, we can probably make it simpler. We may even want to put this in the existing ce_and_kl_losses.py file, although I'd make sure that this won't massively slow down that calculation. If it did slow it down, then having it as a separate file that is optional to run makes sense (I suppose maybe we'd want it as a proper "metric" if doing that). Also be mindful, if going the ce_and_kl_losses.py route, that that file doesn't reduce over ranks, which is problematic for shared_over_batch.
  4. I think we can make minor modifications to the existing pgd functions in pgd_utils.py so that they can return CE or KL. Then we could just call those functions instead of rewriting a lot of the stuff here.

So I think we'll need to chat with Lucius who suggested this feature about some more concrete specs.

