[TEST PR] Script for CI cosine similarity comparisons #277

Open
leesharkey wants to merge 44 commits into main from feature/ci-cosine-sim-merge-claude-mds

@leesharkey
Contributor

Description

Enhances the model comparison script with CI-based cosine similarity metrics for
more meaningful analysis of learned component alignment between SPD model runs.

Key Changes:

Metric Improvements:

  • Replaced activation density with mean causal importance (CI) as the component
    filtering criterion
  • Added CI cosine similarity metrics to measure component alignment between models
  • Renamed density_threshold → mean_ci_threshold with proper validation (0.0-1.0
    range)
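
A minimal sketch of the filtering described above, assuming causal importances arrive as one row per sample and one column per component. Plain Python lists stand in for the tensors the real script presumably uses, and the helper names `validate_mean_ci_threshold`, `mean_ci_per_component`, and `alive_components` are hypothetical, not taken from the PR.

```python
def validate_mean_ci_threshold(threshold: float) -> float:
    """Reject thresholds outside the documented 0.0-1.0 range."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"mean_ci_threshold must be in [0.0, 1.0], got {threshold}")
    return threshold


def mean_ci_per_component(ci_values: list[list[float]]) -> list[float]:
    """Average causal importance over samples; ci_values[sample][component]."""
    if not ci_values:
        raise ValueError("ci_values must contain at least one sample")
    n_samples = len(ci_values)
    n_components = len(ci_values[0])
    return [sum(row[c] for row in ci_values) / n_samples for c in range(n_components)]


def alive_components(mean_cis: list[float], threshold: float) -> list[int]:
    """Indices of components whose mean CI strictly exceeds the threshold."""
    threshold = validate_mean_ci_threshold(threshold)
    return [i for i, mean_ci in enumerate(mean_cis) if mean_ci > threshold]


# Two samples, two components (illustrative values only).
ci = [[0.5, 0.0], [1.0, 0.25]]
means = mean_ci_per_component(ci)
alive = alive_components(means, threshold=0.2)
```

The strict `>` comparison mirrors the `alive_mask` line quoted in the commit message further down; whether the real validation raises or asserts is not stated in the PR.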

Code Quality:

  • Refactored compute_activation_densities() → compute_ci_statistics() with better
    error handling
  • Added comprehensive shape mismatch checking with detailed warnings
  • Improved batch handling with StopIteration safeguards
  • Enhanced logging with component-level statistics
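
The PR does not show how the shape checks or the StopIteration safeguards are implemented. The following is only one way such guards might look; `take_batches` and `shapes_match` are hypothetical names, and any iterable stands in for a data loader.

```python
import warnings


def take_batches(loader, n_batches: int) -> list:
    """Collect up to n_batches items, tolerating a loader that runs out early."""
    iterator = iter(loader)
    batches = []
    for _ in range(n_batches):
        try:
            batches.append(next(iterator))
        except StopIteration:
            break  # loader exhausted before n_batches were collected
    if not batches:
        raise ValueError("Data loader yielded no batches; cannot compute CI statistics")
    if len(batches) < n_batches:
        warnings.warn(
            f"Requested {n_batches} batches but only {len(batches)} were available",
            stacklevel=2,
        )
    return batches


def shapes_match(layer_name: str, shape_a, shape_b) -> bool:
    """Return True if shapes agree; otherwise warn with both shapes and return False."""
    if tuple(shape_a) != tuple(shape_b):
        warnings.warn(
            f"Skipping layer {layer_name!r}: shape mismatch "
            f"{tuple(shape_a)} vs {tuple(shape_b)}",
            stacklevel=2,
        )
        return False
    return True
```

Warning and skipping a mismatched layer, rather than raising, is an assumption here; it lets a comparison continue for the layers that do line up.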

Configuration Updates:

  • Updated compare_models_config.yaml with new semantic parameter names
  • Adjusted default threshold value for CI-based filtering
  • Updated example model paths and batch size

Related Issue

N/A - Enhancement to post-hoc analysis tooling

Motivation and Context

The CI cosine similarity metrics provide better insight into how learned components
align between different model runs. Mean CI is a more meaningful measure of
component importance than activation density, as it directly quantifies each
component's causal contribution to model outputs.

This complements the existing geometric similarity metrics (which compare component
subspace geometry) with a functional similarity metric (which compares component
usage patterns on actual data).
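
To make the "functional similarity" idea concrete, here is a hedged sketch of one plausible reading: treat each component's causal importances over a shared set of samples as a vector, and take the cosine similarity between every component of model A and every component of model B. The function names and the list-based data layout are illustrative assumptions, not the PR's actual code.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def ci_cosine_similarity_matrix(
    ci_a: list[list[float]], ci_b: list[list[float]]
) -> list[list[float]]:
    """sim[i][j] compares component i of model A with component j of model B.

    Both inputs are indexed [sample][component] and must cover the same samples.
    """
    if len(ci_a) != len(ci_b):
        raise ValueError("both models must be evaluated on the same samples")
    cols_a = [list(col) for col in zip(*ci_a)]
    cols_b = [list(col) for col in zip(*ci_b)]
    return [[cosine_similarity(a, b) for b in cols_b] for a in cols_a]


# Model B uses the same two components as model A, but in swapped order.
ci_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
ci_b = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
sim = ci_cosine_similarity_matrix(ci_a, ci_b)
best_match = [max(range(len(row)), key=row.__getitem__) for row in sim]
```

In this toy case each component of A is matched to the other index in B, which is the kind of permutation a usage-based metric can recover without looking at the component weights.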

How Has This Been Tested?

  • ✅ All formatting checks pass (make check)
  • ✅ All type checks pass (basedpyright, 0 errors)
  • ✅ All unit tests pass (200 passed, 11 skipped)
  • ✅ Code reviewed against CLAUDE_CHECKLIST.md standards
  • ✅ Removed obvious comment per style guide

Does this PR introduce a breaking change?

Minor breaking change in compare_models.py:

  • Config parameter renamed: density_threshold → mean_ci_threshold
  • Users with existing compare_models_config.yaml files will need to update this
    parameter name
  • Impact is minimal: The script is for post-hoc analysis only, not part of the core
    SPD training pipeline
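
For anyone updating an existing config, the rename can be sketched as a small dictionary migration. This assumes the YAML file has already been loaded into a plain dict; `migrate_compare_models_config` is a hypothetical helper and is not part of the PR.

```python
def migrate_compare_models_config(config: dict) -> dict:
    """Return a copy of config with density_threshold renamed to mean_ci_threshold."""
    migrated = dict(config)
    if "density_threshold" in migrated:
        if "mean_ci_threshold" in migrated:
            raise ValueError(
                "Config sets both density_threshold and mean_ci_threshold; keep only the new key"
            )
        # Carries the old value over unchanged; it may need retuning (see note below).
        migrated["mean_ci_threshold"] = migrated.pop("density_threshold")
    return migrated


old_config = {"density_threshold": 0.1, "batch_size": 32}  # illustrative values
new_config = migrate_compare_models_config(old_config)
```

Because the filtering criterion itself changed from activation density to mean CI, and the PR adjusted the default threshold, a value carried over from an old config is worth revisiting rather than trusting as-is.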

leesharkey and others added 30 commits September 16, 2025 18:07
leesharkey and others added 14 commits October 28, 2025 14:00

Added two documentation files to help AI assistants work effectively with the SPD codebase:

- CLAUDE_COMPREHENSIVE.md: Complete reference guide covering development philosophy, coding standards, architecture patterns, workflows, and collaboration practices
- CLAUDE_CHECKLIST.md: Pre-submission checklist for verifying code changes meet SPD standards before committing

These documents ensure consistent code quality and help future AI assistants understand project conventions, reducing onboarding time and maintaining codebase consistency.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Added two checklist items to prevent future AI assistants from forgetting important steps:
- "Checked existing patterns" item to ensure new files follow existing conventions
- "Restarted checklist after any changes" with explicit STOP instruction to prevent incomplete verification

Also fixed references from "dev branch" to "main branch" throughout both documentation files, as the repository uses main as the primary development branch.

These changes address feedback from PR review process where these steps were accidentally omitted.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Per CLAUDE_CHECKLIST.md, removed redundant comment that was obvious from
the code itself. The line `alive_mask = mean_component_cis[layer_name] >
self.mean_ci_threshold` is self-explanatory.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2 participants