
Conversation

@luciaquirke luciaquirke commented Sep 12, 2025

Cleanup/small fixes:

  • Add a sanity check for NaNs in index grads (these shouldn't occur)
  • Resolve post-unit-norm NaNs in Attributor (these sometimes occur when grads are flushed to zero vectors by the final dtype conversion before saving to disk)
  • Support queries where k > n
  • Support querying individual modules in addition to full-model grads
  • Use a slower default FAISS config that works on all devices; the recommended config is documented in the docstring
  • Move FAISS logic into its own file to simplify Attributor
  • Add FAISS CLI flag to query_index
  • Deprecate unstructured gradient indices
  • Harden check for existing FAISS index
  • Remove redundant unit norm
  • Rename chunk to shard
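
One item above is handling queries where k > n. A minimal sketch of how that case can be handled, shown here with NumPy rather than FAISS for brevity (the function name and shape are hypothetical, not taken from the PR):

```python
import numpy as np

def top_k_neighbors(scores: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the top-k scores, clamping k so that
    queries with k > n return all n items instead of raising."""
    k = min(k, len(scores))  # support queries where k > n
    idx = np.argpartition(-scores, k - 1)[:k]  # O(n) partial select
    return idx[np.argsort(-scores[idx])]       # order the k hits by score
```

With three items, asking for ten neighbors simply returns all three, ranked by score.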

@luciaquirke luciaquirke changed the title Extract out FAISS index from Attributor Support module queries; handle NaN grads; refactor Attributor Sep 12, 2025
"--model", type=str, default="HuggingFaceTB/SmolLM2-135M-Instruct"
)
parser.add_argument("--dataset", type=str, default="EleutherAI/SmolLM2-135M-10B")
parser.add_argument("--dataset", type=str, default="RonenEldan/TinyStories")
@luciaquirke luciaquirke Sep 12, 2025


Smaller default dataset for casual consumers/prototyping

@norabelrose norabelrose left a comment


just use epsilon instead of nan_to_num and then I think it's good to go

```python
for name in q:
    q[name] /= norm
    # Zero gradients will be NaN after normalization
    q[name] = q[name].nan_to_num(0)
```

hmm I guess this works although the standard way to do this is to add epsilon, like 1e-8. It is a hyperparameter but it means the function isn't sharply discontinuous around zero
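
The epsilon variant the reviewer suggests could look roughly like this (a sketch using NumPy for self-containment; the dict name `q` follows the snippet above, `unit_normalize` is a hypothetical helper, and 1e-8 is a tunable hyperparameter):

```python
import numpy as np

EPS = 1e-8  # hyperparameter; avoids division by zero for all-zero grads

def unit_normalize(q: dict) -> dict:
    """Unit-normalize a dict of per-module gradient arrays by their
    joint norm. Adding EPS keeps the map continuous around zero:
    all-zero grads normalize to zero instead of NaN."""
    norm = np.sqrt(sum(float(np.square(g).sum()) for g in q.values()))
    return {name: g / (norm + EPS) for name, g in q.items()}
```

Unlike the nan_to_num approach, nearly-zero gradients are scaled smoothly rather than jumping between a finite vector and zero.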

@luciaquirke luciaquirke merged commit 22c0553 into main Sep 16, 2025
1 check passed
