Support module queries; handle NaN grads; refactor Attributor #38
ac716af to ceda77a
```diff
     "--model", type=str, default="HuggingFaceTB/SmolLM2-135M-Instruct"
 )
-parser.add_argument("--dataset", type=str, default="EleutherAI/SmolLM2-135M-10B")
+parser.add_argument("--dataset", type=str, default="RonenEldan/TinyStories")
```
Smaller default dataset for casual consumers/prototyping
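A new default only affects runs that omit the flag, so the previous, larger dataset stays one argument away. The sketch below is a hypothetical parser that reproduces just the two arguments visible in the diff (the `build_parser` name is illustrative, not the project's code) and shows both cases with the standard-library `argparse` module.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring only the two arguments shown in the diff."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model", type=str, default="HuggingFaceTB/SmolLM2-135M-Instruct"
    )
    # Smaller default dataset, intended for casual use and prototyping.
    parser.add_argument("--dataset", type=str, default="RonenEldan/TinyStories")
    return parser


# No flags: the small default dataset is used.
args = build_parser().parse_args([])
print(args.dataset)

# The previous, larger dataset can still be requested explicitly.
args = build_parser().parse_args(["--dataset", "EleutherAI/SmolLM2-135M-10B"])
print(args.dataset)
```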
Just use an epsilon instead of `nan_to_num`, and then I think it's good to go.
bergson/attributor.py (outdated)
```python
for name in q:
    q[name] /= norm
    # Zero gradients will be NaN after normalization
    q[name] = q[name].nan_to_num(0)
```
Hmm, I guess this works, although the standard way to do this is to add an epsilon, like 1e-8, to the norm. It is a hyperparameter, but it means the function isn't sharply discontinuous around zero.
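The discontinuity the reviewer mentions can be seen in a small dependency-free sketch. It assumes the gradients are held as a dict of float lists (standing in for the tensors in `q`) and that a single global L2 norm is shared by all entries; the function names are hypothetical, and the `eps` default of 1e-8 is simply the value suggested in the comment. With the `nan_to_num` pattern, a tiny nonzero gradient normalizes to unit length while an exactly zero one maps to zero. Adding `eps` to the denominator keeps zero at zero and makes tiny inputs produce tiny outputs.

```python
import math


def global_l2_norm(grads):
    """L2 norm over every entry of every gradient (assumed to be global)."""
    return math.sqrt(sum(x * x for g in grads.values() for x in g))


def normalize_nan_to_num(grads):
    """Pattern under review: divide by the norm, then replace NaN with 0.

    With tensors, 0 / 0 gives NaN only when the norm is exactly zero, and then
    every entry is zero, so nan_to_num(0) turns the whole result into zeros.
    Plain Python raises on division by zero, so that case is handled directly.
    """
    norm = global_l2_norm(grads)
    if norm == 0.0:
        return {name: [0.0] * len(g) for name, g in grads.items()}
    return {name: [x / norm for x in g] for name, g in grads.items()}


def normalize_with_eps(grads, eps=1e-8):
    """Reviewer's suggestion: add a small epsilon to the denominator.

    Zero gradients stay exactly zero (0 / eps == 0), so no NaN cleanup is
    needed, and the output shrinks smoothly toward zero as the norm does.
    """
    norm = global_l2_norm(grads)
    return {name: [x / (norm + eps) for x in g] for name, g in grads.items()}


zero = {"w": [0.0, 0.0]}
tiny = {"w": [2.0 ** -100, 0.0]}  # about 7.9e-31, nonzero but negligible
print(normalize_nan_to_num(zero)["w"])  # all zeros
print(normalize_nan_to_num(tiny)["w"])  # unit length: a jump from the zero case
print(normalize_with_eps(tiny)["w"])    # about 8e-23: close to the zero case
```

For comparison, PyTorch's `torch.nn.functional.normalize` guards the denominator with `max(norm, eps)` rather than `norm + eps`; either form avoids producing NaN.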
Cleanup/small fixes: