v0.4.0 (2024-04-12)
Chore
Feature
-
feat: improve steering experiments utils (#147)
-
Add statsmodels
-
Add notebook w/ results on choosing steerability metric
-
feat: add saving, loading for SVs
-
Finish initial study on aggregation method
-
rename
-
fix: use train_completion_template
-
Update lockfile
-
Remove system prompt for config for backwards-compatibility
-
feat: improve logging of missing steering configs
-
Update notebooks
-
chore: remove failing py311 ci run
Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (f1487a9)
-
feat: add functions to compute logit statistics (#145)
-
Add functions to compute logit statistics
-
Make logit statistics optional
Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (c447864)
Refactor
-
refactor: experiments (#141)
-
Add concept metrics calculation
-
fix: concept metrics
-
Add unit test for metrics
-
feat: layer-wise steering metrics
-
update config fields
-
update experiments code
-
minor
-
refactor: experiments code
-
refactor: experiments code
-
Test datasets exist before running
-
fix: database
-
add method to get config, fix delete_table
-
changes
-
more changes
-
Fix bug in experiment path
-
Add sweeps
-
WIP
-
Fix tests
-
Fix tests
Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (ee5f190)
Unknown
-
updating persona generalization (#151)
-
updating persona generalization
-
temporarily disabling test due to cpu/cuda issue on ci (ef91f2f)
-
Add evaluate_generalization.py notebook (4a25847)
-
minor fixes (506ff11)
-
WIP: Persona cross steering (#150)
-
setting up cross-evaluation experiments
-
improving progress reporting in experiments
-
adding option to normalize steering magnitude to baseline
-
tweaking params
-
fixing nested progress
-
updating persona evals
-
passing eval params through persona experiment
-
setting up script for persona generalization experiments
-
more debugging output
-
updating test
-
fixing typing
-
make datasets as part of experiments script
-
fixing eval dataset selection
-
fixing eval
-
adding cross steering plots
-
shorten labels in cross-steering plots
-
WIP adding plotting helpers
-
refactoring plotting code
-
adding more plotting options
-
adding more content to plots
-
outputting more info in graphs (352df94)
-
Add sft training examples (bcf8c2c)
-
Experiments (#146)
-
Add sweeps
-
WIP experimental code
-
Update experiment notebook
-
Remove pycache
Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (037bea3)