v0.4.0

github-actions released this 12 Apr 20:32

8a6be67

v0.4.0 (2024-04-12)

Chore

chore: modify coderabbit config to reduce verbosity (f13fc43)
chore: ignore png files (5c9ddc8)

Feature

feat: improve steering experiments utils (#147)
Add statsmodels
Add notebook w/ results on choosing steerability metric
feat: add saving, loading for SVs
Finish initial study on aggregation method
rename
fix: use train_completion_template
Update lockfile
Remove system prompt for config for backwards-compatibility
feat: improve logging of missing steering configs
Update notebooks
chore: remove failing py311 ci run

Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (f1487a9)

feat: add functions to compute logit statistics (#145)
Add functions to compute logit statistics
Make logit statistics optional

Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (c447864)

Refactor

refactor: experiments (#141)
Add concept metrics calculation
fix: concept metrics
Add unit test for metrics
feat: layer-wise steering metrics
update config fields
update experiments code
minor
refactor: experiments code
refactor: experiments code
Test datasets exist before running
fix: database
add method to get config, fix delete_table
changes
more changes
Fix bug in experiment path
Add sweeps
WIP
Fix tests
Fix tests

Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (ee5f190)

Unknown

updating persona generalization (#151)
updating persona generalization
temporarily disabling test due to cpu/cuda issue on ci (ef91f2f)
Add evaluate_generalization.py notebook (4a25847)
minor fixes (506ff11)
WIP: Persona cross steering (#150)
setting up cross-evaluation experiments
improving progress reporting in experiments
adding option to normalize steering magnitude to baseline
tweaking params
fixing nested progress
updating persona evals
passing eval params through persona experiment
setting up script for persona generalization experiments
more debugging output
updating test
fixing typing
make datasets as part of experiments script
fixing eval dataset selection
fixing eval
adding cross steering plots
shorten labels in cross-steering plots
WIP adding plotting helpers
refactoring plotting code
adding more plotting options
adding more content to plots
outputting more info in graphs (352df94)
Add sft training examples (bcf8c2c)
Experiments (#146)
Add sweeps
WIP experimental code
Update experiment notebook
Remove pycache

Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (037bea3)

Add notebooks to run experiments (9c662c7)
Add function to load sweep results (66cb2d7)
