v0.4.0

@github-actions github-actions released this 12 Apr 20:32
· 84 commits to main since this release

v0.4.0 (2024-04-12)

Chore

  • chore: modify coderabbit config to reduce verbosity (f13fc43)

  • chore: ignore png files (5c9ddc8)

Feature

  • feat: improve steering experiments utils (#147)

  • Add statsmodels

  • Add notebook w/ results on choosing steerability metric

  • feat: add saving, loading for SVs

  • Finish initial study on aggregation method

  • rename

  • fix: use train_completion_template

  • Update lockfile

  • Remove system prompt for config for backwards-compatibility

  • feat: improve logging of missing steering configs

  • Update notebooks

  • chore: remove failing py311 ci run


Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (f1487a9)

  • feat: add functions to compute logit statistics (#145)

  • Add functions to compute logit statistics

  • Make logit statistics optional


Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (c447864)

Refactor

  • refactor: experiments (#141)

  • Add concept metrics calculation

  • fix: concept metrics

  • Add unit test for metrics

  • feat: layer-wise steering metrics

  • update config fields

  • update experiments code

  • minor

  • refactor: experiments code

  • refactor: experiments code

  • Test datasets exist before running

  • fix: database

  • add method to get config, fix delete_table

  • changes

  • more changes

  • Fix bug in experiment path

  • Add sweeps

  • WIP

  • Fix tests

  • Fix tests


Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (ee5f190)

Unknown

  • updating persona generalization (#151)

  • updating persona generalization

  • temporarily disabling test due to cpu/cuda issue on ci (ef91f2f)

  • Add evaluate_generalization.py notebook (4a25847)

  • minor fixes (506ff11)

  • WIP: Persona cross steering (#150)

  • setting up cross-evaluation experiments

  • improving progress reporting in experiments

  • adding option to normalize steering magnitude to baseline

  • tweaking params

  • fixing nested progress

  • updating persona evals

  • passing eval params through persona experiment

  • setting up script for persona generalization experiments

  • more debugging output

  • updating test

  • fixing typing

  • make datasets as part of experiments script

  • fixing eval dataset selection

  • fixing eval

  • adding cross steering plots

  • shorten labels in cross-steering plots

  • WIP adding plotting helpers

  • refactoring plotting code

  • adding more plotting options

  • adding more content to plots

  • outputting more info in graphs (352df94)

  • Add sft training examples (bcf8c2c)

  • Experiments (#146)

  • Add sweeps

  • WIP experimental code

  • Update experiment notebook

  • Remove pycache


Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (037bea3)

  • Add notebooks to run experiments (9c662c7)

  • Add function to load sweep results (66cb2d7)