Releases: dtch1997/repepo

v0.8.1

21 May 04:59

v0.8.1 (2024-05-21)

Fix

  • fix: persona_generalization experiment script (#167)

  • update persona gen script with datasets

  • fix lint

  • adding a test to track missing persona datasets

  • adding all dataset prompts

  • adding test for get_all_prompts

  • adding qwen training script

  • fixing qwen script

  • tweaking qwen script

  • update qwen sweep script

  • adding llama layer sweep

  • fixing llama2 layers

  • fixing plot style

  • fixing formatting

  • adding plotting helper for steerability

  • fixing tests


Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com>
Co-authored-by: David Chanin <chanindav@gmail.com> (2eb6b62)

Unknown

  • update plots (b914302)

  • update figures for paper (ada05d8)

  • update figures for id steering (35e11cc)

  • update paper figures (c765cab)

  • add figures for correlating id and ood steering (ffe1515)

  • Paper/preprocessing (#170)

  • add preprocessing script

  • update figures


Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (307878f)

  • updates to id results (453737d)

  • In distribution results (#168)

  • add figures

  • add figures

  • wip: concept erasure

  • update

  • delete unused notebooks

  • update plots

  • concept erasure

  • fix lint

  • ignore type in random sv experiment


Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (0ca6521)

  • adding qwen formatting support and adding a sweep (#166)

  • adding qwen formatting support and adding a sweep

  • fixing formatting

  • saving progress during sweep (245022f)

  • delete unused notebooks (68b68e5)

  • add randomly sampled datasets (796bf90)

v0.8.0

07 May 11:13

v0.8.0 (2024-05-07)

Feature

  • feat: more datasets (#164)

  • add all xrisk datasets

  • make truthfulqa consistent with others

  • fix some prompts

  • Verify all existing persona prompts

  • add persona prompts for Xrisk, sycophancy, tqa

  • refactor: make more datasets

  • minor

  • fix lint

  • fix test persona prompt len


Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (0f6dff2)

v0.7.0

05 May 11:57

v0.7.0 (2024-05-05)

Feature

  • feat: pipeline generation (9c69185)

Unknown

v0.6.0

30 Apr 15:51

v0.6.0 (2024-04-30)

Chore

  • chore: add note to explain cluster utils (a13a503)

  • chore: add cluster utils (dcd731a)

  • chore: modify path to include local dependencies (4a97734)

  • chore: update time requirement in qsub (d1268c8)

  • chore: delete incorrect test step (1f4c2bb)

  • chore: update lockfile (40baf38)

  • chore: update ci (#161)

  • update ci

  • chore: fix typo

  • chore: add py312 test

  • update ruff settings

  • chore: ruff format


Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (4482d27)

Feature

  • feat: cross steering results db (#162)

  • modify jobscript to submit array job

  • chore: fix typo

  • Add sqlite database for results

  • implement multiprocessing to make db

  • update db

  • add notebook for in-distribution-steerability

  • add analysis notebook for intra concept variability

  • fix ruff format


Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (cd16daa)

Unknown

  • categorize persona prompts (6dd631f)

  • adding persona analysis notebook (38d3ed4)

v0.5.0

24 Apr 14:45

v0.5.0 (2024-04-24)

Chore

  • chore: replace hardcoded user id with daniel (b1271a6)

  • chore: fix qsub script (5f50c60)

  • chore: cleanup dead code (9e615ff)

Feature

  • feat: slim eval results and allowing multiple multipliers in cross-steering (#159)

  • adding slim eval results and allowing multiple multipliers in cross-steering

  • fixing linting

  • fixing typing (64d94aa)

  • feat: add more persona prompts (#160)

  • add more persona prompts

  • add persona prompts, tests

  • Modify persona_generalization to work with more prompts

  • Add script to run persona generalization

  • fix: translation constants

  • Add test for variables


Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (1cdf847)

  • feat: steerability metrics (f9e22cc)

Fix

  • fix: replace global var (d91a75f)

Unknown

  • update test fixture (53d06ab)

  • test: add dataset fixture (1b10ef4)

  • set layer to 13 (f75473d)

  • fixing linting (cdb5494)

  • allow customizing metric name in persona plots (d379a6d)

  • save persona generalization results individually (de8bbd0)

  • Faster eval (#153)

  • hopefully improving eval speed

  • more device issues

  • reformatting (03a6dfc)

  • tweaking cross steering (2babe64)

v0.4.0

12 Apr 20:32

v0.4.0 (2024-04-12)

Chore

  • chore: modify coderabbit config to reduce verbosity (f13fc43)

  • chore: ignore png files (5c9ddc8)

Feature

  • feat: improve steering experiments utils (#147)

  • Add statsmodels

  • Add notebook w/ results on choosing steerability metric

  • feat: add saving, loading for SVs

  • Finish initial study on aggregation method

  • rename

  • fix: use train_completion_template

  • Update lockfile

  • Remove system prompt for config for backwards-compatibility

  • feat: improve logging of missing steering configs

  • Update notebooks

  • chore: remove failing py311 ci run


Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (f1487a9)

  • feat: add functions to compute logit statistics (#145)

  • Add functions to compute logit statistics

  • Make logit statistics optional


Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (c447864)

Refactor

  • refactor: experiments (#141)

  • Add concept metrics calculation

  • fix: concept metrics

  • Add unit test for metrics

  • feat: layer-wise steering metrics

  • update config fields

  • update experiments code

  • minor

  • refactor: experiments code

  • refactor: experiments code

  • Test datasets exist before running

  • fix: database

  • add method to get config, fix delete_table

  • changes

  • more changes

  • Fix bug in experiment path

  • Add sweeps

  • WIP

  • Fix tests

  • Fix tests


Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (ee5f190)

Unknown

  • updating persona generalization (#151)

  • updating persona generalization

  • temporarily disabling test due to cpu/cuda issue on ci (ef91f2f)

  • Add evaluate_generalization.py notebook (4a25847)

  • minor fixes (506ff11)

  • WIP: Persona cross steering (#150)

  • setting up cross-evaluation experiments

  • improving progress reporting in experiments

  • adding option to normalize steering magnitude to baseline

  • tweaking params

  • fixing nested progress

  • updating persona evals

  • passing eval params through persona experiment

  • setting up script for persona generalization experiments

  • more debugging output

  • updating test

  • fixing typing

  • make datasets as part of experiments script

  • fixing eval dataset selection

  • fixing eval

  • adding cross steering plots

  • shorten labels in cross-steering plots

  • WIP adding plotting helpers

  • refactoring plotting code

  • adding more plotting options

  • adding more content to plots

  • outputting more info in graphs (352df94)

  • Add sft training examples (bcf8c2c)

  • Experiments (#146)

  • Add sweeps

  • WIP experimental code

  • Update experiment notebook

  • Remove pycache


Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (037bea3)

  • Add notebooks to run experiments (9c662c7)

  • Add function to load sweep results (66cb2d7)

v0.3.0

07 Mar 12:12

v0.3.0 (2024-03-07)

Feature

  • feat: steering experiments (#132)

  • Add experimental code

  • fix: make_country_capital script

  • feat: add code to run steering experiment

  • update experiments code

  • fix: add --config_path arg

  • fix: config yaml parsing

  • chore: add more configs

  • chore: add even more configs

  • refactor: plotting

  • feat: add script to run sweep

  • fix: do not set completion template by default

  • refactor sweeps

  • refactor: token concept sweep

  • fix: bugbears

  • chore: add comments

  • fix: steering_index of datasets

  • test: steering token index

  • updating steering_vectors library version

  • evaluate on more layers

  • refactor: use steering-vectors code, log instead of print

  • chore: fix docstring

  • test: training, evaluating steering vectors

  • fix: minor


Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com>
Co-authored-by: David Chanin <chanindav@gmail.com> (8d1bd7d)

v0.2.0

06 Mar 15:25

v0.2.0 (2024-03-06)

Feature

  • feat: add 'prompt' string for bats dataset. (83688c1)

Fix

  • fix: make_bats function (d789eb7)

Unknown

  • Add BATS dataset (#125)

  • Add BATS dataset

  • test: add test for make_bats


Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (dcb7ccd)

v0.1.0

05 Mar 13:42

v0.1.0 (2024-03-05)

Chore

  • chore: remove pypi publishing from ci (6ddd28d)

Feature

  • feat: refactor types (#123)

  • Refactor types

  • Delete old experimental code

  • Refactor datasets

  • Improve dataset split parsing; update make_dataset types

  • Update preprocessed datasets save dir

  • refactor: delete duplicated files

  • Fix pyright errors

  • Fix tests

  • Fix lint

  • chore: re-add tqa raw datasets

  • feat: add tests for evaluators

  • feat: re-add caa-style prompt

  • chore: delete unused code

  • refactor: migrate SteeringHook to repepo.core

  • test: assert pos, neg prompts the same


Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (e600270)

v0.0.0

04 Mar 17:32

v0.0.0 (2024-03-04)

Chore

  • chore: add semantic release to ci (56be43f)

Unknown

  • Delete dead code (a1da6de)

  • translate the main persona datasets from @dtch1997's work (#108) (b0b13a2)

  • Translation experiments (#99)

  • adding helpers for generating and parsing language from dataset filename

  • adding compare_dataset_translations experiment

  • adding experiment helpers

  • tweaking pirate translation strings

  • adding translations for non-persona mwes

  • fixing up make mwe helper

  • adding a 'ctx' pseudo-style

  • Revert "adding a 'ctx' pseudo-style"

This reverts commit a0058c4.

  • refactoring to allow using arbitrary dataset variations, instead of the hacky pseudo-language stuff

  • fixing using existing results in cross-steering

  • adding helpers to calculate Jensen-Shannon and KL for Bernoulli distributions

  • using js dist for steering deltas

  • adding more tests (46927b0)

  • adding translated persona MWE variants (#103)

  • adding translated persona MWE variants by pre-pending the generation ctx to each example

  • formatting translated_strings (1a42e96)

  • adding google translate and re-translating persona datasets (#102)

  • adding google translate and re-translating persona datasets

  • fixing linting

  • removing unused test (312b4ab)

  • standardizing dataset naming around language (#100) (8fda2f9)

  • Generalization experiments (#96)

  • Add functions to do translation

  • Add TQA translate

  • Fix key name bug

  • WIP

  • Add script to generate TQA translated datasets

  • update expt name and dataset splits

  • Add Llama chat formatter

  • Minor fixes in caa_repro

  • Add options to print output, save steering vectors

  • Set default experiment path by train / test datasets

  • Add functionality to print examples

  • Add script to plot results

  • Add title to plotting code

  • Fix pdm lock

  • Add (very ugly) function to plot multiple results

Very ugly implementation but it works

  • Ignore png files

  • Enable translated system prompt

  • Add new experiments dir

  • Add notebook to analyze TQA vectors

  • Add script to download datasets

  • Add script to download datasets

  • WIP translate

  • Add code to extract and save steering vectors

  • Update experiments

  • Add more dataset names

  • Improve dataset inspection

  • Modify script to extract all SVs

  • Changes to notebooks

  • Update readme

  • WIP

  • Fix download datasets

  • Enable 4-bit loading

  • WIP

  • Visualize pairwise cos similarities

  • Inspect dataset as dataframe

  • Clustering results

  • Fix lint errors

  • Add script to extract concept vectors

  • WIP

  • Refactoring

  • Refactoring

  • Add script to run all experiments

  • Fix bug with results suffix

  • Uncomment some lines

  • Update README, bash script

  • Restore original experiments dir

  • Fix lint

  • Fix lint

  • Add more aggregations

  • Fix bug in download

  • Ignore html files

  • Add test for data preprocessing

  • Add tests for preprocessing

  • fixing black formatting issues

  • fixing typing


Co-authored-by: dtch1997 <dtch1997@users.noreply.github.com>
Co-authored-by: David Chanin <chanindav@gmail.com> (6680de7)

  • Translate mwe and sycophancy (#97)

  • importing raw persona MWE datasets from anthropic

  • adding translation for mwe persona datasets and translating the first 5

  • translating sycophancy datasets

  • make_sycophancy_caa parses translations, and adding translations for misc strings

  • adding a convenience wrapper to load_translation

  • adding a script to make MWE personas datasets

  • fix lint formatting

  • alternating every 2 samples for MWE, not every 1 (ce83a8d)

  • translating TQA into styles and languages with gpt4 (#94)

  • translating TQA into styles and languages with gpt4

  • don't force ascii, it's not 1998

  • fixing test mocking

  • refactoring translations to make supporting more datasets easier (d6d241a)

  • Caa tqa (#91)

  • refactoring formatting and benchmarking to support CAA

  • adding a basic test for get_normalized_correct_probs()

  • fixing tests

  • increasing sft loss threshold to make test less flaky

  • adding a TQA CAA dataset / experiment


Co-authored-by: David Chanin <chanindav@gmail.com> (97b1236)

  • Refactoring formatting and benchmarking to support CAA (#87)

  • refactoring formatting and benchmarking to support CAA

  • adding a basic test for get_normalized_correct_probs()

  • fixing tests

  • increasing sft loss threshold to make test less flaky (c98f067)

  • Merge pull request #88 from dtch1997/openai-translators

Add openai as dependency, and translators notebook (c1fb281)

  • Add openai as dependency, and translators notebook

This notebook has a few simple functions for translating inputs and
dataframes using gpt-4. You will need an openai API key to run the code
(obviously). (dadf9f2)

  • fixing tests after CAA merge (#85) (6efbfaf)

  • Caa experiments 2 (#80)

  • Add script to generate CAA datasets

  • Add correct CAA datasets

  • Add gitignore for experiments

  • Modify default template

  • Add get_normalized_correct_probs function

  • Add script to generate vectors

  • Add scripts to prompt w/ SV, plot results

  • Add notebook to compare our vs their CAA vectors

  • Add instructions to reproduce results

  • Add plots

  • Add evaluator for normalized correct probs

  • Skip failing tests


Co-authored-by: dtch1997 <dtch1997@users.noreply.github.com> (cca4b29)

  • Fix failing test (b7975b1)

  • refactoring prompting/formatting (#77)

  • refactoring prompting/formatting

  • fixing conflict in tests (dca53ac)

  • Merge pull request #79 from dtch1997/swap-in-steering-vecs

swapping in steering-vectors lib (f86d0d2)

  • Merge pull request #78 from dtch1997/verify-caa-steering

adding a test to assert our steering is identical to CAA steering (a5ea301)

  • swapping in steering-vectors lib (8e86019)

  • adding a test to assert our steering is identical to CAA steering (f26b2df)

  • Verify our code matches CAA (#76)

  • adding a llama chat formatter and prompter based on CAA

  • testing that our reading vectors match CAA reading vectors

  • fixing linting

  • fixing test (aa1dd24)

  • cleaning up oddities with steering vecs and repe algo (#72) (773db50)

  • CAA tweaks / improvements (#70)

  • Add bitsandbytes, accelerate

  • Hardcode second-last token activation position for steering vectors

  • Add notebook diffmerge package for pretty git diffs

  • Add note on how to change RepE directions

  • Add note on how hooks work

  • Add options to decouple reading and control

  • fixing tests


Co-authored-by: dtch1997 <dtch1997@users.noreply.github.com> (1487639)

  • CAA base (#69)

  • adding a record_activations() function to make it easy to collect model activations

  • replacing repe with our own CAA-esque implementation

  • only patch generated tokens

  • fix generating start index selection

  • fixing pyright error (55980ea)

  • Add CAA datasets (#68)

  • Add CAA datasets

  • Update makefile

  • Add test for make_ab_prompt


Co-authored-by: dtch1997 <dtch1997@users.noreply.github.com> (c1ae7f1)

  • Sft hf trainer (#50)

  • Working HF trainer script

  • customize wandb logging

  • Remove unused keys from SFTDataset

  • Add unit test for SFT

  • Fix import

  • Fix lint

  • Fix lint (again)

  • Fix test

  • Fix benchmark, pipeline logic

Update the train_and_evaluate fn to be consistent
with alg...
