Releases: dtch1997/repepo
v0.8.1
v0.8.1 (2024-05-21)
Fix
- fix: persona_generalization experiment script (#167)
  - update persona gen script with datasets
  - fix lint
  - adding a test to track missing persona datasets
  - adding all dataset prompts
  - adding test for get_all_prompts
  - adding qwen training script
  - fixing qwen script
  - tweaking qwen script
  - update qwen sweep script
  - adding llama layer sweep
  - fixing llama2 layers
  - fixing plot style
  - fixing formatting
  - adding plotting helper for steerability
  - fixing tests
  Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com>
  Co-authored-by: David Chanin <chanindav@gmail.com> (2eb6b62)
Unknown
- update plots (b914302)
- update figures for paper (ada05d8)
- update figures for id steering (35e11cc)
- update paper figures (c765cab)
- add figures for correlating id and ood steering (ffe1515)
- Paper/preprocessing (#170)
  - add preprocessing script
  - update figures
  Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (307878f)
- updates to id results (453737d)
- In distribution results (#168)
  - add figures
  - add figures
  - wip: concept erasure
  - update
  - delete unused notebooks
  - update plots
  - concept erasure
  - fix lint
  - ignore type in random sv experiment
  Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (0ca6521)
v0.8.0
v0.8.0 (2024-05-07)
Feature
- feat: more datasets (#164)
  - add all xrisk datasets
  - make truthfulqa consistent with others
  - fix some prompts
  - Verify all existing persona prompts
  - add persona prompts for Xrisk, sycophancy, tqa
  - refactor: make more datasets
  - minor
  - fix lint
  - fix test persona prompt len
  Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (0f6dff2)
v0.7.0
v0.6.0
v0.6.0 (2024-04-30)
Chore
- chore: add note to explain cluster utils (a13a503)
- chore: add cluster utils (dcd731a)
- chore: modify path to include local dependencies (4a97734)
- chore: update time requirement in qsub (d1268c8)
- chore: delete incorrect test step (1f4c2bb)
- chore: update lockfile (40baf38)
- chore: update ci (#161)
  - update ci
  - chore: fix typo
  - chore: add py312 test
  - update ruff settings
  - chore: ruff format
  Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (4482d27)
Feature
- feat: cross steering results db (#162)
  - modify jobscript to submit array job
  - chore: fix typo
  - Add sqlite database for results
  - implement multiprocessing to make db
  - update db
  - add notebook for in-distribution-steerability
  - add analysis notebook for intra concept variability
  - fix ruff format
  Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (cd16daa)
Unknown
v0.5.0
v0.5.0 (2024-04-24)
Chore
- chore: replace hardcoded user id with daniel (b1271a6)
- chore: fix qsub script (5f50c60)
- chore: cleanup dead code (9e615ff)
Feature
- feat: slim eval results and allowing multiple multipliers in cross-steering (#159)
  - adding slim eval results and allowing multiple multipliers in cross-steering
  - fixing linting
  - fixing typing (64d94aa)
- feat: add more persona prompts (#160)
  - add more persona prompts
  - add persona prompts, tests
  - Modify persona_generalization to work with more prompts
  - Add script to run persona generalization
  - fix: translation constants
  - Add test for variables
  Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (1cdf847)
- feat: steerability metrics (f9e22cc)
Fix
- fix: replace global var (d91a75f)
Unknown
- update test fixture (53d06ab)
- text: add dataset fixture (1b10ef4)
- set layer to 13 (f75473d)
- fixing linting (cdb5494)
- allow customizing metric name in persona plots (d379a6d)
- save persona generalization results individually (de8bbd0)
- Faster eval (#153)
  - hopefully improving eval speed
  - more device issues
  - reformatting (03a6dfc)
- tweaking cross steering (2babe64)
v0.4.0
v0.4.0 (2024-04-12)
Chore
Feature
- feat: improve steering experiments utils (#147)
  - Add statsmodels
  - Add notebook w/ results on choosing steerability metric
  - feat: add saving, loading for SVs
  - Finish initial study on aggregation method
  - rename
  - fix: use train_completion_template
  - Update lockfile
  - Remove system prompt for config for backwards-compatibility
  - feat: improve logging of missing steering configs
  - Update notebooks
  - chore: remove failing py311 ci run
  Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (f1487a9)
- feat: add functions to compute logit statistics (#145)
  - Add functions to compute logit statistics
  - Make logit statistics optional
  Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (c447864)
Refactor
- refactor: experiments (#141)
  - Add concept metrics calculation
  - fix: concept metrics
  - Add unit test for metrics
  - feat: layer-wise steering metrics
  - update config fields
  - update experiments code
  - minor
  - refactor: experiments code
  - refactor: experiments code
  - Test datasets exist before running
  - fix: database
  - add method to get config, fix delete_table
  - changes
  - more changes
  - Fix bug in experiment path
  - Add sweeps
  - WIP
  - Fix tests
  - Fix tests
  Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (ee5f190)
Unknown
- updating persona generalization (#151)
  - updating persona generalization
  - temporarily disabling test due to cpu/cuda issue on ci (ef91f2f)
- Add evaluate_generalization.py notebook (4a25847)
- minor fixes (506ff11)
- WIP: Persona cross steering (#150)
  - setting up cross-evaluation experiments
  - improving progress reporting in experiments
  - adding option to normalize steering magnitude to baseline
  - tweaking params
  - fixing nested progress
  - updating persona evals
  - passing eval params through persona experiment
  - setting up script for persona generalization experiments
  - more debugging output
  - updating test
  - fixing typing
  - make datasets as part of experiments script
  - fixing eval dataset selection
  - fixing eval
  - adding cross steering plots
  - shorten labels in cross-steering plots
  - WIP adding plotting helpers
  - refactoring plotting code
  - adding more plotting options
  - adding more content to plots
  - outputting more info in graphs (352df94)
- Add sft training examples (bcf8c2c)
- Experiments (#146)
  - Add sweeps
  - WIP experimental code
  - Update experiment notebook
  - Remove pycache
  Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (037bea3)
v0.3.0
v0.3.0 (2024-03-07)
Feature
- feat: steering experiments (#132)
  - Add experimental code
  - fix: make_country_capital script
  - feat: add code to run steering experiment
  - update experiments code
  - fix: add --config_path arg
  - fix: config yaml parsing
  - chore: add more configs
  - chore: add even more configs
  - refactor: plotting
  - feat: add script to run sweep
  - fix: do not set completion template by default
  - refactor sweeps
  - refactor: token concept sweep
  - fix: bugbears
  - chore: add comments
  - fix: steering_index of datasets
  - test: steering token index
  - updating steering_vectors library version
  - evaluate on more layers
  - refactor: use steering-vectors code, log instead of print
  - chore: fix docstring
  - test: training, evaluating steering vectors
  - fix: minor
  Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com>
  Co-authored-by: David Chanin <chanindav@gmail.com> (8d1bd7d)
v0.2.0
v0.1.0
v0.1.0 (2024-03-05)
Chore
- chore: remove pypi publishing from ci (6ddd28d)
Feature
- feat: refactor types (#123)
  - Refactor types
  - Delete old experimental code
  - Refactor datasets
  - Improve dataset split parsing; update make_dataset types
  - Update preprocessed datasets save dir
  - refactor: delete duplicated files
  - Fix pyright errors
  - Fix tests
  - Fix lint
  - chore: re-add tqa raw datasets
  - feat: add tests for evaluators
  - feat: re-add caa-style prompt
  - chore: delete unused code
  - refactor: migrate SteeringHook to repepo.core
  - test: assert pos, neg prompts the same
  Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (e600270)
v0.0.0
v0.0.0 (2024-03-04)
Chore
- chore: add semantic release to ci (56be43f)
Unknown
- Delete dead code (a1da6de)
- translate the main persona datasets from @dtch1997's work (#108) (b0b13a2)
- Translation experiments (#99)
  - adding helpers for generating and parsing language from dataset filename
  - adding compare_dataset_translations experiment
  - adding experiment helpers
  - tweaking pirate translation strings
  - adding translations for non-persona mwes
  - fixing up make mwe helper
  - adding a 'ctx' pseudo-style
  - Revert "adding a 'ctx' pseudo-style"
    This reverts commit a0058c4.
  - refactoring to allow using arbitrary dataset variations, instead of the hacky pseudo-language stuff
  - fixing using existing results in cross-steering
  - adding helpers to calculate Jensen-Shannon and KL for Bernoulli distributions
  - using js dist for steering deltas
  - adding more tests (46927b0)
- adding translated persona MWE variants (#103)
  - adding translated persona MWE variants by pre-pending the generation ctx to each example
  - formatting translated_strings (1a42e96)
- adding google translate and re-translating persona datasets (#102)
  - adding google translate and re-translating persona datasets
  - fixing linting
  - removing unused test (312b4ab)
- standardizing dataset naming around language (#100) (8fda2f9)
- Generalization experiments (#96)
  - Add functions to do translation
  - Add TQA translate
  - Fix key name bug
  - WIP
  - Add script to generate TQA translated datasets
  - update expt name and dataset splits
  - Add Llama chat formatter
  - Minor fixes in caa_repro
  - Add options to print output, save steering vectors
  - Set default experiment path by train / test datasets
  - Add functionality to print examples
  - Add script to plot results
  - Add title to plotting code
  - Fix pdm lock
  - Add (very ugly) function to plot multiple results
    Very ugly implementation but it works
  - Ignore png files
  - Enable translated system prompt
  - Add new experiments dir
  - Add notebook to analyze TQA vectors
  - Add script to download datasets
  - Add script to download datasets
  - WIP translate
  - Add code to extract and save steering vectors
  - Update experiments
  - Add more dataset names
  - Improve dataset inspection
  - Modify script to extract all SVs
  - Changes to notebooks
  - Update readme
  - WIP
  - Fix download datasets
  - Enable 4-bit loading
  - WIP
  - Visualize pairwise cos similarities
  - Inspect datasets dataframe
  - Clustering results
  - Fix lint errors
  - Add script to extract concept vectors
  - WIP
  - Refactoring
  - Refactoring
  - Add script to run all experiments
  - Fix bug with results suffix
  - Uncomment some lines
  - Update README, bash script
  - Restore original experiments dir
  - Fix lint
  - Fix lint
  - Add more aggregations
  - Fix bug in download
  - Ignore html files
  - Add test for data preprocessing
  - Add tests for preprocessing
  - fixing black formatting issues
  - fixing typing
  Co-authored-by: dtch1997 <dtch1997@users.noreply.github.com>
  Co-authored-by: David Chanin <chanindav@gmail.com> (6680de7)
- Translate mwe and sycophancy (#97)
  - importing raw persona MWE datasets from anthropic
  - adding translation for mwe persona datasets and translating the first 5
  - translating sycophancy datasets
  - make_sycophancy_caa parses translations, and adding translations for misc strings
  - adding a convenience wrapper to load_translation
  - adding a script to make MWE personas datasets
  - fix lint formatting
  - alternating every 2 samples for MWE, not every 1 (ce83a8d)
- translating TQA into styles and languages with gpt4 (#94)
  - translating TQA into styles and languages with gpt4
  - dont force ascii, its not 1998
  - fixing test mocking
  - refactoring translations to make supporting more datasets easier (d6d241a)
- Caa tqa (#91)
  - refactoring formatting and benchmarking to support CAA
  - adding a basic test for get_normalized_correct_probs()
  - fixing tests
  - increasing sft loss threshold to make test less flaky
  - adding a TQA CAA dataset / experiment
  Co-authored-by: David Chanin <chanindav@gmail.com> (97b1236)
- Refactoring formatting and benchmarking to support CAA (#87)
  - refactoring formatting and benchmarking to support CAA
  - adding a basic test for get_normalized_correct_probs()
  - fixing tests
  - increasing sft loss threshold to make test less flaky (c98f067)
- Merge pull request #88 from dtch1997/openai-translators
  Add openai as dependency, and translators notebook (c1fb281)
- Add openai as dependency, and translators notebook
  This notebook has a few simple functions for translating inputs and
  dataframes using gpt-4. You will need an openai API key to run the code
  (obviously). (dadf9f2)
- Caa experiments 2 (#80)
  - Add script to generate CAA datasets
  - Add correct CAA datasets
  - Add gitignore for experiments
  - Modify default template
  - Add get_normalized_correct_probs function
  - Add script to generate vectors
  - Add scripts to prompt w/ SV, plot results
  - Add notebook to compare our vs their CAA vectors
  - Add instructions to reproduce results
  - Add plots
  - Add evaluator for normalized correct probs
  - Skip failing tests
  Co-authored-by: dtch1997 <dtch1997@users.noreply.github.com> (cca4b29)
- Fix failing test (b7975b1)
- refactoring prompting/formatting (#77)
  - refactoring prompting/formatting
  - fixing conflict in tests (dca53ac)
- Merge pull request #79 from dtch1997/swap-in-steering-vecs
  swapping in steering-vectors lib (f86d0d2)
- Merge pull request #78 from dtch1997/verify-caa-steering
  adding a test to assert our steering is identical to CAA steering (a5ea301)
- swapping in steering-vectors lib (8e86019)
- adding a test to assert our steering is identical to CAA steering (f26b2df)
- Verify our code matches CAA (#76)
  - adding a llama chat formatter and prompter based on CAA
  - testing that our reading vectors match CAA reading vectors
  - fixing linting
  - fixing test (aa1dd24)
- cleaning up oddities with steering vecs and repe algo (#72) (773db50)
- CAA tweaks / improvements (#70)
  - Add bitsandbytes, accelerate
  - Hardcode second-last token activation position for steering vectors
  - Add notebook diffmerge package for pretty git diffs
  - Add note on how to change RepE directions
  - Add note on how hooks work
  - Add options to decouple reading and control
  - fixing tests
  Co-authored-by: dtch1997 <dtch1997@users.noreply.github.com> (1487639)
- CAA base (#69)
  - adding a record_activations() function to make it easy to collect model activations
  - replacing repe with our own CAA-esque implementation
  - only patch generated tokens
  - fix generating start index selection
  - fixing pyright error (55980ea)
- Add CAA datasets (#68)
  - Add CAA datasets
  - Update makefile
  - Add test for make_ab_prompt
  Co-authored-by: dtch1997 <dtch1997@users.noreply.github.com> (c1ae7f1)
- Sft hf trainer (#50)
  - Working HF trainer script
  - customize wandb logging
  - Remove unused keys from SFTDataset
  - Add unit test for SFT
  - Fix import
  - Fix lint
  - Fix lint (again)
  - Fix test
  - Fix benchmark, pipeline logic
    Update the train_and_evaluate fn to be consistent
    with alg...